FOREWORD


In bygone centuries, our physical world appeared to be filled to the brim with mysteries. Divine powers
could provide for genuine miracles; water and sunlight could turn arid land into fertile pastures, but the 
same powers could lead to miseries and disasters. The force of life, the vis vitalis, was assumed to be the 
special agent responsible for all living things. The heavens, whatever they were for, contained stars and other 
heavenly bodies that were the exclusive domain of the Gods. 

Mathematics did exist, of course. Indeed, there was one aspect of our physical world that was recognised to 
be controlled by precise, mathematical logic: the geometric structure of space, elaborated to become a genuine 
form of art by the ancient Greeks. From my perspective, the Greeks were the first practitioners of ‘mathematical 
physics’, when they discovered that all geometric features of space could be reduced to a small number of 
axioms. Today, these would be called ‘fundamental laws of physics’. The fact that the flow of time could be 
addressed with similar exactitude, and that it could be handled geometrically together with space, was only 
recognised much later. And, yes, there were a few crazy people who were interested in the magic of numbers, 
but the real world around us seemed to contain so much more that was way beyond our capacities of analysis. 

Gradually, all this changed. The Moon and the planets appeared to follow geometrical laws. Galilei and 
Newton managed to identify their logical rules of motion, and by noting that the concept of mass could be 
applied to things in the sky just like apples and cannon balls on Earth, they made the sky a little bit more 
accessible to us. Electricity, magnetism, light and sound were also found to behave in complete accordance 
with mathematical equations. 

Yet all of this was just a beginning. The real changes came with the twentieth century. A completely new
way of thinking, by emphasizing mathematical, logical analysis rather than empirical evidence, was pioneered 
by Albert Einstein. Applying advanced mathematical concepts, only known to a few pure mathematicians, to 
notions as mundane as space and time, was new to the physicists of his time. Einstein himself had a hard 
time struggling through the logic of connections and curvatures, notions that were totally new to him, but are 
only too familiar to students of mathematical physics today. Indeed, there is no better testimony of Einstein's 
deep insights at that time, than the fact that we now teach these things regularly in our university classrooms. 

Special and general relativity are only small corners of the realm of modern physics that is presently being 
studied using advanced mathematical methods. We have notoriously complex subjects such as phase transitions in 
condensed matter physics, superconductivity, Bose-Einstein condensation, the quantum Hall effect, particularly 
the fractional quantum Hall effect, and numerous topics from elementary particle physics, ranging from fibre 
bundles and renormalization groups to supergravity, algebraic topology, superstring theory, Calabi-Yau spaces 
and what not, all of which require the utmost of our mental skills to comprehend them. 

The most bewildering observation that we make today is that it seems that our entire physical world 
appears to be controlled by mathematical equations, and these are not just sloppy and debatable models, but 
precisely documented properties of materials, of systems, and of phenomena in all echelons of our universe. 

Does this really apply to our entire world, or only to parts of it? Do features, notions, entities exist that are 
emphatically not mathematical? What about intuition, or dreams, and what about consciousness? What
about religion? Here, most of us would say, one should not even try to apply mathematical analysis, although 
even here, some brave social scientists are making attempts at coordinating rational approaches. 

No, there are clear and important differences between the physical world and the mathematical world.
Where the physical world stands out is the fact that it refers to ‘reality’, whatever ‘reality’ is. Mathematics is 
the world of pure logic and pure reasoning. In physics, it is the experimental evidence that ultimately decides 
whether a theory is acceptable or not. Also, the methodology in physics is different. 

A beautiful example is the serendipitous discovery of superconductivity. In 1911, the Dutch physicist Heike 
Kamerlingh Onnes was the first to achieve the liquefaction of helium, for which a temperature below 4.25 K 
had to be realized. Heike decided to measure the specific conductivity of mercury, a metal that is frozen solid 
at such low temperatures. But something appeared to go wrong during the measurements, since the
voltmeter did not show any voltage at all. All experienced physicists in the team assumed that they were dealing
with a malfunction. It would not have been the first time for a short circuit to occur in the electrical 
equipment, but, this time, in spite of several efforts, they failed to locate it. One of the assistants was 
responsible for keeping the temperature of the sample well within that of liquid helium, a dull job, requiring 
nothing else than continuously watching some dials. During one of the many tests, however, he dozed off. 
The temperature rose, and suddenly the measurements showed the normal values again. It then occurred to 
the investigators that the effect and its temperature dependence were completely reproducible. Below 4.19 
degrees Kelvin the conductivity of mercury appeared to be strictly infinite. Above that temperature, it is 
finite, and the transition is a very sudden one. Superconductivity was discovered (D. van Delft, “Heike
Kamerlingh Onnes”, Uitgeverij Bert Bakker, Amsterdam, 2005 (in Dutch)).

This is not the way mathematical discoveries are made. Theorems are not produced by assistants falling 
asleep, even if examples do exist of incidents involving some miraculous fortune. 

The hybrid science of mathematical physics is a very curious one. Some of the topics in this Encyclopedia 
are undoubtedly physical. High-Tc superconductivity, breaking water waves, and magneto-hydrodynamics,
are definitely topics of physics where experimental data are considered more decisive than any high-brow 
theory. Cohomology theory, Donaldson-Witten theory, and AdS/CFT correspondence, however, are examples 
of purely mathematical exercises, even if these subjects, like all of the others in this compilation, are strongly 
inspired by, and related to, questions posed in physics.

It is inevitable, in a compilation of a large number of short articles with many different authors, to see quite a 
bit of variation in style and level. In this Encyclopedia, theoretical physicists as well as mathematicians together 
made a huge effort to present in a concise and understandable manner their vision on numerous important 
issues in advanced mathematical physics. All include references for further reading. We hope and expect that 
these efforts will serve a good purpose. 


Gerard ’t Hooft, 
Spinoza Institute, 
Utrecht University, 
The Netherlands. 


PREFACE 


Mathematical Physics as a distinct discipline is relatively new. The International Association of
Mathematical Physics was founded only in 1976. The interaction between physics and mathematics
has, of course, existed since ancient times, but the recent decades, perhaps partly because we are living 
through them, appear to have witnessed tremendous progress, yielding new results and insights at a dizzying 
pace, so much so that an encyclopedia seems now needed to collate the gathered knowledge. 

Mathematical Physics brings together the two great disciplines of Mathematics and Physics to the benefit of 
both, the relationship between them being symbiotic. On the one hand, it uses mathematics as a tool to 
organize physical ideas of increasing precision and complexity, and on the other it draws on the questions 
that physicists pose as a source of inspiration to mathematicians. A classical example of this relationship 
exists in Einstein's theory of relativity, where differential geometry played an essential role in the formulation 
of the physical theory while the problems raised by the ensuing physics have in turn boosted the development 
of differential geometry. It is indeed a happy coincidence that we are writing now a preface to an 
encyclopedia of mathematical physics in the centenary of Einstein's annus mirabilis. 

The project of putting together an encyclopedia of mathematical physics looked, and still looks, to us a 
formidable enterprise. We would never have had the courage to undertake such a task if we did not believe, 
first, that it is worthwhile and of benefit to the community, and second, that we would get the much-needed 
support from our colleagues. And this support we did get, in the form of advice, encouragement, and 
practical help too, from members of our Editorial Advisory Board, from our authors, and from others as well, 
who have given unstintingly so much of their time to help us shape this Encyclopedia. 

Mathematical Physics being a relatively new subject, it is not yet clearly delineated and could mean 
different things to different people. In our choice of topics, we were guided in part by the programs of recent 
International Congresses on Mathematical Physics, but mainly by the advice from our Editorial Advisory 
Board and from our authors. The limitations of space and time, as well as our own limitations, necessitated 
the omission of certain topics, but we have tried to include all that we believe to be core subjects and to cover 
as much as possible the most active areas. 

Our subject being interdisciplinary, we think it appropriate that the Encyclopedia should have certain 
special features. Applications of the same mathematical theory, for instance, to different problems in physics 
will have different emphasis and treatment. By the same token, the same problem in physics can draw upon 
resources from different mathematical fields. This is why we divide the Encyclopedia into two broad sections: 
physics subjects and related mathematical subjects. Articles in either section are deliberately allowed a fair 
amount of overlap with one another and many articles will appear under more than one heading, but all are 
linked together by elaborate cross referencing. We think this gives a better picture of the subject as a whole 
and will serve better a community of researchers from widely scattered yet related fields. 

The Encyclopedia is intended primarily for experienced researchers but should be of use also to beginning 
graduate students. For the latter category of readers, we have included eight elementary introductory articles for easy 
reference, with those on mathematics aimed at physics graduates and those on physics aimed at mathematics 
graduates, so that these articles can serve as their first port of call to enable them to embark on any of the main 
articles without the need to consult other material beforehand. In fact, we think these articles may even form the 
foundation of advanced undergraduate courses, as we know that some authors have already made such use of them.

In addition to the printed version, an on-line version of the Encyclopedia is planned, which will allow both 
the contents and the articles themselves to be updated if and when the occasion arises. This is probably a 
necessary provision in such a rapidly advancing field. 

This project was some four years in the making. Our foremost thanks at its completion go to the members 
of our Editorial Advisory Board, who have advised, helped and encouraged us all along, and to all our 
authors who have so generously devoted so much of their time to writing these articles and given us much 
useful advice as well. We ourselves have learnt a lot from these colleagues, and made some wonderful 
contacts with some among them. Special thanks are due also to Arthur Greenspoon whose technical expertise 
was indispensable. 

The project was started with Academic Press, which was later taken over by Elsevier. We thank warmly 
members of their staff who have made this transition admirably seamless and gone on to assist us greatly in 
our task: both Carey Chapman and Anne Guillaume, who were in charge of the whole project and have been 
with us since the beginning, and Edward Taylor responsible for the copy-editing. And Martin Ruck, who 
manages to keep an overwhelming amount of details constantly at his fingertips, and who is never known to 
have lost a single email, deserves a very special mention. 

As a postscript, we would like to express our gratitude to the very large number of authors who generously 
agreed to donate their honorariums to support the Committee for Developing Countries of the European 
Mathematical Society in their work to help our less fortunate colleagues in the developing world. 


Jean-Pierre Françoise
Gregory L. Naber 
Tsou Sheung Tsun 


GUIDE TO USE OF THE ENCYCLOPEDIA 


Structure of the Encyclopedia


The material in this Encyclopedia is organised into two sections. At the start of Volume 1 are eight Introductory Articles. 
The introductory articles on mathematics are aimed at physics graduates; those on physics are aimed at mathematics 
graduates. It is intended that these articles should serve as the first port of call for graduate students, to enable them to 
embark on any of the main entries without the need to consult other material beforehand. 

Following the Introductory Articles, the main body of the Encyclopedia is arranged as a series of entries in alphabetical 
order. These entries fill the remainder of Volume 1 and all of the subsequent volumes (2-5). 

To help you realize the full potential of the material in the Encyclopedia we have provided four features to help you find 
the topic of your choice: a contents list by subject, an alphabetical contents list, cross-references, and a full subject index. 


1. Contents List by Subject 


Your first point of reference will probably be the contents list by subject. This list appears at the front of each volume, 
and groups the entries under subject headings describing the broad themes of mathematical physics. This will enable the 
reader to make quick connections between entries and to locate the entry of interest. The contents list by subject is divided 
into two main sections: Physics Subjects and Related Mathematics Subjects. Under each main section heading, you will 
find several subject areas (such as GENERAL RELATIVITY in Physics Subjects or NONCOMMUTATIVE GEOMETRY 
in Related Mathematics Subjects). Under each subject area is a list of those entries that cover aspects of that subject, 
together with the volume and page numbers on which these entries may be found. 

Because mathematical physics is so highly interconnected, individual entries may appear under more than one subject 
area. For example, the entry GAUGE THEORY: MATHEMATICAL APPLICATIONS is listed under the Physics Subject 
GAUGE THEORY as well as in a broad range of Related Mathematics Subjects. 


2. Alphabetical Contents List 


The alphabetical contents list, which also appears at the front of each volume, lists the entries in the order in which they 
appear in the Encyclopedia. This list provides both the volume number and the page number of the entry. 

You will find “dummy entries” where obvious synonyms exist for entries or where we have grouped together related 
topics. Dummy entries appear in both the contents list and the body of the text. 


Example 
If you were attempting to locate material on path integral methods via the alphabetical contents list: 


PATH INTEGRAL METHODS see Functional Integration in Quantum Physics; Feynman Path Integrals 


The dummy entry directs you to two other entries in which path integral methods are covered. At the appropriate 
locations in the contents list, the volume and page numbers for these entries are given. 

If you were trying to locate the material by browsing through the text and you had looked up Path Integral Methods, 
then the following information would be provided in the dummy entry: 


Path Integral Methods see Functional Integration in Quantum Physics; Feynman Path Integrals 


3. Cross-References


All of the articles in the Encyclopedia have been extensively cross-referenced. The cross-references, which appear at the 
end of an entry, serve three different functions: 


i. To indicate if a topic is discussed in greater detail elsewhere.
ii. To draw the reader's attention to parallel discussions in other entries.
iii. To indicate material that broadens the discussion.


Example 
The following list of cross-references appears at the end of the entry STOCHASTIC HYDRODYNAMICS 


See also: Cauchy Problem for Burgers-Type Equations; Hamiltonian 
Fluid Dynamics; Incompressible Euler Equations: Mathematical Theory; 
Malliavin Calculus; Non-Newtonian Fluids; Partial Differential Equations: 
Some Examples; Stochastic Differential Equations; Turbulence Theories; 
Viscous Incompressible Fluids: Mathematical Theory; Vortex Dynamics 


Here you will find examples of all three functions of the cross-reference list: a topic discussed in greater detail elsewhere 
(e.g. Incompressible Euler Equations: Mathematical Theory), parallel discussion in other entries (e.g. Stochastic
Differential Equations) and reference to entries that broaden the discussion (e.g. Turbulence Theories).

The eight Introductory Articles are not cross-referenced from any of the main entries, as it is expected that introductory 
articles will be of general interest. As mentioned above, the Introductory Articles may be found at the start of Volume 1. 


4. Index 


The index will provide you with the volume and page number where the material is located. The index entries 
differentiate between material that is a whole entry, is part of an entry, or is data presented in a figure or table. Detailed 
notes are provided on the opening page of the index. 


General Principles


Classical mechanics is a theory of motions of point
particles. If $X = (x_1,\ldots,x_n)$ are the particle positions
in a Cartesian inertial system of coordinates, the
equations of motion are determined by their masses
$(m_1,\ldots,m_n)$, $m_j > 0$, and by the potential energy of
interaction, $V(x_1,\ldots,x_n)$, as

$$m_i \ddot{x}_i = -\partial_{x_i} V(x_1,\ldots,x_n), \qquad i = 1,\ldots,n \qquad [1]$$

here $x_i = (x_{i1},\ldots,x_{id})$ are coordinates of the $i$th
particle and $\partial_{x_i}$ is the gradient $(\partial_{x_{i1}},\ldots,\partial_{x_{id}})$; $d$ is the
space dimension (i.e., $d = 3$, usually). The potential
energy function will be supposed “smooth,” that is,
analytic except, possibly, when two positions coincide.
The latter exception is necessary to include the
important cases of gravitational attraction or, when
dealing with electrically charged particles, of Coulomb
interaction. A basic result is that if $V$ is
bounded below, eqn [1] admits, given initial data
$X_0 = X(0)$, $\dot{X}_0 = \dot{X}(0)$, a unique global solution
$t \to X(t)$, $t \in (-\infty,\infty)$; otherwise a solution can fail
to be global if and only if, in a finite time, it reaches
infinity or a singularity point (i.e., a configuration in
which two or more particles occupy the same point:
an event called a collision).

In eqn [1], $-\partial_{x_i} V(x_1,\ldots,x_n)$ is the force acting on
the points. More general forces are often admitted.
For instance, velocity-dependent friction forces: they
are not considered here because of their phenomenological
nature as models for microscopic phenomena
which should also, in principle, be explained in
terms of conservative forces (furthermore, even from
a macroscopic viewpoint, they are rather incomplete
models, as they should be considered together with
the important heat generation phenomena that
accompany them). Another interesting example of
forces not corresponding to a potential are certain
velocity-dependent forces like the Coriolis force
(which, however, appears only in noninertial frames
of reference) and the closely related Lorentz force
(in electromagnetism): they could be easily accommodated
in the Hamiltonian formulation of
mechanics; see Appendix 2.

The action principle states that an equivalent
formulation of the eqns [1] is that a motion
$t \to X_0(t)$ satisfying [1] during a time interval
$[t_1,t_2]$ and leading from $X^1 = X_0(t_1)$ to $X^2 = X_0(t_2)$,
renders stationary the action

$$\mathcal{A}(\{X\}) = \int_{t_1}^{t_2} \Big( \sum_{i=1}^{n} \frac{1}{2} m_i \dot{x}_i(t)^2 - V(X(t)) \Big)\, dt \qquad [2]$$

within the class $\mathcal{M}_{t_1,t_2}(X^1,X^2)$ of smooth (i.e.,
analytic) “motions” $t \to X(t)$ defined for $t \in [t_1,t_2]$
and leading from $X^1$ to $X^2$.
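
The stationarity statement can be checked numerically in a simple case. The following is only a sketch under stated assumptions, not part of the original article: a single particle with $n = d = 1$, $m = 1$, the harmonic potential $V(x) = x^2/2$ (for which $x_0(t) = \cos t$ solves [1]), a midpoint discretization of the action [2] on $[t_1,t_2] = [0,1]$, and a perturbation $h(t) = \sin(\pi t)$ vanishing at both endpoints. The function names are hypothetical.

```python
import math

def action(path, dt, m=1.0):
    """Midpoint-rule discretization of the action [2] for the assumed potential V(x) = x**2 / 2."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        velocity = (b - a) / dt
        midpoint = 0.5 * (a + b)
        total += (0.5 * m * velocity ** 2 - 0.5 * midpoint ** 2) * dt
    return total

N = 2000                                      # number of time intervals on [0, 1]
dt = 1.0 / N
times = [k * dt for k in range(N + 1)]
x_true = [math.cos(t) for t in times]         # solves m x'' = -x with m = 1
h = [math.sin(math.pi * t) for t in times]    # perturbation keeping the endpoints fixed

def perturbed_action(eps):
    """Action of the perturbed path x_true + eps * h."""
    return action([x + eps * dx for x, dx in zip(x_true, h)], dt)

eps = 1e-3
# Central difference in eps: estimates the first variation of the action along h.
first_variation = (perturbed_action(eps) - perturbed_action(-eps)) / (2 * eps)
# Second difference in eps: estimates the quadratic term, which need not vanish.
second_variation = (perturbed_action(eps) + perturbed_action(-eps)
                    - 2 * perturbed_action(0.0)) / eps ** 2
print(first_variation, second_variation)
```

If the sketch is right, the first variation should be close to zero (up to discretization error), while the second variation stays of order one: the motion makes the action stationary, which is a weaker statement than saying it minimizes it.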

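The function

$$\mathcal{L}(Y,X) = \frac{1}{2}\sum_{i=1}^{n} m_i y_i^2 - V(X) \stackrel{\text{def}}{=} K(Y) - V(X), \qquad Y = (y_1,\ldots,y_n)$$

is called the Lagrangian function and the action can
be written as

$$\int_{t_1}^{t_2} \mathcal{L}(\dot{X}(t), X(t))\, dt$$

The quantity $K(\dot{X}(t))$ is called kinetic energy and
motions satisfying [1] conserve energy as time
$t$ varies, that is,

$$K(\dot{X}(t)) + V(X(t)) = E = \text{const.} \qquad [3]$$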
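
Energy conservation can be observed numerically. The sketch below is an illustration only: it integrates eqn [1] for one particle in $d = 3$ with the velocity Verlet scheme, and the potential $V(x) = |x|^2/2 + |x|^4/4$, the mass, the initial data and the step size are all assumptions chosen for the example.

```python
def potential(x, k=1.0):
    """Assumed smooth potential V(x) = k*|x|^2/2 + |x|^4/4."""
    r2 = sum(c * c for c in x)
    return 0.5 * k * r2 + 0.25 * r2 * r2

def gradient(x, k=1.0):
    """Gradient of the assumed potential: (k + |x|^2) * x."""
    r2 = sum(c * c for c in x)
    return [(k + r2) * c for c in x]

def velocity_verlet(x, v, m, dt, steps):
    """Integrate m * x'' = -grad V(x), i.e. eqn [1] for a single particle."""
    a = [-g / m for g in gradient(x)]
    for _ in range(steps):
        x = [xi + dt * vi + 0.5 * dt * dt * ai for xi, vi, ai in zip(x, v, a)]
        a_new = [-g / m for g in gradient(x)]
        v = [vi + 0.5 * dt * (ai + an) for vi, ai, an in zip(v, a, a_new)]
        a = a_new
    return x, v

def energy(x, v, m):
    """Kinetic plus potential energy, the quantity E in [3]."""
    return 0.5 * m * sum(c * c for c in v) + potential(x)

m = 2.0
x0, v0 = [1.0, 0.0, 0.0], [0.0, 0.5, 0.3]
E0 = energy(x0, v0, m)
x1, v1 = velocity_verlet(x0, v0, m, dt=1e-3, steps=10_000)
energy_drift = abs(energy(x1, v1, m) - E0) / abs(E0)
print(energy_drift)
```

The integrator conserves energy only approximately, so `energy_drift` is expected to be small but not exactly zero; it should shrink when `dt` is reduced.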


Hence the action principle can be intuitively thought 
of as saying that motions proceed by keeping 
constant the energy, sum of the kinetic and potential 
energies, while trying to share as evenly as possible 
their (average over time) contribution to the energy. 

In the special case in which $V$ is translation invariant,
motions conserve linear momentum $Q \stackrel{\text{def}}{=} \sum_i m_i \dot{x}_i$; if $V$
is rotation invariant around the origin $O$, motions
conserve angular momentum $M \stackrel{\text{def}}{=} \sum_i m_i x_i \wedge \dot{x}_i$, where $\wedge$
denotes the vector product in $\mathbb{R}^d$, that is, it is the tensor
$(a \wedge b)_{ij} = a_i b_j - b_i a_j$, $i,j = 1,\ldots,d$: if the dimension
$d = 3$ the $a \wedge b$ will be naturally regarded as a vector.
More generally, to any continuous symmetry group of
the Lagrangian correspond conserved quantities: this is
formalized in the Noether theorem.
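
A small numerical experiment illustrates both conservation laws. This is a sketch under assumptions made here for illustration: two particles in $d = 2$ interacting through the pair potential $V = k|x_1 - x_2|^2/2$, which is invariant under translations and under rotations around the origin, integrated with the velocity Verlet scheme. In $d = 2$ the tensor $a \wedge b$ has the single independent component $a_1 b_2 - a_2 b_1$.

```python
def forces(xs, k=1.0):
    """Forces -grad V for the assumed pair potential V = k * |x1 - x2|^2 / 2."""
    (ax, ay), (bx, by) = xs
    dx, dy = ax - bx, ay - by
    return [(-k * dx, -k * dy), (k * dx, k * dy)]

def step(xs, vs, ms, dt):
    """One velocity Verlet step for eqn [1]."""
    fs = forces(xs)
    xs_new = [(x + dt * vx + 0.5 * dt * dt * fx / m,
               y + dt * vy + 0.5 * dt * dt * fy / m)
              for (x, y), (vx, vy), (fx, fy), m in zip(xs, vs, fs, ms)]
    fs_new = forces(xs_new)
    vs_new = [(vx + 0.5 * dt * (fx + gx) / m, vy + 0.5 * dt * (fy + gy) / m)
              for (vx, vy), (fx, fy), (gx, gy), m in zip(vs, fs, fs_new, ms)]
    return xs_new, vs_new

def linear_momentum(vs, ms):
    """Q = sum_i m_i * v_i."""
    return (sum(m * vx for (vx, _), m in zip(vs, ms)),
            sum(m * vy for (_, vy), m in zip(vs, ms)))

def angular_momentum(xs, vs, ms):
    """M = sum_i m_i * (x_i ^ v_i), reduced to its single component in d = 2."""
    return sum(m * (x * vy - y * vx) for (x, y), (vx, vy), m in zip(xs, vs, ms))

ms = [1.0, 3.0]
xs = [(1.0, 0.0), (-0.5, 0.2)]
vs = [(0.0, 0.7), (0.1, -0.3)]
Q0, M0 = linear_momentum(vs, ms), angular_momentum(xs, vs, ms)
for _ in range(5000):
    xs, vs = step(xs, vs, ms, dt=1e-3)
Q1, M1 = linear_momentum(vs, ms), angular_momentum(xs, vs, ms)
print(Q0, Q1, M0, M1)
```

Both quantities should agree before and after the integration up to rounding, because this particular scheme preserves them for pair forces directed along $x_1 - x_2$; other integrators may only conserve them approximately.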

It is convenient to think that the scalar product
in $\mathbb{R}^{nd}$ is defined in terms of the ordinary scalar product
in $\mathbb{R}^d$, $a \cdot b = \sum_{j=1}^{d} a_j b_j$, by $(v,w) = \sum_{i=1}^{n} m_i\, v_i \cdot w_i$:
so that kinetic energy and line element $ds$ can be
written as $K(\dot{X}) = \frac{1}{2}(\dot{X},\dot{X})$ and $ds^2 = \sum_{i=1}^{n} m_i\, dx_i^2$,
respectively. Therefore, the metric generated by the
latter scalar product can be called kinetic energy
metric.

The interest of the kinetic metric appears from the
Maupertuis’ principle (equivalent to [1]): the principle
allows us to identify the trajectory traced in $\mathbb{R}^{nd}$
by a motion that leads from $X^1$ to $X^2$ moving with
energy $E$. Parametrizing such trajectories as
$\tau \to X(\tau)$ by a parameter $\tau$ varying in $[0,1]$ so that
the line element is $ds^2 = (\partial_\tau X, \partial_\tau X)\, d\tau^2$, the principle
states that the trajectory of a motion with energy $E$
which leads from $X^1$ to $X^2$ makes stationary, among
the analytic curves $\xi \in \mathcal{M}_{0,1}(X^1,X^2)$, the function

$$L(\xi) = \int_{\xi} \sqrt{E - V(\xi(s))}\, ds \qquad [4]$$

so that the possible trajectories traced by the
solutions of [1] in $\mathbb{R}^{nd}$ and with energy $E$ can be
identified with the geodesics of the metric
$dm^2 \stackrel{\text{def}}{=} (E - V(X)) \cdot ds^2$.

For more details, the reader is referred to Landau 
and Lifshitz (1976) and Gallavotti (1983). 


Constraints 


Often particles are subject to constraints which force
the motion to take place on a surface $M \subset \mathbb{R}^{nd}$, i.e.,
$X(t)$ is forced to be a point on the manifold
$M$. A typical example is provided by rigid systems
in which motions are subject to forces which keep
the mutual distances of the particles constant:
$|x_i - x_j| = \rho_{ij}$, with $\rho_{ij}$ time-independent positive quantities.
In essentially all cases, the forces that imply
constraints, called constraint reactions, are velocity
dependent and, therefore, are not in the class of
conservative forces considered here, cf. [1]. Hence,
from a fundamental viewpoint admitting only conservative
forces, constrained systems should be regarded
as idealizations of systems subject to conservative
forces which approximately imply the constraints.


In general, the $\ell$-dimensional manifold $M$ will not
admit a global system of coordinates: however, it
will be possible to describe points in the vicinity
of any $X^0 \in M$ by using $N = nd$ coordinates
$q = (q_1,\ldots,q_\ell,q_{\ell+1},\ldots,q_N)$ varying in an open ball
$B_{X^0}$: $X = X(q_1,\ldots,q_\ell,q_{\ell+1},\ldots,q_N)$.

The $q$-coordinates can be chosen well adapted to
the surface $M$ and to the kinetic metric, i.e., so that
the points of $M$ are identified by $q_{\ell+1} = \cdots = q_N = 0$
(which is the meaning of “adapted”); furthermore,
infinitesimal displacements $(0,\ldots,0,d\varepsilon_{\ell+1},\ldots,d\varepsilon_N)$
out of a point $X^0 \in M$ are orthogonal to $M$ (in the
kinetic metric) and have a length independent of the
position of $X^0$ on $M$ (which is the meaning of “well
adapted” to the kinetic metric).

Motions constrained on $M$ arise when the
potential $V$ has the form

$$V(X) = V_a(X) + \lambda W(X) \qquad [5]$$

where $W$ is a smooth function which reaches its
minimum value, say equal to 0, precisely on the
manifold $M$ while $V_a$ is another smooth potential.
The factor $\lambda > 0$ is a parameter called the rigidity of
the constraint.

A particularly interesting case arises when the level
surfaces of $W$ also have the geometric property of
being “parallel” to the surface $M$: in the precise sense
that the matrix $\partial^2_{q_i q_j} W(X)$, $i,j > \ell$ is positive definite
and $X$-independent, for all $X \in M$, in a system of
coordinates well adapted to the kinetic metric.

A potential $W$ with the latter properties can be
called an approximately ideal constraint reaction. In
fact, it can be proved that, given an initial datum
$X^0 \in M$ with velocity $\dot{X}^0$ tangent to $M$, i.e., given
an initial datum whose coordinates in a local system
of coordinates are $(q_0,0)$ and $(\dot{q}_0,0)$ with $q_0 =
(q_{01},\ldots,q_{0\ell})$ and $\dot{q}_0 = (\dot{q}_{01},\ldots,\dot{q}_{0\ell})$, the motion
generated by [1] with $V$ given by [5] is a motion
$t \to X_\lambda(t)$ which

1. as $\lambda \to \infty$ tends to a motion $t \to X_\infty(t)$;

2. as long as $X_\infty(t)$ stays in the vicinity of the initial
data, say for $0 \le t \le t_1$, so that it can be
described in the above local adapted coordinates,
its coordinates have the form $t \to (q(t),0) =
(q_1(t),\ldots,q_\ell(t),0,\ldots,0)$: that is, it is a motion
developing on the constraint surface $M$; and

3. the curve $t \to X_\infty(t)$, $t \in [0,t_1]$, as an element of
the space $\mathcal{M}_{0,t_1}(X^0, X_\infty(t_1))$ of analytic curves on
$M$ connecting $X^0$ to $X_\infty(t_1)$, renders the action

$$\mathcal{A}(X) = \int_{0}^{t_1} \big( K(\dot{X}(t)) - V_a(X(t)) \big)\, dt \qquad [6]$$

stationary.


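The latter property can be formulated “intrinsically,”
that is, referring only to $M$ as a surface, via the
restriction of the metric $ds^2$ to line elements $ds =
(dq_1,\ldots,dq_\ell,0,\ldots,0)$ tangent to $M$ at the point
$X = (q_0,0,\ldots,0) \in M$; we write $ds^2 = \sum_{i,j=1}^{\ell} g_{ij}(q)\, dq_i\, dq_j$.
The $\ell \times \ell$ symmetric positive-definite matrix $g$
can be called the metric on $M$ induced by the kinetic
energy. Then the action in [6] can be written as

$$\mathcal{A}(q) = \int_{0}^{t_1} \Big( \frac{1}{2} \sum_{i,j=1}^{\ell} g_{ij}(q(t))\, \dot{q}_i(t)\, \dot{q}_j(t) - \bar{V}_a(q(t)) \Big)\, dt \qquad [7]$$

where $\bar{V}_a(q) \stackrel{\text{def}}{=} V_a(X(q_1,\ldots,q_\ell,0,\ldots,0))$: the function

$$\mathcal{L}(\eta,q) \stackrel{\text{def}}{=} \frac{1}{2} \sum_{i,j=1}^{\ell} g_{ij}(q)\, \eta_i \eta_j - \bar{V}_a(q) \equiv \frac{1}{2}\, g(q)\eta \cdot \eta - \bar{V}_a(q) \qquad [8]$$

is called the constrained Lagrangian of the system.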

An important property is that the constrained motions
conserve the energy defined as $E = \frac{1}{2}(g(q)\dot{q},\dot{q}) +
\bar{V}_a(q)$; see next section.

The constrained motion $X_\infty(t)$ of energy $E$ satisfies
the Maupertuis’ principle in the sense that the curve
on $M$ on which the motion develops renders

$$L(\xi) = \int_{\xi} \sqrt{E - V_a(\xi(s))}\, ds \qquad [9]$$

stationary among the (smooth) curves that develop
on $M$ connecting two fixed values $X_1$ and $X_2$. In the
particular case in which $\ell = n$ this is again Maupertuis’
principle for unconstrained motions under the
potential $V(X)$. In general, $\ell$ is called the number of
degrees of freedom because a complete description
of the initial data requires $2\ell$ coordinates $q(0), \dot{q}(0)$.

If $W$ is minimal on $M$ but the condition on $W$ of
having level surfaces parallel to $M$ is not satisfied, i.e.,
if $W$ is not an approximately ideal constraint reaction,
it still remains true that the limit motion $X_\infty(t)$ takes
place on $M$. However, in general, it will not satisfy the
above variational principles. For this reason, motions
arising as limits (as $\lambda \to \infty$) of motions developing
under the potential [5] with $W$ having minimum on $M$
and level surfaces parallel (in the above sense) to $M$ are
called ideally constrained motions or motions subject
by ideal constraints to the surface $M$.

As an example, suppose that $W$ has the form
$W(X) = \sum_{i,j \in P} w_{ij}(|x_i - x_j|)$ with $w_{ij}(|\xi|) \ge 0$ an analytic
function vanishing only when $|\xi| = \rho_{ij}$ for $i,j$ in
some set of pairs $P$ and for some given distances $\rho_{ij}$ (e.g.,
$w_{ij}(\xi) = (\xi^2 - \rho_{ij}^2)^2 \gamma$, $\gamma > 0$). Then $W$ can be shown to
satisfy the mentioned conditions and therefore, the so
constrained motions $X_\infty(t)$ of the body satisfy the
variational principles mentioned in connection with [7]
and [9]: in other words, the above natural way of
realizing a rather general rigidity constraint is ideal.
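
The limit of infinite rigidity can be explored numerically. The following sketch is an illustration under assumptions made here, not a construction taken from the article: a single particle in the plane with $V_a = m g y$ (uniform gravity) and the quartic penalty $W(X) = (|x|^2 - \rho^2)^2$, so that $M$ is the circle of radius $\rho$ and the system is a pendulum with a very stiff rod. The initial datum lies on $M$ with tangent velocity, and eqn [1] with the potential [5] is integrated by velocity Verlet for increasing values of the rigidity.

```python
import math

def max_deviation(lam, rho=1.0, m=1.0, g=1.0, dt=1e-4, t_end=2.0):
    """Largest distance from the circle |x| = rho reached by the motion with rigidity lam."""
    def force(x, y):
        # -grad of V = V_a + lam * W, with V_a = m*g*y and W = (x^2 + y^2 - rho^2)^2 (assumed forms)
        c = 4.0 * lam * (x * x + y * y - rho * rho)
        return -c * x, -m * g - c * y

    x, y = 0.0, -rho          # initial position on the constraint surface M
    vx, vy = 1.0, 0.0         # initial velocity tangent to M
    fx, fy = force(x, y)
    worst = 0.0
    for _ in range(int(round(t_end / dt))):
        x += dt * vx + 0.5 * dt * dt * fx / m
        y += dt * vy + 0.5 * dt * dt * fy / m
        fx_new, fy_new = force(x, y)
        vx += 0.5 * dt * (fx + fx_new) / m
        vy += 0.5 * dt * (fy + fy_new) / m
        fx, fy = fx_new, fy_new
        worst = max(worst, abs(math.hypot(x, y) - rho))
    return worst

rigidities = (1e2, 1e3, 1e4)
deviations = [max_deviation(lam) for lam in rigidities]
for lam, dev in zip(rigidities, deviations):
    print(lam, dev)
```

The printed deviations should decrease as the rigidity grows, which is the numerical counterpart of the statement that the motion tends to one developing on $M$. The step size has to stay well below the period of the fast oscillations transverse to $M$, so larger rigidities require smaller `dt`.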

The modern viewpoint on the physical meaning of
the constraint reactions is as follows: looking at
motions in an inertial Cartesian system, it will appear
that the system is subject to the applied forces with
potential $V_a(X)$ and to constraint forces which are
defined as the differences $R_i = m_i \ddot{x}_i + \partial_{x_i} V_a(X)$. The
latter reflect the action of the forces with potential
$\lambda W(X)$ in the limit of infinite rigidity ($\lambda \to \infty$).

In applications, sometimes the action of a constraint
can be regarded as ideal: the motion will then verify the
variational principles mentioned and $R$ can be computed
as the differences between the $m_i \ddot{x}_i$ and the active
forces $-\partial_{x_i} V_a(X)$. In dynamics problems it is, however,
a very difficult and important matter, particularly in
engineering, to judge whether a system of particles can
be considered as subject to ideal constraints: this leads
to important decisions in the construction of machines.
Assuming ideal constraints simplifies the calculations of
the reactions and fatigue of the materials, but a misjudgment
can have serious consequences for stability and safety. For statics
problems, the difficulty is of lower order: usually
assuming that the constraint reaction is ideal leads to
an overestimate of the requirements for stability of
equilibria. Hence, employing the action principle to
statics problems, where it constitutes the principle of
virtual work, generally leads to economic problems
rather than to safety issues. Its discovery even predates
Newtonian mechanics.

We refer the reader to Arnol'd (1989) and 
Gallavotti (1983) for more details. 


Lagrange and Hamilton Forms of the Equations of Motion


The stationarity condition for the action $\mathcal{A}(q)$, cf.
[7], [8], is formulated in terms of the Lagrangian
$\mathcal{L}(\eta,\xi)$, see [8], by

$$\frac{d}{dt}\, \partial_{\eta_i} \mathcal{L}(\dot{q}(t), q(t)) = \partial_{\xi_i} \mathcal{L}(\dot{q}(t), q(t)), \qquad i = 1,\ldots,\ell \qquad [10]$$

which is a second-order differential equation called
the Lagrangian equation of motion. It can be cast in
“normal form”: for this purpose, adopting the
convention of “summation over repeated indices,”
introduce the “generalized momenta”

$$p_i \stackrel{\text{def}}{=} g(q)_{ij}\, \dot{q}_j, \qquad i = 1,\ldots,\ell \qquad [11]$$


4 Introductory Article: Classical Mechanics 


Since $g(q) > 0$, the motions $t \to q(t)$ and the corresponding
velocities $t \to \dot{q}(t)$ can be described equivalently
by $t \to (q(t), p(t))$: and the equations of motion
[10] become the first-order equations

$$\dot{q}_i = \partial_{p_i} H(p,q), \qquad \dot{p}_i = -\partial_{q_i} H(p,q) \qquad [12]$$

where the function $H$, called the Hamiltonian of the
system, is defined by

$$H(p,q) \stackrel{\text{def}}{=} \frac{1}{2}\big(g(q)^{-1} p, p\big) + \bar{V}_a(q) \qquad [13]$$

Equations [12], regarded as equations of motion for
phase space points $(p,q)$, are called Hamilton
equations. In general, $q$ are local coordinates on $M$
and motions are specified by giving $q, \dot{q}$ or $p, q$.

Looking for a coordinate-free representation of
motions consider the pairs $X, Y$ with $X \in M$ and $Y$ a
vector $Y \in T_X$ tangent to $M$ at the point $X$. The
collection of pairs $(Y,X)$ is denoted $T(M) = \cup_{X \in M}
(T_X \times \{X\})$ and a motion $t \to (\dot{X}(t), X(t)) \in T(M)$ in
local coordinates is represented by $(\dot{q}(t), q(t))$. The
space $T(M)$ can be called the space of initial data for
Lagrange's equations of motion: it has $2\ell$ dimensions
(also known as the “tangent bundle” of $M$).

Likewise, the space of initial data for the 
Hamilton equations will be denoted T*(M) and it 
consists of pairs X,P with X € M and P—g(X)Y 
with Y a vector tangent to M at X. The space T*(M) 
is called the phase space of the system: it has 
2/ dimensions (and it is occasionally called the 
“cotangent bundle” of M). 

An immediate consequence of [12] is

$$\frac{d}{dt} H(p(t), q(t)) \equiv 0$$

and it means that $H(p(t),q(t))$ is constant along
the solutions of [12]. Noting that
$H(p,q) = \frac12 (g(q)\dot q, \dot q) + V_a(q)$ is the sum of the
kinetic and potential energies, it follows that the conservation
of $H$ along solutions means energy conservation in the
presence of ideal constraints.

Let $S_t$ be the flow generated on the phase space
variables $(p,q)$ by the solutions of the equations of
motion [12], that is, let $t \to S_t(p,q) \equiv (p(t), q(t))$
denote a solution of [12] with initial data $(p,q)$.
Then a (measurable) set $\Delta$ in phase space evolves in
time $t$ into a new set $S_t\Delta$ with the same volume: this
is obvious because the Hamilton equations [12] have
manifestly zero divergence (“Liouville's theorem”).
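Energy conservation along solutions of [12] can be illustrated numerically. The following is a minimal sketch, not a method taken from the text: it assumes a separable Hamiltonian $H = p^2/2m + V(q)$, uses the pendulum with the illustrative choice $m = g = h = 1$, and integrates with the leapfrog (Stoermer-Verlet) scheme, which happens to be area preserving in the spirit of Liouville's theorem. The function name `leapfrog` and all numerical values are assumptions made for the example.

```python
import math

def leapfrog(p, q, dH_dp, dH_dq, dt, steps):
    """Integrate q' = dH/dp, p' = -dH/dq for a separable H = T(p) + V(q)."""
    for _ in range(steps):
        p -= 0.5 * dt * dH_dq(q)   # half step in the momentum
        q += dt * dH_dp(p)         # full step in the position
        p -= 0.5 * dt * dH_dq(q)   # second half step in the momentum
    return p, q

# Pendulum with illustrative constants m = g = h = 1: H = p^2/2 + (1 - cos q)
def hamiltonian(p, q):
    return 0.5 * p * p + (1.0 - math.cos(q))

p0, q0 = 0.0, 1.0
p1, q1 = leapfrog(p0, q0, lambda p: p, lambda q: math.sin(q), 1e-3, 20000)

# H is a constant of motion for the exact flow; the scheme keeps it nearly constant
energy_drift = abs(hamiltonian(p1, q1) - hamiltonian(p0, q0))
print(energy_drift)
```

The residual drift is a discretization effect of the chosen scheme and shrinks with the time step; the exact flow conserves $H$ identically.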

The Hamilton equations also satisfy a variational
principle, called the Hamilton action principle: that
is, if $\mathcal{M}_{t_1,t_2}((p_1,q_1),(p_2,q_2);M)$ denotes the
space of the analytic functions $\varphi: t \to (\pi(t),\kappa(t))$
which in the time interval $[t_1,t_2]$ lead from $(p_1,q_1)$ to
$(p_2,q_2)$, then the condition that $\varphi_0(t) = (p(t),q(t))$
satisfies [12] can be equivalently formulated by requiring
that the function

$$\mathcal{A}_H(\varphi) \stackrel{\rm def}{=} \int_{t_1}^{t_2} \big(\pi(t)\cdot\dot\kappa(t) - H(\pi(t),\kappa(t))\big)\,dt \qquad [14]$$

be stationary for $\varphi = \varphi_0$: in fact, eqns [12] are the
stationarity conditions for the Hamilton action
[14] on $\mathcal{M}_{t_1,t_2}((p_1,q_1),(p_2,q_2);M)$. And, since the
derivatives of $\pi(t)$ do not appear in [14], stationarity
is even achieved in the larger space
$\mathcal{M}_{t_1,t_2}(q_1,q_2;M)$ of the motions
$\varphi: t \to (\pi(t),\kappa(t))$ leading from $q_1$ to $q_2$
without any restriction on the initial and final momenta
$p_1, p_2$ (which, therefore, cannot be prescribed a priori
independently of $q_1, q_2$). If the prescribed data
$p_1, q_1, p_2, q_2$ are not compatible with the equations of
motion (e.g., $H(p_1,q_1) \ne H(p_2,q_2)$), then the action
functional has no stationary trajectory in
$\mathcal{M}_{t_1,t_2}((p_1,q_1),(p_2,q_2);M)$.

For more details, the reader is referred to Landau 
and Lifshitz (1976), Arnol'd (1989), and Gallavotti 
(1983). 


Canonical Transformations of Phase 
Space Coordinates 


The Hamiltonian form, [12], of the equations of
motion turns out to be quite useful in several
problems. It is, therefore, important to remark that
it is invariant under a special class of transformations
of coordinates, called canonical transformations.

Consider a local change of coordinates on phase
space, i.e., a smooth, smoothly invertible map
$\mathcal{C}(\pi,\kappa) = (\pi',\kappa')$ from an open set $U$ in
the phase space of a Hamiltonian system with
$\ell$ degrees of freedom into an open set $U'$ in a
$2\ell$-dimensional space. The change of coordinates is
said to be canonical if for any solution
$t \to (\pi(t),\kappa(t))$ of equations like [12], for any
Hamiltonian $H(\pi,\kappa)$ defined on $U$, the $\mathcal{C}$-image
$t \to (\pi'(t),\kappa'(t)) = \mathcal{C}(\pi(t),\kappa(t))$ is a
solution of [12] with the “same” Hamiltonian, that is, with
Hamiltonian $H'(\pi',\kappa') \stackrel{\rm def}{=} H(\mathcal{C}^{-1}(\pi',\kappa'))$.

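The condition that a transformation of coordinates
is canonical is obtained by using the
arbitrariness of the function $H$ and is simply
expressed as a necessary and sufficient property of
the Jacobian $L$,

$$L = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad A_{ij} = \partial_{\pi_j}\pi'_i, \quad B_{ij} = \partial_{\kappa_j}\pi'_i, \quad C_{ij} = \partial_{\pi_j}\kappa'_i, \quad D_{ij} = \partial_{\kappa_j}\kappa'_i \qquad [15]$$

where $i,j = 1,\ldots,\ell$. Let

$$E = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

denote the $2\ell \times 2\ell$ matrix formed by four $\ell \times \ell$
blocks, equal to the 0 matrix or, as indicated, to the
$\pm$ (identity matrix); then, if a superscript $T$ denotes
matrix transposition, the condition that the map be
canonical is that

$$L^{-1} = E L^T E^T \quad {\rm or} \quad L^{-1} = \begin{pmatrix} D^T & -B^T \\ -C^T & A^T \end{pmatrix} \qquad [16]$$

which immediately implies that $\det L = \pm 1$. In fact,
it is possible to show that [16] implies $\det L = 1$.
Equation [16] is equivalent to the four relations
$AD^T - BC^T = 1$, $-AB^T + BA^T = 0$, $CD^T - DC^T = 0$, and
$-CB^T + DA^T = 1$. More explicitly, since the first and
the fourth relations coincide, these can be expressed as

$$\{\pi'_i, \kappa'_j\} = \delta_{ij}, \qquad \{\pi'_i, \pi'_j\} = 0, \qquad \{\kappa'_i, \kappa'_j\} = 0 \qquad [17]$$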


where, for any two functions $F(\pi,\kappa)$, $G(\pi,\kappa)$, the
Poisson bracket is

$$\{F, G\}(\pi,\kappa) \stackrel{\rm def}{=} \sum_{k=1}^{\ell} \big(\partial_{\pi_k}F(\pi,\kappa)\,\partial_{\kappa_k}G(\pi,\kappa) - \partial_{\kappa_k}F(\pi,\kappa)\,\partial_{\pi_k}G(\pi,\kappa)\big) \qquad [18]$$

The latter satisfies Jacobi's identity:
$\{\{F,G\},Q\} + \{\{G,Q\},F\} + \{\{Q,F\},G\} = 0$, for any three
functions $F, G, Q$ on the phase space. It is quite useful to
remark that if $t \to (p(t),q(t)) = S_t(p,q)$ is a solution
to Hamilton equations with Hamiltonian $H$ then,
given any observable $F(p,q)$, it “evolves” as
$F(t) \stackrel{\rm def}{=} F(p(t),q(t))$ satisfying

$$\partial_t F(p(t),q(t)) = \{H, F\}(p(t),q(t))$$

Requiring the latter identity to hold for all observables
$F$ is equivalent to requiring that $t \to (p(t),q(t))$ be a
solution of Hamilton's equations for $H$.
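Condition [16] is equivalent to $L E L^T = E$ (multiply [16] by $L$ on the left and by $E$ on the right, using $E^T E = 1$), which is easy to test on a given Jacobian. The following minimal sketch treats the case of one degree of freedom; the helper names and the two sample matrices (a rotation of the phase plane and a stretching of the momentum) are illustrative assumptions, not examples taken from the text.

```python
import math

def matmul(X, Y):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def is_canonical(L, tol=1e-12):
    """Check L E L^T = E for a 2x2 Jacobian (one degree of freedom)."""
    E = [[0.0, 1.0], [-1.0, 0.0]]
    M = matmul(matmul(L, E), transpose(L))
    return all(abs(M[i][j] - E[i][j]) < tol for i in range(2) for j in range(2))

a = 0.7  # arbitrary rotation angle
rotation = [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]
stretching = [[2.0, 0.0], [0.0, 1.0]]  # doubles the momentum, det L = 2

print(is_canonical(rotation), is_canonical(stretching))
```

For a $2 \times 2$ matrix $L E L^T = (\det L)\,E$, so the check reduces to $\det L = 1$: the rotation passes and the stretching fails.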

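Let $\mathcal{C}: U \to U'$ be a smooth, smoothly invertible
transformation between two open $2\ell$-dimensional
sets: $\mathcal{C}(\pi,\kappa) = (\pi',\kappa')$. Suppose that there is
a function $\Phi(\pi',\kappa)$ defined on a suitable domain $W$
such that

$$\mathcal{C}(\pi,\kappa) = (\pi',\kappa') \;\Rightarrow\; \begin{cases} \pi = \partial_\kappa \Phi(\pi',\kappa) \\ \kappa' = \partial_{\pi'} \Phi(\pi',\kappa) \end{cases} \qquad [19]$$

then $\mathcal{C}$ is canonical. This is because [19] implies that
if $\kappa, \pi'$ are varied and if $\pi, \kappa', \pi', \kappa$ are
related by $\mathcal{C}(\pi,\kappa) = (\pi',\kappa')$, then
$\pi\cdot d\kappa + \kappa'\cdot d\pi' = d\Phi(\pi',\kappa)$,
which implies that

$$\pi\cdot d\kappa - H(\pi,\kappa)\,dt \equiv \pi'\cdot d\kappa' - H(\mathcal{C}^{-1}(\pi',\kappa'))\,dt + d\Phi(\pi',\kappa) - d(\pi'\cdot\kappa') \qquad [20]$$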
It means that the Hamiltonians $H(p,q)$ and
$H'(p',q') \stackrel{\rm def}{=} H(\mathcal{C}^{-1}(p',q'))$ have Hamilton
actions $\mathcal{A}_H$ and $\mathcal{A}_{H'}$ differing by a constant,
if evaluated on corresponding motions $(p(t),q(t))$ and
$(p'(t),q'(t)) = \mathcal{C}(p(t),q(t))$.

The constant depends only on the initial and final
values $(p(t_1),q(t_1))$ and $(p(t_2),q(t_2))$ and, respectively,
$(p'(t_1),q'(t_1))$ and $(p'(t_2),q'(t_2))$ so that if
$(p(t),q(t))$ makes $\mathcal{A}_H$ extreme, then
$(p'(t),q'(t)) = \mathcal{C}(p(t),q(t))$ also makes
$\mathcal{A}_{H'}$ extreme.

Hence, if $t \to (p(t),q(t))$ solves the Hamilton equations
with Hamiltonian $H(p,q)$ then the motion
$t \to (p'(t),q'(t)) = \mathcal{C}(p(t),q(t))$ solves the Hamilton
equations with Hamiltonian $H'(p',q') = H(\mathcal{C}^{-1}(p',q'))$
no matter which it is: therefore, the transformation is
canonical. The function $\Phi$ is called its generating
function.

Equation [19] provides a way to construct
canonical maps. Suppose that a function $\Phi(\pi',\kappa)$ is
given and defined on some domain $W$; then setting

$$\pi = \partial_\kappa \Phi(\pi',\kappa), \qquad \kappa' = \partial_{\pi'} \Phi(\pi',\kappa)$$

and inverting the first equation in the form
$\pi' = \Xi(\pi,\kappa)$ and substituting the value for $\pi'$ thus
obtained in the second equation, a map
$\mathcal{C}(\pi,\kappa) = (\pi',\kappa')$ is defined on some domain
(where the mentioned operations can be performed) and if
such domain is open and not empty then $\mathcal{C}$ is a
canonical map.
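The construction can be followed on a concrete case. The sketch below is only an illustration under stated assumptions: it takes one degree of freedom and the hypothetical generating function $\Phi(\pi',\kappa) = \pi'\kappa + \varepsilon\,\pi'^2\kappa$ with $\varepsilon = 0.3$, inverts the first relation by hand on the branch $1 + 4\varepsilon\pi > 0$, and then checks by finite differences that the resulting map has Jacobian determinant 1, which for one degree of freedom is the canonicity condition.

```python
import math

EPS = 0.3  # illustrative strength of the nonlinear term (an assumption)

def canonical_map(pi, kappa):
    """Map generated by Phi(pi', kappa) = pi'*kappa + EPS*pi'**2*kappa."""
    # First relation: pi = dPhi/dkappa = pi' + EPS*pi'**2, inverted for pi'
    pi_new = (-1.0 + math.sqrt(1.0 + 4.0 * EPS * pi)) / (2.0 * EPS)
    # Second relation: kappa' = dPhi/dpi' = kappa*(1 + 2*EPS*pi')
    kappa_new = kappa * (1.0 + 2.0 * EPS * pi_new)
    return pi_new, kappa_new

def jacobian_det(f, x, y, h=1e-6):
    """Determinant of the Jacobian of f at (x, y) by central differences."""
    fxp, fxm = f(x + h, y), f(x - h, y)
    fyp, fym = f(x, y + h), f(x, y - h)
    a = (fxp[0] - fxm[0]) / (2 * h)
    b = (fyp[0] - fym[0]) / (2 * h)
    c = (fxp[1] - fxm[1]) / (2 * h)
    d = (fyp[1] - fym[1]) / (2 * h)
    return a * d - b * c

det = jacobian_det(canonical_map, 0.4, -1.2)
print(det)
```

The determinant equals 1 up to the finite-difference error at any point of the domain where the inversion is possible.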

For similar reasons, if $\Gamma(\kappa,\kappa')$ is a function
defined on some domain then setting
$\pi = \partial_\kappa \Gamma(\kappa,\kappa')$,
$\pi' = -\partial_{\kappa'} \Gamma(\kappa,\kappa')$ and solving the
first relation to express $\kappa' = \Delta(\pi,\kappa)$ and
substituting in the second relation, a map
$(\pi',\kappa') = \mathcal{C}(\pi,\kappa)$ is defined on
some domain (where the mentioned operations can
be performed) and if such domain is open and not
empty then $\mathcal{C}$ is a canonical map.

Likewise, canonical transformations can be constructed
starting from a priori given functions
$F(\pi,\kappa')$ or $G(\pi,\pi')$. And the most general canonical
map can be generated locally (i.e., near a given point
in phase space) by a single one of the above four
ways, possibly composed with a few “trivial”
canonical maps in which one pair of coordinates
$(\pi_i,\kappa_i)$ is transformed into $(-\kappa_i,\pi_i)$. The
necessity of also including the trivial maps can be traced to the
existence of homogeneous canonical maps, that is,
maps such that $\pi\cdot d\kappa = \pi'\cdot d\kappa'$ (e.g., the
identity map, see below or [49] for nontrivial examples)
which are action preserving hence canonical, but
which evidently cannot be generated by a function
$\Gamma(\kappa,\kappa')$ although they can be generated by a
function depending on $\pi', \kappa$.
Simple examples of homogeneous canonical maps
are maps in which the coordinates $q$ are changed
into $q' = R(q)$ and, correspondingly, the $p$'s are
transformed as $p' = (\partial_q R(q))^{-1\,T} p$, linearly: indeed,
this map is generated by the function
$F(p',q) \stackrel{\rm def}{=} p'\cdot R(q)$.

For instance, consider the map “Cartesian-polar”
coordinates $(q_1,q_2) \leftrightarrow (\rho,\theta)$ with $(\rho,\theta)$
the polar coordinates of $q$ (namely $\rho = \sqrt{q_1^2 + q_2^2}$,
$\theta = \arctan(q_2/q_1)$) and let $n = q/|q| = (n_1,n_2)$ and
$t = (-n_2,n_1)$. Setting $p_\rho \stackrel{\rm def}{=} p\cdot n$,
$p_\theta \stackrel{\rm def}{=} \rho\,p\cdot t$, the map
$(p_1,p_2,q_1,q_2) \leftrightarrow (p_\rho,p_\theta,\rho,\theta)$ is
homogeneous canonical (because
$p\cdot dq = p\cdot n\,d\rho + p\cdot t\,\rho\,d\theta = p_\rho\,d\rho + p_\theta\,d\theta$).

As a further example, any area-preserving map
$(p,q) \leftrightarrow (p',q')$ defined on an open region of the
plane $R^2$ is canonical: because in this case the
matrices $A, B, C, D$ are just numbers, which satisfy
$AD - BC = 1$ and, therefore, [16] holds.

For more details, the reader is referred to Landau
and Lifshitz (1976) and Gallavotti (1983).
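The Cartesian-polar map of the previous section can be tested against the Poisson bracket relations [17]. The following is a minimal numerical sketch, assuming the definitions $p_\rho = p\cdot n$ and $p_\theta = \rho\,p\cdot t$ given above; the finite-difference helpers, the sample point, and the use of `atan2` for the polar angle are choices made for the example.

```python
import math

def polar_map(x):
    """(p1, p2, q1, q2) -> (p_rho, p_theta, rho, theta)."""
    p1, p2, q1, q2 = x
    rho = math.hypot(q1, q2)
    theta = math.atan2(q2, q1)
    p_rho = (p1 * q1 + p2 * q2) / rho   # p . n with n = q/|q|
    p_theta = q1 * p2 - q2 * p1         # rho * p . t with t = (-n2, n1)
    return [p_rho, p_theta, rho, theta]

def grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function of a list x."""
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def poisson(F, G, x):
    """Bracket [18] with x = (momenta..., positions...)."""
    n = len(x) // 2
    dF, dG = grad(F, x), grad(G, x)
    return sum(dF[k] * dG[n + k] - dF[n + k] * dG[k] for k in range(n))

def component(i):
    return lambda x: polar_map(x)[i]

x0 = [0.3, -0.8, 1.1, 0.6]  # arbitrary point with q1 > 0
brackets = {(i, j): poisson(component(i), component(j), x0)
            for i in range(4) for j in range(4)}
print(brackets[(0, 2)], brackets[(1, 3)], brackets[(0, 1)], brackets[(2, 3)])
```

Indices 0, 1 label the new momenta $p_\rho, p_\theta$ and indices 2, 3 the new coordinates $\rho, \theta$; the brackets of conjugate pairs should be close to 1 and all the others close to 0.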


Quadratures 


The simplest mechanical systems are integrable by
quadratures. For instance, the Hamiltonian on $R^2$,

$$H(p,q) = \frac{1}{2m}\,p^2 + V(q) \qquad [21]$$

generates a motion $t \to q(t)$ with initial data $q_0, \dot q_0$
such that $H(p_0,q_0) = E$, i.e., $\frac12 m\dot q_0^2 + V(q_0) = E$,
satisfying

$$\dot q(t) = \pm\sqrt{\frac{2}{m}\,(E - V(q(t)))}$$

If the equation $E = V(q)$ has only two solutions
$q_-(E) < q_+(E)$ and $|\partial_q V(q_\pm(E))| > 0$, the motion is
periodic with period

$$T(E) = 2\int_{q_-(E)}^{q_+(E)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [22]$$

The special solution with initial data $q_0 = q_-(E)$,
$\dot q_0 = 0$ will be denoted $Q(t)$, and it is an
analytic function (by the general regularity theorem
on ordinary differential equations). For $0 \le t \le T/2$
or for $T/2 \le t \le T$ it is given, respectively, by

$$t = \int_{q_-(E)}^{Q(t)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [23a]$$

or

$$t = \frac{T}{2} - \int_{q_+(E)}^{Q(t)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [23b]$$

The most general solution with energy $E$ has the
form $q(t) = Q(t_0 + t)$, where $t_0$ is defined by
$q_0 = Q(t_0)$, $\dot q_0 = \dot Q(t_0)$, i.e., it is the time needed for
the “standard solution” $Q(t)$ to reach the initial data
for the new motion.
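The period integral [22] can be evaluated numerically once the turning points are known. The sketch below is one possible way to do it, not a prescription from the text: it substitutes $x = c + r\sin u$ (with $c$, $r$ the midpoint and half-width of $[q_-, q_+]$) so that the inverse-square-root endpoint singularities disappear, and then applies the midpoint rule in $u$. The function name `period`, the number of nodes, and the sample constants ($m = 1$, $\omega = 2$ for the oscillator, $m = g = h = 1$ for the pendulum) are assumptions.

```python
import math

def period(V, q_minus, q_plus, E, m=1.0, n=2000):
    """T(E) = 2 * integral_{q-}^{q+} dx / sqrt((2/m)(E - V(x))), cf. [22]."""
    c = 0.5 * (q_plus + q_minus)
    r = 0.5 * (q_plus - q_minus)
    du = math.pi / n
    total = 0.0
    for j in range(n):
        u = -0.5 * math.pi + (j + 0.5) * du   # midpoints never hit the endpoints
        x = c + r * math.sin(u)
        total += r * math.cos(u) / math.sqrt((2.0 / m) * (E - V(x))) * du
    return 2.0 * total

# Harmonic oscillator, V = m w^2 q^2 / 2 with m = 1, w = 2, energy E = 1
T_harm = period(lambda x: 2.0 * x * x, -math.sqrt(0.5), math.sqrt(0.5), 1.0)

# Pendulum, V = 1 - cos q (m = g = h = 1), oscillating with amplitude 1
T_pend = period(lambda x: 1.0 - math.cos(x), -1.0, 1.0, 1.0 - math.cos(1.0))

print(T_harm, 2.0 * math.pi / 2.0, T_pend)
```

For the oscillator the result reproduces $2\pi/\omega$ independently of $E$, while the pendulum period exceeds the small-oscillation value $2\pi$ and grows with the amplitude.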

If the derivative of $V$ vanishes in one of the
extremes or if at least one of the two solutions $q_\pm(E)$
does not exist, the motion is not periodic and it may
be unbounded: nevertheless, it is still expressible via
integrals of the type [22]. If the potential $V$ is
periodic in $q$ and the variable $q$ is considered to be
varying on a circle then essentially all solutions are
periodic: exceptions can occur if the energy $E$ has a
value such that $V(q) = E$ admits a solution where $V$
has zero derivative.

Typical examples are the harmonic oscillator, the
pendulum, and the Kepler oscillator: whose Hamiltonians,
if $m, \omega, g, h, G, k$ are positive constants, are,
respectively,

$$\frac{p^2}{2m} + \frac12\,m\omega^2 q^2, \qquad \frac{p^2}{2m} + mg\Big(1 - \cos\frac{q}{h}\Big), \qquad \frac{p^2}{2m} - \frac{mk}{|q|} + \frac{mG^2}{2q^2} \qquad [24]$$

the Kepler oscillator Hamiltonian has a potential
which is singular at $q = 0$ but if $G \ne 0$ the energy
conservation forbids too close an approach to $q = 0$
and the singularity becomes irrelevant.

The integral in [23] is called a quadrature and the
systems in [21] are therefore integrable by quadratures.
Such systems, at least when the motion is
periodic, are best described in new coordinates in
which periodicity is more manifest. Namely when
$V(q) = E$ has only two roots $q_\pm(E)$ and $\pm V'(q_\pm(E)) > 0$
the energy-time coordinates can be used by replacing
$q, \dot q$ or $p, q$ by $E, \tau$, where $\tau$ is the time needed
for the standard solution $t \to Q(t)$ to reach the given
data, that is, $Q(\tau) = q$, $\dot Q(\tau) = \dot q$. In such
coordinates, the motion is simply $(E,\tau) \to (E,\tau + t)$ and,
of course, the variable $\tau$ has to be regarded as
varying on a circle of radius $T/2\pi$. The $E, \tau$
variables are a kind of polar coordinates, as can
be checked by drawing the curves of constant $E$,
“energy levels,” in the plane $p, q$ in the cases in
[24]; see Figure 1.

In the harmonic oscillator case, all trajectories are
periodic. In the pendulum case, all motions are
periodic except the ones which separate the oscillatory
motions (the closed curves in the second
drawing) from the rotatory motions (the apparently
open curves) which, in fact, are on closed curves as
well if the $q$ coordinate, that is, the vertical
coordinate in Figure 1, is regarded as “periodic”
with period $2\pi h$. In the Kepler case, only the
negative-energy trajectories are periodic and a few
of them are drawn in Figure 1. The single dots
represent the equilibrium points in phase space.

Figure 1 The energy levels of the harmonic oscillator, the
pendulum, and the Kepler motion.

The region of phase space where motions are
periodic is a set of points $(p,q)$ with the
topological structure of $\cup_{u\in U}(\{u\} \times C_u)$, where $u$ is
a coordinate varying in an open interval $U$ (e.g.,
the set of values of the energy), and $C_u$ is a closed
curve whose points $(p,q)$ are identified by a
coordinate (e.g., by the time necessary for an
arbitrarily fixed datum with the same energy to
evolve into $(p,q)$).

In the above cases, [24], if the “radial” coordinate
is chosen to be the energy the set $U$ is the interval
$(0,+\infty)$ for the harmonic oscillator, $(0,2mg)$ or
$(2mg,+\infty)$ for the pendulum, and $(-\frac12 mk^2/G^2, 0)$ in
the Kepler case. The fixed datum for the reference
motion can be taken, in all cases, to be of the form
$(0,q_0)$ with the time coordinate $t_0$ given by [23].

It is remarkable that the energy-time coordinates
are canonical coordinates: for instance, in the vicinity
of $(p_0,q_0)$ and if $p_0 > 0$, this can be seen by setting

$$S(q,E) = \int_{q_0}^{q} \sqrt{2m(E - V(x))}\,dx \qquad [25]$$

and checking that $p = \partial_q S(q,E)$, $t = \partial_E S(q,E)$ are
identities if $(p,q)$ and $(E,t)$ are coordinates for the
same point so that the criterion expressed by [20]
applies.

It is convenient to standardize the coordinates
by replacing the time variable by an angle
$\alpha = (2\pi/T(E))\,t$; and instead of the energy any invertible
function of it can be used.

It is natural to look for a coordinate $A = A(E)$
such that the map $(p,q) \leftrightarrow (A,\alpha)$ is a canonical
map: this is easily done as the function

$$\hat S(q,A) = \int_{q_0}^{q} \sqrt{2m(E(A) - V(x))}\,dx \qquad [26]$$

generates (locally) the correspondence between

$$p = \sqrt{2m(E(A) - V(q))} \quad {\rm and} \quad \alpha = E'(A)\int_{q_0}^{q} \frac{dx}{\sqrt{2m^{-1}(E(A) - V(x))}}$$

Therefore, by the criterion [20], if
$E'(A) = 2\pi/T(E(A))$, i.e., if $A'(E) = T(E)/2\pi$, the
coordinates $(A,\alpha)$ will be canonical coordinates. Hence,
by [22], $A(E)$ can be taken as

$$A = \frac{1}{2\pi}\,2\int_{q_-(E)}^{q_+(E)} \sqrt{2m(E - V(q))}\,dq \equiv \frac{1}{2\pi}\oint p\,dq$$

where the last integral is extended to the closed curve
of energy $E$; see Figure 1. The action-angle coordinates
$(A,\alpha)$ are defined in open regions of phase
space covered by periodic motions: in action-angle
coordinates such regions have the form $W = J \times T$ of
a product of an open interval $J$ and a one-dimensional
“torus” $T = [0,2\pi]$ (i.e., a unit circle).
For details, the reader is again referred to Landau and
Lifshitz (1976), Arnol'd (1989), and Gallavotti (1983).
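The relation $A'(E) = T(E)/2\pi$ between action and period can be checked on the harmonic oscillator, for which $A = E/\omega$. The following is a minimal sketch under illustrative assumptions ($m = 1$, $\omega = 2$, midpoint quadrature after the substitution $x = c + r\sin u$); the names `action` and `harmonic_action` are introduced only for this example.

```python
import math

def action(V, q_minus, q_plus, E, m=1.0, n=2000):
    """A(E) = (1/2pi) * 2 * integral_{q-}^{q+} sqrt(2m(E - V(q))) dq."""
    c = 0.5 * (q_plus + q_minus)
    r = 0.5 * (q_plus - q_minus)
    du = math.pi / n
    total = 0.0
    for j in range(n):
        u = -0.5 * math.pi + (j + 0.5) * du
        x = c + r * math.sin(u)
        # max() guards against tiny negative values caused by rounding
        total += math.sqrt(max(0.0, 2.0 * m * (E - V(x)))) * r * math.cos(u) * du
    return 2.0 * total / (2.0 * math.pi)

OMEGA = 2.0  # illustrative frequency, with m = 1

def harmonic_action(E):
    amplitude = math.sqrt(2.0 * E) / OMEGA   # turning points are +-amplitude
    return action(lambda x: 0.5 * OMEGA ** 2 * x * x, -amplitude, amplitude, E)

A = harmonic_action(1.0)
# Central difference for A'(E), to be compared with T/(2 pi) = 1/OMEGA
dA_dE = (harmonic_action(1.0 + 1e-4) - harmonic_action(1.0 - 1e-4)) / 2e-4
print(A, dA_dE)
```

Both numbers are close to $1/\omega$ here because $E = 1$; in general $A = E/\omega$ while $A'(E) = 1/\omega = T/2\pi$.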


Quasiperiodicity and Integrability 


A Hamiltonian is called integrable in an open region
$W \subset T^*(M)$ of phase space if

1. there is an analytic and nonsingular (i.e., with
nonzero Jacobian) change of coordinates
$(p,q) \leftrightarrow (I,\varphi)$ mapping $W$ into a set of the form
$\mathcal{I} \times T^\ell$ with $\mathcal{I} \subset R^\ell$ (open); and
furthermore

2. the flow $t \to S_t(p,q)$ on phase space is transformed
into $(I,\varphi) \to (I,\varphi + \omega(I)t)$ where $\omega(I)$ is a
smooth function on $\mathcal{I}$.

This means that, in suitable coordinates, which
can be called “integrating coordinates,” the system
appears as a set of $\ell$ points with coordinates
$\varphi = (\varphi_1,\ldots,\varphi_\ell)$ moving on a unit circle at
angular velocities $\omega(I) = (\omega_1(I),\ldots,\omega_\ell(I))$
depending on the actions of the initial data.

A system integrable in a region $W$ which, in
integrating coordinates $I, \varphi$, has the form
$\mathcal{I} \times T^\ell$ is said to be anisochronous if
$\det \partial_I \omega(I) \ne 0$. It is said
to be isochronous if $\omega(I) \equiv \omega$ is independent of $I$.
The motions of integrable systems are called
quasiperiodic with frequency spectrum $\omega(I)$, or
with frequencies $\omega(I)/2\pi$, in the coordinates $(I,\varphi)$.

Clearly, an integrable system admits $\ell$ independent
constants of motion, the $I = (I_1,\ldots,I_\ell)$, and, for each
choice of $I$, the other coordinates vary on a “standard”
$\ell$-dimensional torus $T^\ell$: hence, it is possible to say that
a phase space region of integrability is foliated into
$\ell$-dimensional invariant tori $\mathcal{T}(I)$ parametrized by the
values of the constants of motion $I \in \mathcal{I}$.

If an integrable system is anisochronous then it is
canonically integrable: that is, it is possible to define
on $W$ a canonical change of coordinates
$(p,q) = \mathcal{C}(A,\alpha)$ mapping $W$ onto $J \times T^\ell$ and such
that $H(\mathcal{C}(A,\alpha)) = h(A)$ for a suitable $h$. Then, if
$\omega(A) \stackrel{\rm def}{=} \partial_A h(A)$, the equations of motion
become

$$\dot A = 0, \qquad \dot\alpha = \omega(A) \qquad [28]$$

Given a system $(I,\varphi)$ of coordinates integrating an
anisochronous system the construction of action-angle
coordinates can be performed, in principle, via
a classical procedure (under a few extra
assumptions).

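Let $\gamma_1,\ldots,\gamma_\ell$ be $\ell$ topologically independent
circles on $T^\ell$, for definiteness let
$\gamma_i(I) = \{\varphi \,|\, \varphi_1 = \varphi_2 = \cdots = \varphi_{i-1} = \varphi_{i+1} = \cdots = 0,\ \varphi_i \in [0,2\pi]\}$,
and set

$$A_i(I) = \frac{1}{2\pi}\oint_{\gamma_i(I)} p\cdot dq \qquad [29]$$

If the map $I \leftrightarrow A(I)$ is analytically invertible as
$I = I(A)$, the function

$$S(A,\varphi) = (\lambda)\int_{0}^{\varphi} p\cdot dq \qquad [30]$$

is well defined if the integral is over any path $\lambda$
joining the points $(p(I(A),0), q(I(A),0))$ and
$(p(I(A),\varphi), q(I(A),\varphi))$ and lying on the torus
parametrized by $I(A)$.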

The key remark in the proof that [30] really
defines a function of the only variables $A, \varphi$ is that
anisochrony implies the vanishing of the Poisson
brackets (cf. [18]): $\{I_i, I_j\} = 0$ (hence also
$\{A_i, A_j\} \equiv \sum_{h,k} \partial_{I_k}A_i\,\partial_{I_h}A_j\,\{I_k, I_h\} = 0$).
And the property $\{I_i, I_j\} = 0$ can be checked to be precisely
the integrability condition for the differential form $p\cdot dq$
restricted to the surface obtained by varying $q$ while $p$ is
constrained so that $(p,q)$ stays on the surface
$I$ = constant, i.e., on the invariant torus of the points
with fixed $I$.

The latter property is necessary and sufficient in
order that the function $S(A,\varphi)$ be well defined (i.e.,
be independent of the integration path $\lambda$) up to an
additive quantity of the form $\sum_i 2\pi n_i A_i$ with
$n = (n_1,\ldots,n_\ell)$ integers.

Then the action-angle variables are defined by the
canonical change of coordinates with $S(A,\varphi)$ as
generating function, i.e., by setting

$$\alpha_i = \partial_{A_i} S(A,\varphi), \qquad I_i = \partial_{\varphi_i} S(A,\varphi) \qquad [31]$$

and, since the computation of $S(A,\varphi)$ is “reduced to
integrations” which can be regarded as a natural
extension of the quadratures discussed in the
one-dimensional cases, such systems are also called
integrable by quadratures. The just-described
construction is a version of the more general
Arnol'd-Liouville theorem.

In practice, however, the actual evaluation of the 
integrals in [29], [30] can be difficult: its analysis in 
various cases (even as “elementary” as the pendu- 
lum) has in fact led to key progress in various 
domains, for example, in the theory of special 
functions and in group theory. 

In general, any surface on phase space on which 
the restriction of the differential form p dq is locally 
integrable is called a Lagrangian manifold: hence the 
invariant tori of an anisochronous integrable system 
are Lagrangian manifolds. 

If an integrable system is anisochronous, it cannot
admit more than $\ell$ independent constants of motion;
furthermore, it does not admit invariant tori of
dimension $> \ell$. Hence $\ell$-dimensional invariant tori
are called maximal.

Of course, invariant tori of dimension $< \ell$ can also
exist: this happens when the variables $I$ are such that
the frequencies $\omega(I)$ admit nontrivial rational relations;
i.e., there is an integer components vector
$\nu \in Z^\ell$, $\nu = (\nu_1,\ldots,\nu_\ell) \ne 0$ such that

$$\omega(I)\cdot\nu = \sum_i \omega_i(I)\,\nu_i = 0 \qquad [32]$$

in this case, the invariant torus $\mathcal{T}(I)$ is called
resonant. If the system is anisochronous then
$\det \partial_I \omega(I) \ne 0$ and, therefore, the resonant tori are
associated with values of the constants of motion
$I$ which form a set of measure zero in the space
$\mathcal{I}$ but which is not empty and dense.

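Examples of isochronous systems are the systems of
harmonic oscillators, i.e., systems with Hamiltonian

$$\sum_{i=1}^{\ell} \frac{p_i^2}{2m_i} + \frac12 \sum_{i,j=1}^{\ell} c_{ij}\,q_i q_j$$

where the matrix $c$ is a positive-definite matrix.
This is an isochronous system with frequencies
$\omega = (\omega_1,\ldots,\omega_\ell)$ whose squares are the
eigenvalues of the matrix $m_i^{-1/2} c_{ij}\, m_j^{-1/2}$. It is
integrable in the region $W$ of the data
$x = (p,q) \in R^{2\ell}$ such that, setting

$$A_\beta = \frac{1}{2\omega_\beta}\left(\Big(\sum_{i=1}^{\ell} \frac{v_{\beta,i}\,p_i}{\sqrt{m_i}}\Big)^2 + \omega_\beta^2\Big(\sum_{i=1}^{\ell} v_{\beta,i}\sqrt{m_i}\,q_i\Big)^2\right)$$

for all eigenvectors $v_\beta$, $\beta = 1,\ldots,\ell$, of the above
matrix, the vectors $A$ have all components $> 0$.

Even though this system is isochronous, it nevertheless
admits a system of canonical action-angle
coordinates in which the Hamiltonian takes the
simplest form

$$\sum_{\beta=1}^{\ell} \omega_\beta A_\beta \equiv \omega\cdot A$$

with

$$\alpha_\beta = -\arctan\frac{\sum_{i=1}^{\ell} v_{\beta,i}\,p_i/\sqrt{m_i}}{\sum_{i=1}^{\ell} \sqrt{m_i}\,\omega_\beta\,v_{\beta,i}\,q_i}$$

as conjugate angles.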

An example of an anisochronous system is the free
rotators or free wheels: i.e., $\ell$ noninteracting points
on a circle of radius $R$ or $\ell$ noninteracting homogeneous
coaxial wheels of radius $R$. If $J_i = m_i R^2$ or,
respectively, $J_i = (1/2)m_i R^2$ are the inertia moments
and if the positions are determined by $\ell$ angles
$\alpha = (\alpha_1,\ldots,\alpha_\ell)$, the angular velocities are
constants related to the angular momenta
$A = (A_1,\ldots,A_\ell)$ by $\omega_i = A_i/J_i$. The Hamiltonian and
the spectrum are

$$h(A) = \sum_{i=1}^{\ell} \frac{A_i^2}{2J_i}, \qquad \omega(A) = \Big(\frac{A_i}{J_i}\Big)_{i=1,\ldots,\ell}$$

For further details see Landau and Lifshitz (1976),
Gallavotti (1983), Arnol'd (1989), and Fassò (1998).


Multidimensional Quadratures: 
Central Motion 


Several important mechanical systems with more
than one degree of freedom are integrable by
canonical quadratures in vast regions of phase
space. This is checked by showing that there is a
foliation into invariant tori $\mathcal{T}(I)$ of dimension equal
to the number of degrees of freedom ($\ell$) parametrized
by $\ell$ constants of motion $I$ in involution, i.e.,
such that $\{I_i, I_j\} = 0$. One then performs, if possible,
the construction of the action-angle variables by
the quadratures discussed in the previous section.

The above procedure is well illustrated by the
theory of the planar motion of a unit mass attracted
by a coplanar center of force: the Lagrangian is, in
polar coordinates $(\rho,\theta)$,

$$\mathcal{L} = \frac{m}{2}\,(\dot\rho^2 + \rho^2\dot\theta^2) - V(\rho)$$

The planarity of the motion is not a strong restriction
as central motion always takes place on a plane.

Hence, the equations of motion are

$$\frac{d}{dt}\,m\rho^2\dot\theta = 0$$

i.e., $m\rho^2\dot\theta = G$ is a constant of motion (it is the
angular momentum), and

$$m\ddot\rho = -\partial_\rho V(\rho) + \partial_\rho\,\frac{m}{2}\,\rho^2\dot\theta^2 = -\partial_\rho V(\rho) + \frac{G^2}{m\rho^3} \stackrel{\rm def}{=} -\partial_\rho V_G(\rho)$$

Then the energy conservation yields a second
constant of motion $E$,

$$\frac{m}{2}\,\dot\rho^2 + \frac{G^2}{2m\rho^2} + V(\rho) = E = \frac{1}{2m}\,p_\rho^2 + \frac{1}{2m}\,\frac{p_\theta^2}{\rho^2} + V(\rho) \qquad [35]$$

The right-hand side (rhs) is the Hamiltonian for the
system, derived from $\mathcal{L}$, if $p_\rho, p_\theta$ denote conjugate
momenta of $\rho, \theta$: $p_\rho = m\dot\rho$ and
$p_\theta = m\rho^2\dot\theta$ (note that $p_\theta = G$).

Suppose $\rho^2 V(\rho) \to 0$ as $\rho \to 0$: then the singularity
at the origin cannot be reached by any motion starting
with $\rho > 0$ if $G > 0$. Assume also that the function

$$V_G(\rho) \stackrel{\rm def}{=} \frac{G^2}{2m\rho^2} + V(\rho)$$

has only one minimum $E_0(G)$, no maximum and no
horizontal inflection, and tends to a limit $E_\infty(G) \le \infty$
when $\rho \to \infty$. Then the system is integrable in the
domain $W = \{(p,q) \,|\, E_0(G) < E < E_\infty(G),\ G \ne 0\}$.
This is checked by introducing a “standard” periodic
solution $t \to R(t)$ of $m\ddot\rho = -\partial_\rho V_G(\rho)$ with energy
$E_0(G) < E < E_\infty(G)$ and initial data $\rho = \rho_{E,-}(G)$,
$\dot\rho = 0$ at time $t = 0$, where $\rho_{E,\pm}(G)$ are the two
solutions of $V_G(\rho) = E$, see the section “Quadratures”:
this is a periodic analytic function of $t$ with period

$$T(E,G) = 2\int_{\rho_{E,-}(G)}^{\rho_{E,+}(G)} \frac{dx}{\sqrt{(2/m)(E - V_G(x))}}$$

The function $R(t)$ is given, for $0 \le t \le \frac12 T(E,G)$
or for $\frac12 T(E,G) \le t \le T(E,G)$, by the quadratures

$$t = \int_{\rho_{E,-}(G)}^{R(t)} \frac{dx}{\sqrt{(2/m)(E - V_G(x))}} \quad {\rm or} \quad t = \frac{T(E,G)}{2} - \int_{\rho_{E,+}(G)}^{R(t)} \frac{dx}{\sqrt{(2/m)(E - V_G(x))}} \qquad [36]$$

respectively. The analytic regularity of $R(t)$ follows
from the general existence, uniqueness, and regularity
theorems applied to the differential equation for $\ddot\rho$.

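Given an initial datum $\dot\rho_0, \rho_0, \dot\theta_0, \theta_0$ with
energy $E$ and angular momentum $G$, define $t_0$ to be the time
such that $R(t_0) = \rho_0$, $\dot R(t_0) = \dot\rho_0$: then
$\rho(t) \equiv R(t + t_0)$ and $\theta(t)$ can be computed as

$$\theta(t) = \theta_0 + \int_0^t \frac{G}{m R(t' + t_0)^2}\,dt'$$

a second quadrature. Therefore, we can use as
coordinates for the motion $E, G, t_0$, which determine
$\dot\rho_0, \rho_0, \dot\theta_0$, and a fourth coordinate that
determines $\theta_0$, which could be $\theta_0$ itself but which is
conveniently determined, via the second quadrature, as follows.
The function $G m^{-1} R(t)^{-2}$ is periodic with period
$T(E,G)$; hence it can be expressed in a Fourier series

$$\chi_0(E,G) + \sum_{k \ne 0} \chi_k(E,G)\,\exp\Big(\frac{2\pi}{T(E,G)}\,i\,t\,k\Big)$$

the quadrature for $\theta(t)$ can be performed by
integrating the series terms. Setting

$$\bar\theta(t_0) \stackrel{\rm def}{=} \frac{T(E,G)}{2\pi} \sum_{k \ne 0} \frac{\chi_k(E,G)}{ik}\,\exp\Big(\frac{2\pi}{T(E,G)}\,i\,t_0\,k\Big)$$

and $\varphi_1(0) = \theta_0 - \bar\theta(t_0)$, the expression

$$\theta(t) = \theta_0 + \int_0^t \frac{G}{m R(t' + t_0)^2}\,dt'$$

becomes

$$\varphi_1(t) \stackrel{\rm def}{=} \theta(t) - \bar\theta(t + t_0) = \varphi_1(0) + \chi_0(E,G)\,t \qquad [37]$$

Hence the system is integrable and the spectrum is
$\omega(E,G) = (\omega_0(E,G), \omega_1(E,G)) \equiv (\omega_0,\omega_1)$ with

$$\omega_0 \stackrel{\rm def}{=} \frac{2\pi}{T(E,G)} \quad {\rm and} \quad \omega_1 \stackrel{\rm def}{=} \chi_0(E,G)$$

while $I = (E,G)$ are constants of motion and the
angles $\varphi = (\varphi_0,\varphi_1)$ can be taken as

$$\varphi_0 \stackrel{\rm def}{=} \omega_0 t_0, \qquad \varphi_1 \stackrel{\rm def}{=} \theta_0 - \bar\theta(t_0)$$

At $E, G$ fixed, the motion takes place on a two-dimensional
torus $\mathcal{T}(E,G)$ with $\varphi_0, \varphi_1$ as angles.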

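In the anisochronous cases, i.e., when
$\det \partial_{E,G}\,\omega(E,G) \ne 0$, canonical action-angle
variables conjugated to $(p_\rho, \rho, p_\theta, \theta)$ can be
constructed via [29], [30] by using two cycles $\gamma_1, \gamma_2$ on
the torus $\mathcal{T}(E,G)$. It is convenient to choose

1. $\gamma_1$ as the cycle consisting of the points $\rho = x$,
$\theta = 0$ whose first half (where $p_\rho \ge 0$) consists in the
set $\rho_{E,-}(G) \le x \le \rho_{E,+}(G)$,
$p_\rho = \sqrt{2m(E - V_G(x))}$ and $d\theta = 0$; and

2. $\gamma_2$ as the cycle $\rho$ = const, $\theta \in [0,2\pi]$ on which
$d\rho = 0$ and $p_\theta = G$, obtaining

$$A_1 = \frac{2}{2\pi}\int_{\rho_{E,-}(G)}^{\rho_{E,+}(G)} \sqrt{2m(E - V_G(x))}\,dx, \qquad A_2 = G \qquad [38]$$

According to the general theory (cf. the previous
section) a generating function for the canonical
change of coordinates from $(p_\rho, \rho, p_\theta, \theta)$ to
action-angle variables is (if, to fix ideas, $p_\rho > 0$)

$$S(A_1,A_2,\rho,\theta) = G\theta + \int_{\rho_{E,-}}^{\rho} \sqrt{2m(E - V_G(x))}\,dx \qquad [39]$$

In terms of the above $\omega_0, \chi_0$ the Jacobian matrix
$\partial(E,G)/\partial(A_1,A_2)$ is computed from [38], [39] to be
$\begin{pmatrix} \omega_0 & \chi_0 \\ 0 & 1 \end{pmatrix}$. It follows that
$\partial_E S = t$, $\partial_G S = \theta - \bar\theta(t) - \chi_0 t$
so that, see [31],

$$\alpha_1 \stackrel{\rm def}{=} \partial_{A_1} S = \omega_0 t, \qquad \alpha_2 \stackrel{\rm def}{=} \partial_{A_2} S = \theta - \bar\theta(t) \qquad [40]$$

and $(A_1,\alpha_1)$, $(A_2,\alpha_2)$ are the action-angle pairs.
For more details, see Landau and Lifshitz (1976)
and Gallavotti (1983).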


Newtonian Potential and Kepler's Laws 


The anisochrony property, that is,
$\det \partial(\omega_0,\chi_0)/\partial(A_1,A_2) \ne 0$ or, equivalently,
$\det \partial(\omega_0,\chi_0)/\partial(E,G) \ne 0$, is not satisfied in
the important cases of the harmonic potential and the Newtonian
potential. Anisochrony being only a sufficient condition
for canonical integrability, it is still possible
(and true) that, nevertheless, in both cases the
canonical transformation generated by [39] integrates
the system. This is expected since the two
potentials are limiting cases of anisochronous ones
(e.g., $|q|^{2+\epsilon}$ and $|q|^{-1-\epsilon}$ with $\epsilon \to 0$).
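The degeneracy of the two exceptional potentials can be seen numerically by computing $\omega_0 = 2\pi/T(E,G)$ and $\chi_0$, the time average of $G/(mR(t)^2)$ over a radial period. The following is a minimal sketch under stated assumptions: $m = k = 1$, sample values of $E$ and $G$ chosen inside the region of periodic motions, turning points obtained from closed-form expressions valid only for these two potentials, and midpoint quadrature after the substitution $x = c + r\sin u$. The function name `radial_spectrum` is hypothetical.

```python
import math

def radial_spectrum(V, E, G, rho_minus, rho_plus, m=1.0, n=4000):
    """Return (omega0, chi0) for the central potential V at energy E, momentum G."""
    VG = lambda x: G * G / (2.0 * m * x * x) + V(x)   # effective potential
    c = 0.5 * (rho_plus + rho_minus)
    r = 0.5 * (rho_plus - rho_minus)
    du = math.pi / n
    half_period = 0.0
    half_angle = 0.0
    for j in range(n):
        u = -0.5 * math.pi + (j + 0.5) * du
        x = c + r * math.sin(u)
        dt = r * math.cos(u) * du / math.sqrt((2.0 / m) * (E - VG(x)))
        half_period += dt                      # time from rho- to rho+
        half_angle += G / (m * x * x) * dt     # angle swept meanwhile
    T = 2.0 * half_period
    return 2.0 * math.pi / T, 2.0 * half_angle / T

# Newtonian potential V = -k m / rho with m = k = 1, E = -0.4, G = 0.8
E, G = -0.4, 0.8
disc = math.sqrt(1.0 + 2.0 * E * G * G)
w0_newton, chi0_newton = radial_spectrum(
    lambda x: -1.0 / x, E, G, (-1.0 + disc) / (2.0 * E), (-1.0 - disc) / (2.0 * E))

# Harmonic potential V = rho^2 / 2 with m = 1, E = 1, G = 0.6
w0_harm, chi0_harm = radial_spectrum(
    lambda x: 0.5 * x * x, 1.0, 0.6, math.sqrt(1.0 - 0.8), math.sqrt(1.0 + 0.8))

print(w0_newton, chi0_newton, w0_harm, chi0_harm)
```

With these choices one finds $\omega_0 = \chi_0$ for the Newtonian potential and $\omega_0 = 2\chi_0$ for the harmonic one, so in both cases the two frequencies are rationally related for all data and the motions are periodic.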
The Newtonian potential

$$H(p,q) = \frac{1}{2m}\,p^2 - \frac{km}{|q|}$$

is integrable in the region $G \ne 0$,
$E_0(G) = -k^2m^3/2G^2 < E < 0$, $|G| < \sqrt{k^2m^3/(-2E)}$.
Proceeding as in the last section, one finds integrating
coordinates and that the integrable motions develop
on ellipses with one focus on the center of attraction
$S$ so that motions are periodic, hence not anisochronous:
nevertheless, the construction of the canonical
coordinates via [29]-[31] (hence [39]) works and
leads to canonical coordinates $(L',\lambda',G',\gamma')$. To
obtain action-angle variables with a simple
interpretation, it is convenient to perform on the
variables $(L',\lambda',G',\gamma')$ (constructed by following the
procedure just indicated) a further trivial canonical
transformation by setting $L = L' + G'$, $G = G'$,
$\lambda = \lambda'$, $\gamma = \gamma' - \lambda'$; then

1. $\lambda$ (average anomaly) is the time necessary for the
point $P$ to move from the pericenter to its actual
position, in units of the period, times $2\pi$;

2. $L$ (action) is essentially the energy $E = -k^2m^3/2L^2$;

3. $G$ (angular momentum);

4. $\gamma$ (axis longitude), is the angle between a fixed
axis and the major axis of the ellipse oriented
from the center of the ellipse $O$ to the center of
attraction $S$.

Figure 2 Eccentric and true anomalies of $P$, which moves on a small circle $E$ centered at a point $c$ moving on the circle $D$ located
half-way between the two concentric circles containing the Keplerian ellipse: the anomaly of $c$ with respect to the axis $OS$ is $\xi$. The
circle $D$ is eccentric with respect to $S$ and therefore $\xi$ is, even today, called eccentric anomaly, whereas the circle $D$ is, in ancient
terminology, the deferent circle (eccentric circles were introduced in astronomy by Ptolemy). The small circle $E$ on which the point $P$
moves is, in ancient terminology, an epicycle. The deferent and the epicyclical motions are synchronous (i.e., they have the same
period); Kepler discovered that his key a priori hypothesis of inverse proportionality between angular velocity on the deferent and
distance between $P$ and $S$ (i.e., $\rho\dot\xi$ = constant) implied both synchrony and elliptical shape of the orbit, with focus in $S$. The latter law is
equivalent to $\rho^2\dot\theta$ = constant (because of the identity $a\dot\xi = \rho\dot\theta$). Small eccentricity ellipses can hardly be distinguished from circles.

The eccentricity of the ellipse is $e$ such that
$G = \pm L\sqrt{1 - e^2}$. The ellipse equation is
$\rho = a(1 - e\cos\xi)$, where $\xi$ is the eccentric anomaly (see
Figure 2), $a = L^2/km^2$ is the major semiaxis, and
$\rho$ is the distance to the center of attraction $S$.

Finally, the relations between eccentric anomaly $\xi$,
average anomaly $\lambda$, true anomaly $\theta$ (the latter is the
polar angle), and $SP$ distance $\rho$ are given by the
Kepler equations

$$\lambda = \xi - e\sin\xi, \qquad (1 - e\cos\xi)(1 + e\cos\theta) = 1 - e^2,$$
$$\lambda = (1 - e^2)^{3/2}\int_0^\theta \frac{d\theta'}{(1 + e\cos\theta')^2}, \qquad \frac{\rho}{a} = \frac{1 - e^2}{1 + e\cos\theta} \qquad [41]$$

and the relation between true anomaly and average
anomaly can be inverted in the form

$$\xi = \lambda + g_\lambda, \qquad \theta = \lambda + f_\lambda \;\Rightarrow\; \frac{\rho}{a} = \frac{1 - e^2}{1 + e\cos(\lambda + f_\lambda)} \qquad [42]$$

where $g_\lambda = g(e\sin\lambda, e\cos\lambda)$,
$f_\lambda = f(e\sin\lambda, e\cos\lambda)$,
and $g(x,y)$, $f(x,y)$ are suitable functions analytic
for $|x|, |y| < 1$. Furthermore, $g(x,y) = x(1 + y + \cdots)$,
$f(x,y) = 2x(1 + \frac54 y + \cdots)$ and the ellipses denote
terms of degree 2 or higher in $x, y$, containing only
even powers of $x$.
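The first Kepler equation has no closed-form inverse, but it is readily solved numerically and the result can be compared with the leading terms of [42]. The sketch below is an illustration under assumptions: it uses Newton's method (a standard choice, not one prescribed by the text), the half-angle form $\tan(\theta/2) = \sqrt{(1+e)/(1-e)}\,\tan(\xi/2)$ of the second relation in [41], and sample values $e = 0.05$, $\lambda = 1$.

```python
import math

def eccentric_anomaly(lam, e, tol=1e-14, max_iter=50):
    """Solve lam = xi - e*sin(xi) for xi by Newton's method."""
    xi = lam + e * math.sin(lam)   # first-order guess
    for _ in range(max_iter):
        step = (xi - e * math.sin(xi) - lam) / (1.0 - e * math.cos(xi))
        xi -= step
        if abs(step) < tol:
            break
    return xi

def true_anomaly(xi, e):
    """Polar angle theta on the branch continuous with xi."""
    return 2.0 * math.atan2(math.sqrt(1.0 + e) * math.sin(0.5 * xi),
                            math.sqrt(1.0 - e) * math.cos(0.5 * xi))

e, lam = 0.05, 1.0   # illustrative eccentricity and average anomaly
xi = eccentric_anomaly(lam, e)
theta = true_anomaly(xi, e)

# Leading terms of g and f in [42], with x = e sin(lam), y = e cos(lam)
x, y = e * math.sin(lam), e * math.cos(lam)
xi_series = lam + x * (1.0 + y)
theta_series = lam + 2.0 * x * (1.0 + 1.25 * y)

print(xi, xi_series, theta, theta_series)
```

The truncated series and the numerical solution differ by terms of third order in $e$, so the agreement degrades as the eccentricity grows.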

For more details, the reader is referred to Landau 
and Lifshitz (1976) and Gallavotti (1983). 


Rigid Body 


Another fundamental integrable system is the rigid
body in the absence of gravity and with a fixed point
$O$. It can be naturally described in terms of the Euler
angles $\theta_0, \varphi_0, \psi_0$ (see Figure 3) and their derivatives
$\dot\theta_0, \dot\varphi_0, \dot\psi_0$.

Let $I_1, I_2, I_3$ be the three principal inertia moments
of the body along the three principal axes with unit
vectors $i_1, i_2, i_3$. The inertia moments and the
principal axes are the eigenvalues and the associated
unit eigenvectors of the $3 \times 3$ inertia matrix $\mathcal{I}$,
which is defined by
$\mathcal{I}_{hk} = \sum_{i=1}^{n} m_i (x_i)_h (x_i)_k$, where
$h, k = 1, 2, 3$ and $x_i$ is the position of the $i$th particle
in a reference frame with origin at $O$ and in which
all particles are at rest: this comoving frame exists as
a consequence of the rigidity constraint. The
principal axes form a coordinate system which is
comoving as well: that is, in the frame $(O; i_1, i_2, i_3)$
as well, the particles are at rest.

Figure 3 The Euler angles of the comoving frame $i_1, i_2, i_3$ with
respect to a fixed frame $x, y, z$. The direction $n$ is the “node line,”
intersection between the planes $x, y$ and $i_1, i_2$.

The Lagrangian is simply the kinetic energy: we
imagine the rigidity constraint to be ideal (e.g., as
realized by internal central forces in the limit of
infinite rigidity, as mentioned in the section “Lagrange
and Hamilton forms of equations of motion”). The
angular velocity of the rigid motion is defined by

$$\omega = \dot\theta_0\,n + \dot\varphi_0\,z + \dot\psi_0\,i_3 \qquad [43]$$

expressing that a generic infinitesimal motion
must consist of a variation of the three Euler
angles and, therefore, it has to be a rotation of
speeds $\dot\theta_0, \dot\varphi_0, \dot\psi_0$ around the axes $n, z, i_3$ as
shown in Figure 3.

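Let $(\omega_1, \omega_2, \omega_3)$ be the components of $\boldsymbol\omega$ along the
principal axes $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$: for brevity, the latter axes
will often be called 1, 2, 3. Then the angular
momentum $\mathbf{M}$, with respect to the pivot point O,
and the kinetic energy K can be checked to be


$$\mathbf{M} = I_1\omega_1\mathbf{i}_1 + I_2\omega_2\mathbf{i}_2 + I_3\omega_3\mathbf{i}_3, \qquad K = \frac{1}{2}\left(I_1\omega_1^2 + I_2\omega_2^2 + I_3\omega_3^2\right)$$ [44]


and are constants of motion. From Figure 3 it follows
that $\omega_1 = \dot\theta_0\cos\psi_0 + \dot\varphi_0\sin\theta_0\sin\psi_0$, $\omega_2 = -\dot\theta_0\sin\psi_0 + \dot\varphi_0\sin\theta_0\cos\psi_0$
and $\omega_3 = \dot\varphi_0\cos\theta_0 + \dot\psi_0$, so that the
Lagrangian, uninspiring at first, is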


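$$\mathcal{L} = \frac{1}{2} I_1\left(\dot\theta_0\cos\psi_0 + \dot\varphi_0\sin\theta_0\sin\psi_0\right)^2 + \frac{1}{2} I_2\left(-\dot\theta_0\sin\psi_0 + \dot\varphi_0\sin\theta_0\cos\psi_0\right)^2 + \frac{1}{2} I_3\left(\dot\varphi_0\cos\theta_0 + \dot\psi_0\right)^2$$ [45]

Angular momentum conservation does not imply
that the components $\omega_j$ are constants because
$\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ also change with time according to


$$\frac{d}{dt}\mathbf{i}_j = \boldsymbol\omega \wedge \mathbf{i}_j, \qquad j = 1, 2, 3$$


Hence, $\dot{\mathbf{M}} = 0$ becomes, by the first of [44] and
denoting $I\boldsymbol\omega = (I_1\omega_1, I_2\omega_2, I_3\omega_3)$, the Euler equations
$I\dot{\boldsymbol\omega} + \boldsymbol\omega \wedge I\boldsymbol\omega = 0$, or


$$I_1\dot\omega_1 = (I_2 - I_3)\,\omega_2\omega_3, \qquad I_2\dot\omega_2 = (I_3 - I_1)\,\omega_3\omega_1, \qquad I_3\dot\omega_3 = (I_1 - I_2)\,\omega_1\omega_2$$ [46]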


which can be considered together with the conserved 
quantities [44]. 
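As a numerical illustration of how the Euler equations [46] coexist with the conserved quantities [44], the following minimal sketch integrates [46] with a fourth-order Runge-Kutta step and monitors K and $|\mathbf{M}|^2$. The inertia moments, the initial data, the step size and all function names are illustrative assumptions, not part of the text.

```python
def euler_rhs(w, I):
    """Right-hand side of the Euler equations [46] (body-frame components of omega)."""
    w1, w2, w3 = w
    I1, I2, I3 = I
    return ((I2 - I3) * w2 * w3 / I1,
            (I3 - I1) * w3 * w1 / I2,
            (I1 - I2) * w1 * w2 / I3)


def rk4_step(w, I, dt):
    """One classical Runge-Kutta step for d(omega)/dt = euler_rhs(omega)."""
    def shifted(a, b, s):
        return tuple(x + s * y for x, y in zip(a, b))
    k1 = euler_rhs(w, I)
    k2 = euler_rhs(shifted(w, k1, dt / 2), I)
    k3 = euler_rhs(shifted(w, k2, dt / 2), I)
    k4 = euler_rhs(shifted(w, k3, dt), I)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(w, k1, k2, k3, k4))


def kinetic_energy(w, I):
    """K of [44]."""
    return 0.5 * sum(Ij * wj ** 2 for Ij, wj in zip(I, w))


def momentum_squared(w, I):
    """|M|^2 computed from the components I_j * omega_j of [44]."""
    return sum((Ij * wj) ** 2 for Ij, wj in zip(I, w))


def integrate(w0, I, dt=1e-3, steps=5000):
    """Advance omega from w0 over steps*dt time units."""
    w = w0
    for _ in range(steps):
        w = rk4_step(w, I, dt)
    return w


if __name__ == "__main__":
    I = (1.0, 2.0, 3.0)          # assumed principal inertia moments
    w0 = (1.0, 0.2, 0.5)         # assumed initial angular velocity
    w = integrate(w0, I)
    print("omega(t):", w)
    print("drift in K:    ", kinetic_energy(w, I) - kinetic_energy(w0, I))
    print("drift in |M|^2:", momentum_squared(w, I) - momentum_squared(w0, I))
```

The components $\omega_j$ change along the motion while K and $|\mathbf{M}|^2$ stay constant up to the integration error, which is the content of the remark that angular momentum conservation does not make the $\omega_j$ constant.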


Since angular momentum is conserved, it is convenient
to introduce the laboratory frame $(O; x_0, y_0, z_0)$
with fixed axes $x_0, y_0, z_0$ and (see Figure 4):


1. $(O; x, y, z)$, the momentum frame with fixed axes,
but with z-axis oriented as $\mathbf{M}$, and x-axis
coinciding with the node (i.e., the intersection)
of the $x_0$-$y_0$ plane and the x-y plane (orthogonal
to $\mathbf{M}$). Therefore, x, y, z is determined by the two
Euler angles $\zeta, \gamma$ of $(O; x, y, z)$ in $(O; x_0, y_0, z_0)$;
2. $(O; 1, 2, 3)$, the comoving frame, that is, the
frame fixed with the body, and with unit vectors
$\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ parallel to the principal axes of the body.
The frame is determined by three Euler angles
$\theta_0, \varphi_0, \psi_0$;
3. the Euler angles of $(O; 1, 2, 3)$ with respect to
$(O; x, y, z)$, which are denoted $\theta, \varphi, \psi$;
4. G, the total angular momentum: $G^2 = \sum_j I_j^2\omega_j^2$;
5. $M_3$, the angular momentum along the $z_0$ axis;
$M_3 = G\cos\zeta$; and
6. L, the projection of $\mathbf{M}$ on the axis 3, $L = G\cos\theta$.


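The quantities $G, M_3, L, \varphi, \gamma, \psi$ determine $\theta_0, \varphi_0, \psi_0$
and $\dot\theta_0, \dot\varphi_0, \dot\psi_0$, or the $p_{\theta_0}, p_{\varphi_0}, p_{\psi_0}$ variables
conjugated to $\theta_0, \varphi_0, \psi_0$, as shown by the following
comment.

Considering Figure 4, the angles $\zeta, \gamma$ determine the
location, in the fixed frame $(O; x_0, y_0, z_0)$, of the
direction of $\mathbf{M}$ and the node line $\mathbf{m}$, which are,
respectively, the z-axis and the x-axis of the fixed
frame associated with the angular momentum; the
angles $\theta, \varphi, \psi$ then determine the position of the
comoving frame with respect to the fixed frame
$(O; x, y, z)$, hence its position with respect to
$(O; x_0, y_0, z_0)$, that is, $(\theta_0, \varphi_0, \psi_0)$. From this and
G, it is possible to determine $\boldsymbol\omega$ because


$$\cos\theta = \frac{I_3\omega_3}{G}, \qquad \tan\psi = \frac{I_1\omega_1}{I_2\omega_2}, \qquad \omega_2^2 = I_2^{-2}\left(G^2 - I_1^2\omega_1^2 - I_3^2\omega_3^2\right)$$ [47]


and, from [43], $\dot\theta_0, \dot\varphi_0, \dot\psi_0$ are determined.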


Figure 4 The laboratory frame, the angular momentum frame, 
and the comoving frame (and the Deprit angles). 


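The Lagrangian [45] gives immediately (after
expressing $\boldsymbol\omega$, i.e., $\mathbf{n}, \mathbf{z}, \mathbf{i}_3$, in terms of the Euler
angles $\theta_0, \varphi_0, \psi_0$) an expression for the variables
$p_{\theta_0}, p_{\varphi_0}, p_{\psi_0}$ conjugated to $\theta_0, \varphi_0, \psi_0$:


$$p_{\theta_0} = \mathbf{M}\cdot\mathbf{n}_0, \qquad p_{\varphi_0} = \mathbf{M}\cdot\mathbf{z}_0, \qquad p_{\psi_0} = \mathbf{M}\cdot\mathbf{i}_3$$ [48]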


and, in principle, we could proceed to compute the 
Hamiltonian. 

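However, the computation can be avoided
because of the very remarkable property (DEPRIT),
which can be checked with some patience, making
use of [48] and of elementary spherical trigonometry
identities,


$$M_3\,d\gamma + G\,d\varphi + L\,d\psi = p_{\varphi_0}\,d\varphi_0 + p_{\psi_0}\,d\psi_0 + p_{\theta_0}\,d\theta_0$$ [49]

which means that the map $((M_3, \gamma), (L, \psi), (G, \varphi)) \longleftrightarrow ((p_{\theta_0}, \theta_0), (p_{\varphi_0}, \varphi_0), (p_{\psi_0}, \psi_0))$
is a canonical map. And in the new coordinates, the kinetic
energy, hence the Hamiltonian, takes the form


$$K = \frac{1}{2}\frac{L^2}{I_3} + \frac{1}{2}\left(G^2 - L^2\right)\left(\frac{\sin^2\psi}{I_1} + \frac{\cos^2\psi}{I_2}\right)$$ [50]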
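The form [50] of the kinetic energy can be cross-checked against [44] numerically. The sketch below assumes the Euler-angle convention used in [45] applied to the momentum frame, so that $\mathbf{M}\cdot\mathbf{i}_1 = G\sin\theta\sin\psi$, $\mathbf{M}\cdot\mathbf{i}_2 = G\sin\theta\cos\psi$ and $\mathbf{M}\cdot\mathbf{i}_3 = G\cos\theta = L$; the function names and sample values are hypothetical.

```python
import math


def deprit_from_omega(w, I):
    """Return (G, L, psi) from body-frame components of omega.

    Assumes M.i1 = G sin(theta) sin(psi), M.i2 = G sin(theta) cos(psi),
    M.i3 = G cos(theta), i.e. the convention of [45] for the momentum frame.
    """
    m1, m2, m3 = (Ij * wj for Ij, wj in zip(I, w))
    G = math.sqrt(m1 ** 2 + m2 ** 2 + m3 ** 2)
    L = m3
    psi = math.atan2(m1, m2)
    return G, L, psi


def kinetic_deprit(G, L, psi, I):
    """Kinetic energy in the variables (G, L, psi), as in [50]."""
    I1, I2, I3 = I
    return (0.5 * L ** 2 / I3
            + 0.5 * (G ** 2 - L ** 2)
            * (math.sin(psi) ** 2 / I1 + math.cos(psi) ** 2 / I2))


def kinetic_body(w, I):
    """Kinetic energy from the body-frame components, as in [44]."""
    return 0.5 * sum(Ij * wj ** 2 for Ij, wj in zip(I, w))


if __name__ == "__main__":
    I = (1.0, 2.0, 3.0)            # assumed inertia moments
    w = (0.3, -1.1, 0.7)           # assumed angular velocity
    G, L, psi = deprit_from_omega(w, I)
    print(kinetic_deprit(G, L, psi, I), kinetic_body(w, I))
```

The two printed values agree, and neither $M_3$ nor $\gamma$ nor $\varphi$ enters the computation, in line with the statement that K depends only on G, L and $\psi$.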


This again shows that $G, M_3$ are constants of
motion, and the $L, \psi$ variables are determined by a
quadrature, because the Hamilton equation for $\psi$
combined with the energy conservation yields


$$\dot\psi = \pm\left(\frac{1}{I_3} - \frac{\sin^2\psi}{I_1} - \frac{\cos^2\psi}{I_2}\right)\sqrt{\frac{2E - G^2\left(\frac{\sin^2\psi}{I_1} + \frac{\cos^2\psi}{I_2}\right)}{\frac{1}{I_3} - \frac{\sin^2\psi}{I_1} - \frac{\cos^2\psi}{I_2}}}$$ [51]


In the integrability region, this motion is periodic
with some period $T_L(E, G)$. Once $\psi(t)$ is determined,
the Hamilton equation for $\varphi$ leads to the further
quadrature


$$\dot\varphi = \left(\frac{\sin^2\psi(t)}{I_1} + \frac{\cos^2\psi(t)}{I_2}\right) G$$ [52]


which determines a second periodic motion with
period $T_G(E, G)$. The $\gamma, M_3$ are constants and,
therefore, the motion takes place on three-dimensional
invariant tori $\mathcal{T}_{E,G,M_3}$ in phase space,
each of which is "always" foliated into two-dimensional
invariant tori parametrized by the
angle $\gamma$ which is constant (by [50], because K is
$M_3$-independent): the latter are in turn foliated by
one-dimensional invariant tori, that is, by periodic
orbits, with E, G such that the value of
$T_L(E, G)/T_G(E, G)$ is rational.
Note that if $I_1 = I_2 = I$, the above analysis is
extremely simplified. Furthermore, if gravity g acts
on the system the Hamiltonian will simply change by
the addition of a potential $-mgz$ if z is the height of
the center of mass. Then (see Figure 4), if the center
of mass of the body is on the axis $\mathbf{i}_3$ and $z = h\cos\theta_0$,
where h is the distance of the center of mass from O,
since $\cos\theta_0 = \cos\theta\cos\zeta - \sin\theta\sin\zeta\cos\varphi$, the Hamiltonian
will become $H = K - mgh\cos\theta_0$ or


$$H = \frac{L^2}{2I_3} + \frac{G^2 - L^2}{2I} - mgh\left(\frac{M_3 L}{G^2} - \left(1 - \frac{M_3^2}{G^2}\right)^{1/2}\left(1 - \frac{L^2}{G^2}\right)^{1/2}\cos\varphi\right)$$ [53]


so that, again, the system is integrable by quadratures
(with the roles of $\psi$ and $\varphi$ "interchanged" with respect
to the previous case) in suitable regions of phase space.
This is called Lagrange's gyroscope.

A less elementary integrable case is when the
inertia moments are related as $I_1 = I_2 = 2I_3$ and the
center of mass is in the $\mathbf{i}_1$-$\mathbf{i}_2$ plane (rather than on
the $\mathbf{i}_3$-axis) and only gravity acts, besides the
constraint force on the pivot point O; this is called
Kowalevskaia's gyroscope.

For more details, see Gallavotti (1983). 




Other Quadratures 


An interesting classical integrable motion is that of a 
point mass attracted by two equal-mass centers of 
gravitational attraction, or a point ideally constrained 
to move on the surface of a general ellipsoid. 

New integrable systems have been discovered 
quite recently and have generated a wealth of new 
developments ranging from group theory (as integ- 
rable systems are closely related to symmetries) to 
partial differential equations. 

It is convenient to extend the notion of integ- 
rability by stating that a system is integrable in a 
region W of phase space if 


1. there is a change of coordinates $(p, q) \in W \longleftrightarrow \{A, \alpha, Y, y\} \in (U \times T^\ell) \times (V \times R^m)$
where $U \subset R^\ell$, $V \subset R^m$, with $\ell + m \ge 1$, are open sets; and

2. the A, Y are constants of motion while the other
coordinates vary "linearly":


$$(\alpha, y) \to (\alpha + \omega(A, Y)\,t,\; y + v(A, Y)\,t)$$ [54]


where $\omega(A, Y), v(A, Y)$ are smooth functions.


In the new sense, the systems studied in the previous 
sections are integrable in much wider regions (essen- 
tially on the entire phase space with the exception of a 
set of data which lie on lower-dimensional surfaces
forming sets of zero volume). The notion is convenient
also because it allows us to say that even the
systems of free particles are integrable. 

Two very remarkable systems integrable in the
new sense are the Hamiltonian systems, respectively
called Toda lattice (KRUSKAL, ZABUSKY), and
Calogero lattice (CALOGERO, MOSER); if $(p_i, q_i) \in R^2$,
they are


$$H_T(p, q) = \frac{1}{2m}\sum_{i=1}^{n} p_i^2 + \sum_{i=1}^{n-1} g\, e^{-\kappa(q_{i+1} - q_i)}$$

$$H_C(p, q) = \frac{1}{2m}\sum_{i=1}^{n} p_i^2 + \sum_{i<j}^{n} \frac{g}{(q_i - q_j)^2} + \frac{1}{2}\sum_{i=1}^{n} m\omega^2 q_i^2$$ [55]


where $m > 0$ and $\kappa, \omega, g \ge 0$. They describe the
motion of n interacting particles on a line.

The integration method for the above systems is
again to find first the constants of motion and later
to look for quadratures, when appropriate. The
constants of motion can be found with the method
of the Lax pairs. One shows that there is a pair of
self-adjoint $n \times n$ matrices $M(p, q), N(p, q)$ such that
the equations of motion become


$$\frac{d}{dt} M(p, q) = i\,[M(p, q), N(p, q)], \qquad i = \sqrt{-1}$$ [56]


which imply that $M(t) = U(t)M(0)U(t)^{-1}$, with $U(t)$ a
unitary matrix. When the equations can be written in
the above form, it is clear that the n eigenvalues of the
matrix $M(0) = M(p_0, q_0)$ are constants of motion.
When appropriate (e.g., in the Calogero lattice case
with $\omega > 0$), it is possible to proceed to find canonical
action-angle coordinates: a task that is quite difficult
due to the arbitrariness of n, but which is possible.

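The Lax pairs for the Calogero lattice (with
$\omega = 0$, $g = m = 1$) are


$$M_{hh} = p_h, \quad N_{hh} = \sum_{l \ne h} \frac{1}{(q_h - q_l)^2}, \qquad M_{hk} = \frac{i}{q_h - q_k}, \quad N_{hk} = -\frac{1}{(q_h - q_k)^2}, \quad h \ne k$$ [57]


while for the Toda lattice (with $m = g = \frac{1}{2}\kappa = 1$), the
nonzero matrix elements of M, N are


$$M_{hh} = p_h, \qquad M_{h,h+1} = M_{h+1,h} = e^{-(q_{h+1} - q_h)}, \qquad N_{h,h+1} = -N_{h+1,h} = -i\,e^{-(q_{h+1} - q_h)}$$ [58]


which are checked by first trying the case $n = 2$ (for $n = 2$ the diagonal of N in [57] is a multiple of the identity and plays no role).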
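A Lax representation can also be checked numerically at a sample phase-space point, for n larger than 2. The sketch below is an illustration under stated assumptions: it uses the convention $dM/dt = i[M, N]$, the Toda Hamiltonian with potential $e^{-\kappa(q_{i+1} - q_i)}$ and $m = g = 1$, $\kappa = 2$, the Calogero Hamiltonian with $\omega = 0$, $g = m = 1$, and matrix elements chosen so that the identity holds for every n (for Calogero, off-diagonal $N_{hk} = -1/(q_h - q_k)^2$ and diagonal $N_{hh} = \sum_{l \ne h} 1/(q_h - q_l)^2$). All function names are hypothetical.

```python
import math


def commutator(A, B):
    """Matrix commutator AB - BA for square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] - B[i][k] * A[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def toda_pair(p, q):
    """Toda Lax matrices for m = g = 1, kappa = 2 (assumed normalization)."""
    n = len(p)
    M = [[0j] * n for _ in range(n)]
    N = [[0j] * n for _ in range(n)]
    for h in range(n):
        M[h][h] = complex(p[h])
    for h in range(n - 1):
        a = math.exp(-(q[h + 1] - q[h]))
        M[h][h + 1] = M[h + 1][h] = complex(a)
        N[h][h + 1] = -1j * a
        N[h + 1][h] = 1j * a
    return M, N


def toda_Mdot(p, q):
    """dM/dt obtained from Hamilton's equations for the Toda Hamiltonian."""
    n = len(p)
    D = [[0j] * n for _ in range(n)]
    for h in range(n - 1):
        a = math.exp(-(q[h + 1] - q[h]))
        D[h][h] -= 2 * a * a            # force on particle h from bond (h, h+1)
        D[h + 1][h + 1] += 2 * a * a    # opposite force on particle h+1
        D[h][h + 1] = D[h + 1][h] = -a * (p[h + 1] - p[h])
    return D


def calogero_pair(p, q):
    """Calogero Lax matrices for omega = 0, g = m = 1."""
    n = len(p)
    M = [[0j] * n for _ in range(n)]
    N = [[0j] * n for _ in range(n)]
    for h in range(n):
        M[h][h] = complex(p[h])
        for k in range(n):
            if k != h:
                d = q[h] - q[k]
                M[h][k] = 1j / d
                N[h][k] = -1 / d ** 2
                N[h][h] += 1 / d ** 2
    return M, N


def calogero_Mdot(p, q):
    """dM/dt obtained from Hamilton's equations for the Calogero Hamiltonian."""
    n = len(p)
    D = [[0j] * n for _ in range(n)]
    for h in range(n):
        for k in range(n):
            if k != h:
                d = q[h] - q[k]
                D[h][h] += 2 / d ** 3
                D[h][k] = -1j * (p[h] - p[k]) / d ** 2
    return D


def lax_residual(M, N, D):
    """Largest entry of dM/dt - i[M, N]; zero when the Lax equation holds."""
    C = commutator(M, N)
    n = len(M)
    return max(abs(D[i][j] - 1j * C[i][j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    p = [0.3, -0.7, 1.1, 0.2]      # assumed sample momenta
    q = [0.0, 1.3, 2.1, 3.7]       # assumed sample positions
    print(lax_residual(*toda_pair(p, q), toda_Mdot(p, q)))
    print(lax_residual(*calogero_pair(p, q), calogero_Mdot(p, q)))
```

Both residuals vanish to rounding accuracy, which is the statement that the eigenvalues of M are constants of motion; for $n = 2$ the diagonal entries of the Calogero N are equal and drop out of the commutator.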
Another integrable system (SUTHERLAND) is


$$H_S(p, q) = \frac{1}{2m}\sum_{i=1}^{n} p_i^2 + \sum_{i<j}^{n} \frac{g}{\sinh^2(q_i - q_j)}$$ [59]


whose Lax pair is related to that of the Calogero
lattice.

By taking suitable limits as $n \to \infty$ and as the
other parameters tend to 0 or $\infty$ at suitable rates,
integrability of a few differential equations, among
which the Korteweg-deVries equation or the nonlinear
Schrödinger equation, can be derived.

As mentioned in the introductory section, sym- 
metry properties under continuous groups imply 
existence of constants of motion. Hence, it is natural 
to think that integrability of a mechanical system 
reflects enough symmetry to imply the existence of 
as many constants of motion, independent and in 
involution, as the number of degrees of freedom, n. 

This is in fact always true, and in some respects it
is a tautological statement in the anisochronous
cases. Integrability in a region W implies existence
of canonical action-angle coordinates $(A, \alpha)$ (see the
section "Quasiperiodicity and integrability") and the
Hamiltonian depends solely on the A's: therefore, its
restriction to W is invariant with respect to the
action of the continuous commutative group $T^n$ of
the translations of the angle variables. The actions
can be seen as constants of motion whose existence
follows from Noether's theorem, at least in the
anisochronous cases in which the Hamiltonian
formulation is equivalent to a Lagrangian one.

What is nontrivial is to recognize, prior to
realizing integrability, that a system admits this
kind of symmetry: in most of the interesting cases,
the systems either do not exhibit obvious symmetries
or they exhibit symmetries apparently unrelated to
the group $T^n$, which nevertheless imply existence of
sufficiently many independent constants of motion
as required for integrability. Hence, nontrivial
integrable systems possess a "hidden" symmetry
under $T^n$: the rigid body is an example.

However, very often the symmetries of a Hamiltonian
H which imply integrability also imply partial
isochrony, that is, they imply that the number of
independent frequencies is smaller than n (see the
section "Quasiperiodicity and integrability"). Even
in such cases, often a map exists from the original
coordinates $(p, q)$ to the integrating variables $(A, \alpha)$
in which A are constants of motion and the $\alpha$ are
uniformly rotating angles (some of which are also
constant) with spectrum $\omega(A)$, which is the gradient
$\partial_A h(A)$ for some function $h(A)$ depending only on a
few of the A coordinates. However, the map might
fail to be canonical. The system is then said to be
bi-Hamiltonian: in the sense that one can represent
motions in two systems of canonical coordinates,
not related by a canonical transformation, and by
two Hamiltonian functions H and $H' \equiv h$ which
generate the same motions in the respective
coordinates (the latter changes of variables are
sometimes called "canonical with respect to the
pair H, H'" while the transformations considered in
the section "Canonical transformations of phase
space coordinates" are called completely
canonical).

For more details, we refer the reader to Calogero 
and Degasperis (1982). 


Generic Nonintegrability 


It is natural to try to prove that a system "close" to
an integrable one has motions with properties very
close to quasiperiodic. This is indeed the case, but in
a rather subtle way. That there is a problem is easily
seen in the case of a perturbation of an anisochronous
integrable system.

Assume that a system is integrable in a region W
of phase space which, in the integrating action-angle
variables $(A, \alpha)$, has the standard form $U \times T^\ell$ with
a Hamiltonian $h(A)$ with gradient $\omega(A) = \partial_A h(A)$. If
the forces are perturbed by a potential which is
smooth then the new system will be described, in the
same coordinates, by a Hamiltonian like

$$\mathcal{H}_\varepsilon(A, \alpha) = h(A) + \varepsilon f(A, \alpha)$$ [60]

with h, f analytic in the variables $A, \alpha$.

If the system really behaved like the unperturbed
one, it ought to have $\ell$ constants of motion of the
form $F_\varepsilon(A, \alpha)$ analytic in $\varepsilon$ near $\varepsilon = 0$ and uniform,
that is, single valued (which is the same as periodic)
in the variables $\alpha$. However, the following theorem
(POINCARÉ) shows that this is a somewhat unlikely
possibility.


Theorem 1 If the matrix $\partial^2_{AA} h(A)$ has rank $\ge 2$, the
Hamiltonian [60] "generically" (an intuitive notion
made precise below) cannot be integrated by a canonical
transformation $C_\varepsilon(A, \alpha)$ which


(i) reduces to the identity as $\varepsilon \to 0$; and
(ii) is analytic in $\varepsilon$ near $\varepsilon = 0$ and in $(A, \alpha) \in U' \times T^\ell$,
with $U' \subset U$ open.


Furthermore, no uniform constants of motion $F_\varepsilon(A, \alpha)$,
defined for $\varepsilon$ near 0 and $(A, \alpha)$ in an open domain $U' \times T^\ell$,
exist other than the functions of $\mathcal{H}_\varepsilon$ itself.


Integrability in the sense (i), (ii) can be called 
analytic integrability and it is the strongest (and 
most naive) sense that can be given to the attribute. 

The first part of the theorem, that is, (i), (ii), holds
simply because, if integrability was assumed, a
generating function of the integrating map would
have the form $A' \cdot \alpha + \Phi_\varepsilon(A', \alpha)$ with $\Phi_\varepsilon$ admitting a
power series expansion in $\varepsilon$ as $\Phi_\varepsilon = \varepsilon\Phi^1 + \varepsilon^2\Phi^2 + \cdots$.
Hence, $\Phi^1$ would have to satisfy

$$\omega(A') \cdot \partial_\alpha \Phi^1(A', \alpha) + f(A', \alpha) = \bar f(A')$$ [61]

where $\bar f(A')$ depends only on $A'$ (hence integrating
both sides with respect to $\alpha$, it appears that $\bar f(A')$
must coincide with the average of $f(A', \alpha)$ over $\alpha$).
This implies that the Fourier transform $f_\nu(A)$,
$\nu \in Z^\ell$, should satisfy

$$f_\nu(A') = 0 \quad \text{if } \omega(A') \cdot \nu = 0, \quad \nu \ne 0$$ [62]

which is equivalent to the existence of $\tilde f_\nu(A')$ such that
$f_\nu(A') = \omega(A') \cdot \nu\, \tilde f_\nu(A')$ for $\nu \ne 0$. But since there is no
relation between $\omega(A)$ and $f(A, \alpha)$, this property
"generically" will not hold in the sense that as close
as wished to an f which satisfies the property [62] there
will be another f which does not satisfy it essentially no
matter how "closeness" is defined (e.g., with respect to
the metric $\|f - g\| = \sum_\nu |f_\nu(A) - g_\nu(A)|$). This is so
because the rank of $\partial^2_{AA} h(A)$ is higher than 1 and $\omega(A)$
varies at least on a two-dimensional surface, so that
$\omega \cdot \nu = 0$ becomes certainly possible for some $\nu \ne 0$
while $f_\nu(A)$ in general will not vanish, so that $\Phi^1$,
hence $\Phi_\varepsilon$, does not exist.

This means that close to a function f there is a
function $f'$ which violates [62] for some $\nu$. Of course,
this depends on what is meant by "close": however,
here essentially any topology introduced on the
space of the functions f will make the statement
correct. For instance, this is so if the distance between two
functions is defined by $\sum_\nu \sup_{A \in U} |f_\nu(A) - g_\nu(A)|$ or
by $\sup_{A, \alpha} |f(A, \alpha) - g(A, \alpha)|$.
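The obstruction behind [61] and [62] can be made concrete in Fourier space, where [61] reads $i\,(\omega(A')\cdot\nu)\,\Phi^1_\nu + f_\nu = 0$ for $\nu \ne 0$. The following minimal sketch assumes the illustrative unperturbed Hamiltonian $h(A) = (A_1^2 + A_2^2)/2$, so that $\omega(A) = A$ and the matrix of second derivatives has rank 2; the function names and sample values are hypothetical.

```python
from fractions import Fraction


def omega(A):
    """Frequencies for the assumed h(A) = (A1^2 + A2^2)/2, i.e. omega(A) = A."""
    return A


def resonant_modes(A, nmax):
    """Nonzero integer vectors nu with |nu_i| <= nmax and omega(A).nu = 0 exactly."""
    w = omega(A)
    return [(n1, n2)
            for n1 in range(-nmax, nmax + 1)
            for n2 in range(-nmax, nmax + 1)
            if (n1, n2) != (0, 0) and w[0] * n1 + w[1] * n2 == 0]


def phi1_coefficient(f_nu, A, nu):
    """Solve i (omega.nu) Phi1_nu + f_nu = 0, the Fourier form of [61]."""
    w = omega(A)
    divisor = w[0] * nu[0] + w[1] * nu[1]
    if divisor == 0:
        raise ZeroDivisionError("resonance: omega(A).nu = 0 but f_nu != 0")
    return 1j * f_nu / float(divisor)


if __name__ == "__main__":
    A = (Fraction(1), Fraction(1, 2))     # assumed action values
    print(resonant_modes(A, 2))           # contains (1, -2) and (-1, 2)
    print(phi1_coefficient(1.0, A, (1, 0)))
```

At the chosen action the harmonics $\nu = \pm(1, -2)$ are resonant, so $\Phi^1_\nu$ exists there only if $f_\nu$ vanishes, which is condition [62]; moving A along a two-dimensional set makes other harmonics resonant in turn.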

The idea behind the last statement of the theorem
is in essence the same: consider, for simplicity, the
anisochronous case in which the matrix $\partial^2_{AA} h(A)$
has maximal rank $\ell$, that is, the determinant
$\det \partial^2_{AA} h(A)$ does not vanish. Anisochrony implies
that $\omega(A) \cdot \nu \ne 0$ for all $\nu \ne 0$ and A on a dense set,
and this property will be used repeatedly in the
following analysis.

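Let $B(\varepsilon, A, \alpha)$ be a "uniform" constant of motion,
meaning that it is single valued and analytic in the
non-simply-connected region $U \times T^\ell$ and, for $\varepsilon$ small,


$$B(\varepsilon, A, \alpha) = B_0(A, \alpha) + \varepsilon B_1(A, \alpha) + \varepsilon^2 B_2(A, \alpha) + \cdots$$ [63]

The condition that B is a constant of motion can be
written order by order in its expansion in $\varepsilon$: the first
two orders are


$$\omega(A) \cdot \partial_\alpha B_0(A, \alpha) = 0$$
$$\partial_A f(A, \alpha) \cdot \partial_\alpha B_0(A, \alpha) - \partial_\alpha f(A, \alpha) \cdot \partial_A B_0(A, \alpha) + \omega(A) \cdot \partial_\alpha B_1(A, \alpha) = 0$$ [64]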
Then the above two relations and anisochrony imply
(1) that $B_0$ must be a function of A only and (2) that
$\omega(A) \cdot \nu$ and $\partial_A B_0(A) \cdot \nu$ vanish simultaneously for all
$\nu$. Hence, the gradient of $B_0$ must be proportional to
$\omega(A)$, that is, to the gradient of $h(A)$: $\partial_A B_0(A) = \lambda(A)\,\partial_A h(A)$.
Therefore, generically (because of the
anisochrony) it must be that $B_0$ depends on A
through $h(A)$: $B_0(A) = F(h(A))$ for some F.

Looking again, with the new information, at the
second of [64] it follows that at fixed A the
$\alpha$-derivative in the direction $\omega(A)$ of $B_1$ equals
$F'(h(A))$ times the $\alpha$-derivative of f, that is,
$B_1(A, \alpha) = f(A, \alpha)\,F'(h(A)) + C_1(A)$.

Summarizing: the constant of motion B has been
written as $B(A, \alpha) = F(h(A)) + \varepsilon F'(h(A)) f(A, \alpha) + \varepsilon C_1(A) + \varepsilon^2 B_2 + \cdots$,
which is equivalent to
$B(A, \alpha) = F(\mathcal{H}_\varepsilon) + \varepsilon(B'_0 + \varepsilon B'_1 + \cdots)$ and therefore
$B'_0 + \varepsilon B'_1 + \cdots$ is another analytic constant of
motion. Repeating the argument, also $B'_0 + \varepsilon B'_1 + \cdots$
must have the form $F_1(\mathcal{H}_\varepsilon) + \varepsilon(B''_0 + \varepsilon B''_1 + \cdots)$;
in conclusion,


$$B = F(\mathcal{H}_\varepsilon) + \varepsilon F_1(\mathcal{H}_\varepsilon) + \varepsilon^2 F_2(\mathcal{H}_\varepsilon) + \cdots + \varepsilon^n F_n(\mathcal{H}_\varepsilon) + O(\varepsilon^{n+1})$$ [65]


By analyticity, $B = F_\varepsilon(\mathcal{H}_\varepsilon(A, \alpha))$ for some $F_\varepsilon$: hence
generically all constants of motion are trivial.

Therefore, a system close to integrable cannot
behave as it would naively be expected. The
problem, however, was not manifest until POINCARÉ's
proof of the above results: because in most
applications the function f has only finitely many
Fourier components, or at least is replaced by an
approximation with this property, so that at least
[62] and even a few of the higher-order constraints
like [64] become possible in open regions of action
space. In fact, it may happen that the values of A of
interest are restricted so that $\omega(A) \cdot \nu = 0$ only for
"large" values of $\nu$ for which $f_\nu = 0$. Nevertheless,
the property that $f_\nu(A) = (\omega(A) \cdot \nu)\,\tilde f_\nu(A)$ (or the
analogous higher-order conditions, e.g., [64]),
which we have seen to be necessary for analytic
integrability of the perturbed system, can be
checked to fail in important problems, if no
approximation is made on f. Hence a conceptual
problem arises.

For more details see Poincaré (1987). 


Perturbing Functions 


To check, in a given problem, the nonexistence of
nontrivial constants of motion along the lines
indicated in the previous section, it is necessary to
express the potential, usually given in Cartesian
coordinates as $\varepsilon V(x)$, in terms of the action-angle
variables of the unperturbed, integrable, system.

In particular, the problem arises when trying to
check nonexistence of nontrivial constants of
motion when the anisochrony assumption (cf. the
previous section) is not satisfied. Usually it
becomes satisfied "to second order" (or higher):
but to show this, more detailed information on
the structure of the perturbing function expressed
in action-angle variables is needed. For instance,
this is often necessary even when the perturbation
is approximated by a trigonometric polynomial, as
is essentially always the case in celestial
mechanics.

Finding explicit expressions for the action-angle
variables is in itself a rather nontrivial task which
leads to many problems of intrinsic interest even in
seemingly simple cases. For instance, in the case of
the planar gravitational central motion, the Kepler
equation $\lambda = \xi - e\sin\xi$ (see the first of [41]) must be
solved expressing $\xi$ in terms of $\lambda$ (see the first of
[42]). It is obvious that for small e, the variable $\xi$
can be expressed as an analytic function of e:
nevertheless, the actual construction of this expression
leads to several problems. For small e, an
interesting algorithm is the following.

Let $h(\lambda) = \xi - \lambda$, so that the equation to solve (i.e.,
the first of [41]) is


$$h(\lambda) = e\sin(\lambda + h(\lambda)) \equiv -e\,\frac{\partial c}{\partial\lambda}(\lambda + h(\lambda))$$ [66]

where $c(\lambda) = \cos\lambda$; the function $\lambda \to h(\lambda)$ should be
periodic in $\lambda$, with period $2\pi$, and analytic in $e, \lambda$ for
e small and $\lambda$ real. If $h(\lambda) = e h^{(1)} + e^2 h^{(2)} + \cdots$, the
Fourier transform of $h^{(k)}(\lambda)$ satisfies the recursion
relation


$$h^{(k)}_\nu = -\sum_{p=1}^{\infty} \frac{1}{p!} \sum_{\substack{k_1 + \cdots + k_p = k-1 \\ \nu_0 + \nu_1 + \cdots + \nu_p = \nu}} (i\nu_0)\, c_{\nu_0}\, (i\nu_0)^p \prod_{j=1}^{p} h^{(k_j)}_{\nu_j}, \qquad k > 1$$ [67]


with $c_\nu$ the Fourier transform of the cosine ($c_{\pm 1} = \frac{1}{2}$,
$c_\nu = 0$ if $\nu \ne \pm 1$), and (of course) $h^{(1)}_\nu = -i\nu c_\nu$.
Equation [67] is obtained by expanding the RHS
of [66] in powers of h and then taking the Fourier
transform of both sides retaining only terms of order
k in e.
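The recursion [67] can be run mechanically. The sketch below is one possible implementation, not the only one: it stores each $h^{(k)}$ as a dictionary mapping the harmonic $\nu$ to the Fourier coefficient $h^{(k)}_\nu$ and obtains the inner sum over $k_1 + \cdots + k_p = k - 1$ as the coefficient of $e^{k-1}$ in the p-th power of the truncated series for h. The function names are hypothetical; the result is compared with the classical low-order expansion $h = e\sin\lambda + \frac{1}{2}e^2\sin 2\lambda + \frac{1}{8}e^3(3\sin 3\lambda - \sin\lambda) + \cdots$.

```python
import math


def poly_mul(a, b):
    """Product of two trigonometric polynomials stored as {harmonic: coefficient}."""
    out = {}
    for n1, c1 in a.items():
        for n2, c2 in b.items():
            out[n1 + n2] = out.get(n1 + n2, 0) + c1 * c2
    return out


def series_mul(A, B, order):
    """Product of two power series in e (lists of trigonometric polynomials),
    truncated at the given order in e."""
    C = [{} for _ in range(order + 1)]
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            if i + j <= order:
                for nu, val in poly_mul(a, b).items():
                    C[i + j][nu] = C[i + j].get(nu, 0) + val
    return C


def kepler_coefficients(kmax):
    """Return h with h[k] = {nu: h^(k)_nu} for k = 1..kmax, following [67]."""
    c = {1: 0.5, -1: 0.5}                      # Fourier transform of the cosine
    h = [{} for _ in range(kmax + 1)]          # h[0] stays empty (no order-0 term)
    h[1] = {nu: -1j * nu * c_nu for nu, c_nu in c.items()}
    for k in range(2, kmax + 1):
        hk = {}
        power = [{0: 1.0}] + [{} for _ in range(k - 1)]    # series for h^0 = 1
        for p in range(1, k):
            power = series_mul(power, h[:k], k - 1)        # series for h^p
            for nu0, c0 in c.items():
                pref = -(1j * nu0) ** (p + 1) * c0 / math.factorial(p)
                for nu, val in power[k - 1].items():
                    hk[nu0 + nu] = hk.get(nu0 + nu, 0) + pref * val
        h[k] = hk
    return h


if __name__ == "__main__":
    h = kepler_coefficients(3)
    # A term s*sin(n*lambda) has Fourier coefficient -i*s/2 at harmonic +n.
    print(h[2].get(2), h[3].get(3), h[3].get(1))
```

The coefficient of $\sin n\lambda$ is read off as $2i\,h^{(k)}_n$; harmonics whose contributions cancel appear with a zero coefficient.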

Iterating the above relation, imagine drawing all
trees $\theta$ with k "branches," or "lines," distinguished
by a label taking k values, and k nodes and attach to
each node v a harmonic label $\nu_v = \pm 1$ as in Figure 5.
The trees will be assumed to start with a root line vr
linking a point r and the "first node" v (see Figure 5)
and then bifurcate arbitrarily (such trees are sometimes
called "rooted trees").


Figure 5 An example of a tree graph and its labels. It contains
only one simple node (3). Harmonics are indicated next to their
nodes. Labels distinguishing lines are not marked.

Imagine the tree oriented from the endpoints
towards the root r (not to be considered a node)
and given a node v call $v'$ the node immediately
following it. If v is the first node before the root r,
let $v' = r$ and $\nu_{v'} = 1$. For each such decorated tree
define its numerical value


$$\mathrm{Val}(\theta) = \frac{-i}{k!} \prod_{\text{lines } l = v'v} (\nu_{v'}\nu_v) \prod_{\text{nodes}} c_{\nu_v}$$ [68]


and define a current $\nu(l)$ on a line $l = v'v$ to be the
sum of the harmonics of the nodes preceding
$v'$: $\nu(l) = \sum_{w \le v} \nu_w$. Call $\nu(\theta)$ the current flowing in
the root branch and call order of $\theta$ the number of
nodes (or branches). Then

$$h^{(k)}_\nu = \sum_{\substack{\theta:\ \nu(\theta) = \nu \\ \mathrm{order}(\theta) = k}} \mathrm{Val}(\theta)$$ [69]

provided trees are considered identical if they can be 
overlapped (labels included) after suitably scaling 
the lengths of their branches and pivoting them 
around the nodes out of which they emerge (the root 
is always imagined to be fixed at the origin). 

If the trees are stripped of the harmonic labels,
their number is finite and it can be estimated to be
$\le k!\,4^k$ (because the labels which distinguish the lines
can be attached to an unlabeled tree in many ways).
The harmonic labels (i.e., $\nu_v = \pm 1$) can be laid
down in $2^k$ ways, and the value of each tree can be
bounded by $(1/k!)\,2^{-k}$ (because $c_{\pm 1} = \frac{1}{2}$).

Hence $\sum_\nu |h^{(k)}_\nu| \le 4^k$, which gives a (rough)
estimate of the radius of convergence of the
expansion of h in powers of e: namely 0.25 (easily
improvable to 0.3678 if $4^k k!$ is replaced by $k^{k-1}$
using Cayley's formula for the enumeration of
rooted trees). A simple expression for $h^{(k)}(\psi)$
(LAGRANGE) is


$$h^{(k)}(\psi) = \frac{1}{k!}\,\partial_\psi^{\,k-1} \sin^k\psi$$


(also readable from the tree representation): the
actual radius of convergence, first determined by
Laplace, of the series for h can also be determined
from the latter expression for h (ROUCHÉ) or directly
from the tree representation: it is $\approx 0.6627$.
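Lagrange's expression can be compared directly with a numerical solution of the Kepler equation. The sketch below evaluates $\frac{1}{k!}\partial_\psi^{k-1}\sin^k\psi$ through the binomial Fourier expansion of $\sin^k\psi$, sums the series up to a chosen order, and compares it with $\xi - \lambda$ obtained by Newton's method. The eccentricity, truncation order and function names are assumptions made for the illustration.

```python
import cmath
import math


def lagrange_term(k, psi):
    """(1/k!) d^{k-1}/dpsi^{k-1} sin^k(psi), via sin(psi) = (e^{i psi} - e^{-i psi}) / (2i)."""
    total = 0j
    for j in range(k + 1):
        nu = 2 * j - k                                   # harmonic of this mode
        coeff = math.comb(k, j) * (-1) ** (k - j) / (2j) ** k
        total += coeff * (1j * nu) ** (k - 1) * cmath.exp(1j * nu * psi)
    return total.real / math.factorial(k)


def kepler_h_series(e, lam, kmax):
    """Truncated power series for h(lambda) = xi - lambda in the eccentricity e."""
    return sum(e ** k * lagrange_term(k, lam) for k in range(1, kmax + 1))


def kepler_h_newton(e, lam, iterations=50):
    """h(lambda) from Newton's method applied to xi - e sin(xi) = lambda."""
    xi = lam
    for _ in range(iterations):
        xi -= (xi - e * math.sin(xi) - lam) / (1 - e * math.cos(xi))
    return xi - lam


if __name__ == "__main__":
    e = 0.3                                              # assumed eccentricity, below 0.6627
    for lam in (0.3, 1.0, 2.5, 4.0):
        print(lam, kepler_h_series(e, lam, 25), kepler_h_newton(e, lam))
```

For an eccentricity well inside the convergence radius the two columns agree to many digits; as e approaches 0.6627 more and more terms are needed, and beyond it the truncated sums stop converging for some values of $\lambda$.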

One can find better estimates or at least more 
efficient methods for evaluating the sums in [69]: 
in fact, in performing the sum in [69] important 
cancellations occur. For instance, the harmonic 
labels can be subject to the further strong constraint 
that no line carries zero current because the 
sum of the values of the trees of fixed order and 
with at least one line carrying zero current 
vanishes. 

The above expansion can also be simplified by
partial resummations. For the purpose of an
example, let the nodes with one entering and one
exiting line (see Figure 5) be called "simple"
nodes. Then all tree graphs which, on any line
between two nonsimple nodes, contain any number
of simple nodes can be eliminated. This is done by
replacing, in evaluating the (remaining) tree values,
the factors $\nu_{v'}\nu_v$ in [68] by $\nu_{v'}\nu_v/(1 - e\cos\psi)$: then
the value of $\theta$ (denoted $\mathrm{Val}(\theta)_\psi$) for a tree becomes a
function of $\psi$ and e and [69] is replaced by


$$h(\psi) = \sum_{k=1}^{\infty} \sum^{*}_{\substack{\theta \\ \mathrm{order}(\theta) = k}} e^k\, e^{i\nu(\theta)\psi}\, \mathrm{Val}(\theta)_\psi$$ [70]


where the $*$ means that the trees are subject to the
further restriction of not containing any simple
node. It should be noted that the above graphical
representation of the solution of the Kepler equation
is strongly reminiscent of the representations of
quantities in terms of graphs that occur often in
quantum field theory. Here the trees correspond to
Feynman graphs, the factors associated with the
nodes are the couplings, the factors associated with
the lines are the propagators, and the resummations
are analogous to the self-energy resummations,
while the cancellations mentioned above can be
related to the class of identities called Ward
identities. Not only can the analogy be shown not
to be superficial, but it also turns out to be very
helpful in key mechanical problems: see Appendix 1.

The existence of a vast number of identities
relating the tree values is shown already by the
simple form of the Lagrange series and by the
even more remarkable resummation (LEVI-CIVITA)
leading to


$$h(\psi) = \sum_{k=1}^{\infty} \frac{(e\sin\psi)^k}{k!} \left(\frac{1}{1 - e\cos\psi}\,\partial_\psi\right)^{k} \psi$$ [71]

It is even possible to further collect the series
terms to express it as a series with much better
convergence properties; for instance, its terms can be
reorganized and collected (resummed) so that h is
expressed as a power series in the parameter


$$\eta = \frac{e\, \exp\sqrt{1 - e^2}}{1 + \sqrt{1 - e^2}}$$ [72]


with radius of convergence 1, which corresponds to
$e = 1$ (via a simple argument by Levi-Civita). The
analyticity domain for the Lagrange series is $|\eta| < 1$.
This also determines the value of the Laplace radius,
which is the point closest to the origin of the
complex curve $|\eta(e)| = 1$: it is imaginary so that it is
the root of the equation


$$\frac{e\, \exp\sqrt{1 + e^2}}{1 + \sqrt{1 + e^2}} = 1$$
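The root of the last equation is easy to locate numerically. The sketch below writes the point on the imaginary axis as $e = i\rho$ with $\rho > 0$ (the symbol $\rho$ and the function names are introduced here only for the illustration) and solves $\rho\exp\sqrt{1 + \rho^2}/(1 + \sqrt{1 + \rho^2}) = 1$ by bisection, using the fact that the left-hand side increases with $\rho$.

```python
import math


def eta_modulus_on_imaginary_axis(rho):
    """|eta(e)| at e = i*rho, i.e. rho * exp(sqrt(1 + rho^2)) / (1 + sqrt(1 + rho^2))."""
    s = math.sqrt(1.0 + rho * rho)
    return rho * math.exp(s) / (1.0 + s)


def laplace_radius(tol=1e-12):
    """Bisection for the root of eta_modulus_on_imaginary_axis(rho) = 1 in (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eta_modulus_on_imaginary_axis(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(laplace_radius())
```

The computed root agrees with the value 0.6627 quoted for the radius of convergence of the Lagrange series.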


The analysis provides an example, in a simple
case of great interest in applications, of the kind of
computations actually necessary to represent the
perturbing function in terms of action-angle
variables. The property that the function $c(\lambda)$ in
[66] is the cosine has been used only to limit the
range of the label $\nu$ to be $\pm 1$; hence the same
method, with similar results, can be applied to
study the inversion of the relation between the
average anomaly $\lambda$ and the true anomaly $\theta$ and to
efficiently obtain, for instance, the properties of
f, g in [42].

For more details, the reader is referred to Levi-Civita
(1956).


Lindstedt and Birkhoff Series: 
Divergences 


Nonexistence of constants of motion, rather than 
being the end of the attempts to study motions close 
to integrable ones by perturbation methods, marks 
the beginning of renewed efforts to understand their 
nature. 

Let $(A, \alpha) \in U \times T^\ell$ be action-angle variables
defined in the integrability region for an analytic
Hamiltonian and let $\mathcal{H}_0(A)$ be its value in the action-angle
coordinates. Suppose that $\mathcal{H}_0(A)$ is anisochronous
and let $f(A, \alpha)$ be an analytic perturbing
function. Consider, for $\varepsilon$ small, the Hamiltonian
$\mathcal{H}_\varepsilon(A, \alpha) = \mathcal{H}_0(A) + \varepsilon f(A, \alpha)$.

Let $\omega_0 = \omega(A_0) \equiv \partial_A \mathcal{H}_0(A_0)$ be the frequency spectrum
(see the section "Quasiperiodicity and integrability")
of one of the invariant tori of the
unperturbed system corresponding to an action $A_0$.
Short of integrability, the question to ask at this
point is whether the perturbed system admits an
analytic invariant torus on which the motion is
quasiperiodic and


1. has the same spectrum $\omega_0$,

2. depends analytically on $\varepsilon$ at least for $\varepsilon$ small,

3. reduces to the "unperturbed torus" $\{A_0\} \times T^\ell$ as
$\varepsilon \to 0$.


More concretely, the question is:


Are there functions $\mathbf{H}_\varepsilon(\psi), \mathbf{h}_\varepsilon(\psi)$ analytic in $\psi \in T^\ell$
and in $\varepsilon$ near 0, vanishing as $\varepsilon \to 0$ and such that the
torus with parametric equations

$$A = A_0 + \mathbf{H}_\varepsilon(\psi), \qquad \alpha = \psi + \mathbf{h}_\varepsilon(\psi), \qquad \psi \in T^\ell$$ [73]

is invariant and, if $\omega_0 \equiv \omega(A_0)$, the motion on it is
simply $\psi \to \psi + \omega_0 t$, i.e. it is quasiperiodic with
spectrum $\omega_0$?


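In this context, Poincaré's theorem (in the section
"Generic nonintegrability") had followed another
key result, earlier developed in particular cases and
completed by him, which provides a partial answer
to the question.

Suppose that $\omega_0 = \omega(A_0) \in R^\ell$ satisfies a Diophantine
property, namely suppose that there exist
constants $C, \tau > 0$ such that


$$|\omega_0 \cdot \nu| \ge \frac{1}{C|\nu|^\tau} \quad \text{for all } 0 \ne \nu \in Z^\ell$$ [74]


which, for each $\tau > \ell - 1$ fixed, is a property
enjoyed by all $\omega \in R^\ell$ but for a set of zero measure.
Then the motions on the unperturbed torus run over
trajectories that fill the torus densely because of the
"irrationality" of $\omega_0$ implied by [74]. Writing
Hamilton's equations,


$$\dot\alpha = \partial_A \mathcal{H}_0(A) + \varepsilon\,\partial_A f(A, \alpha), \qquad \dot A = -\varepsilon\,\partial_\alpha f(A, \alpha)$$


with $A, \alpha$ given by [73] with $\psi$ replaced by $\psi + \omega_0 t$,
and using the density of the unperturbed trajectories
implied by [74], the conditions that [73] are
equations for an invariant torus on which the
motion is $\psi \to \psi + \omega_0 t$ are


$$\omega_0 + (\omega_0 \cdot \partial_\psi)\,\mathbf{h}_\varepsilon(\psi) = \partial_A \mathcal{H}_0(A_0 + \mathbf{H}_\varepsilon(\psi)) + \varepsilon\,\partial_A f(A_0 + \mathbf{H}_\varepsilon(\psi), \psi + \mathbf{h}_\varepsilon(\psi))$$
$$(\omega_0 \cdot \partial_\psi)\,\mathbf{H}_\varepsilon(\psi) = -\varepsilon\,\partial_\alpha f(A_0 + \mathbf{H}_\varepsilon(\psi), \psi + \mathbf{h}_\varepsilon(\psi))$$ [75]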


The theorem referred to above (POINCARÉ) is that


Theorem 2 If the unperturbed system is anisochronous
and $\omega_0 = \omega(A_0)$ satisfies [74] for some $C, \tau > 0$
there exist two well defined power series $\mathbf{h}_\varepsilon(\psi) = \sum_{k=1}^{\infty} \varepsilon^k \mathbf{h}^{(k)}(\psi)$
and $\mathbf{H}_\varepsilon(\psi) = \sum_{k=1}^{\infty} \varepsilon^k \mathbf{H}^{(k)}(\psi)$ which
solve [75] to all orders in $\varepsilon$. The series for $\mathbf{H}_\varepsilon$ is
uniquely determined, and such is also the series for
$\mathbf{h}_\varepsilon$ up to the addition of an arbitrary constant at each
order, so that it is unique if $\mathbf{h}_\varepsilon$ is required, as
henceforth done with no loss of generality, to have
zero average over $\psi$.
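The Diophantine condition [74] assumed in the theorem can be probed numerically over a finite range of harmonics. The sketch below takes $\ell = 2$, uses $|\nu| = |\nu_1| + |\nu_2|$ as the norm (one admissible choice among equivalent ones), and returns the smallest value of $|\omega_0 \cdot \nu|\,|\nu|^\tau$ found, which bounds $1/C$ from above. The sample vector built on the golden mean and the function name are assumptions for the illustration; a finite scan can suggest, but not prove, the property.

```python
import math


def diophantine_lower_bound(omega, tau, nmax):
    """Minimum of |omega.nu| * |nu|^tau over nonzero integer nu with |nu_i| <= nmax,
    where |nu| = |nu_1| + |nu_2|."""
    best = math.inf
    for n1 in range(-nmax, nmax + 1):
        for n2 in range(-nmax, nmax + 1):
            if n1 == 0 and n2 == 0:
                continue
            size = abs(n1) + abs(n2)
            best = min(best, abs(omega[0] * n1 + omega[1] * n2) * size ** tau)
    return best


if __name__ == "__main__":
    golden = (1.0 + math.sqrt(5.0)) / 2.0
    print(diophantine_lower_bound((1.0, golden), 1, 200))   # stays bounded away from 0
    print(diophantine_lower_bound((1.0, 0.5), 1, 200))      # resonant: nu = (1, -2)
```

For the golden-mean vector the minimum stays of order one as the range grows, consistent with [74] for $\tau = 1$, whereas for the resonant vector it vanishes at $\nu = (1, -2)$.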


The algorithm for the construction is illustrated in
a simple case in the next section (see eqns [83],
[84]). Convergence of the above series, called
Lindstedt series, even for small $\varepsilon$ has been a problem
for rather a long time. Poincaré proved the existence
of the formal solution; but his other result, discussed
in the section "Generic nonintegrability," casts
doubts on convergence although it does not exclude
it, as was immediately stressed by several authors
(including Poincaré himself). The result in that
section shows the impossibility of solving [75] for
all $\omega_0$'s near a given spectrum, analytically and
uniformly, but it does not exclude the possibility of
solving it for a single $\omega_0$.

The theorem admits several extensions or analogs: 
an interesting one is to the case of isochronous 
unperturbed systems: 


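Given the Hamiltonian $\mathcal{H}_\varepsilon(A, \alpha) = \omega_0 \cdot A + \varepsilon f(A, \alpha)$,
with $\omega_0$ satisfying [74] and f analytic, there exist
power series $C_\varepsilon(A', \alpha'), u_\varepsilon(A')$ such that $\mathcal{H}_\varepsilon(C_\varepsilon(A', \alpha')) = \omega_0 \cdot A' + u_\varepsilon(A')$
holds as an equality between formal
power series (i.e., order by order in $\varepsilon$) and at the
same time the $C_\varepsilon$, regarded as a map, satisfies order by
order the condition (i.e., (4.3)) that it is a canonical map.


This means that there is a generating function
$A' \cdot \alpha + \Phi_\varepsilon(A', \alpha)$ also defined by a formal power
series $\Phi_\varepsilon(A', \alpha) = \sum_{k=1}^{\infty} \varepsilon^k \Phi^{(k)}(A', \alpha)$, that is, such
that if $C_\varepsilon(A', \alpha') = (A, \alpha)$ then it is true, order by
order in powers of $\varepsilon$, that $A = A' + \partial_\alpha \Phi_\varepsilon(A', \alpha)$ and
$\alpha' = \alpha + \partial_{A'} \Phi_\varepsilon(A', \alpha)$. The series for $\Phi_\varepsilon, u_\varepsilon$ are called
Birkhoff series.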

In this isochronous case, if Birkhoff series were 
convergent for small s and (A', a) in a region of the 
form U x T^ with U c R'- open and bounded, it 
would follow that, for small <, He would be inte- 
grable in a large region of phase space (i.e., where the 
generating function can be used to build a canonical 
map: this would essentially be U x T“ deprived of a 
small layer of points near the boundary of U). 
However, convergence for small e is false (in general), 
as shown by the simple two-dimensional example 


H-(A,@) = @ -A + € (A2 + f(@)) 


76 
(A æ) € R? x T? US 


with $f(\alpha)$ an arbitrary analytic function with all
Fourier coefficients $f_\nu$ positive for $\nu\ne 0$ and $f_0=0$.
In the latter case, the solution is

$$u_\epsilon(A')=\epsilon A'_2,\qquad \Phi_\epsilon(A',\alpha)=-\sum_{k=1}^{\infty}\epsilon^k\sum_{0\ne\nu\in Z^2} f_\nu\,e^{i\alpha\cdot\nu}\,\frac{(-i\nu_2)^{k-1}}{\big(i(\omega_{01}\nu_1+\omega_{02}\nu_2)\big)^{k}}$$ [77]


The series does not converge: in fact, its convergence
would imply integrability and, consequently,
bounded trajectories in phase space. However, the
equations of motion for [76] can be easily solved
explicitly, and in any open region near given initial
data there are other data which have unbounded
trajectories if $\omega_{01}/(\omega_{02}+\epsilon)$ is rational.

Nevertheless, even in this elementary case a
formal sum of the series yields

$$u_\epsilon(A')=\epsilon A'_2,\qquad \Phi_\epsilon(A',\alpha)=-\epsilon\sum_{0\ne\nu\in Z^2}\frac{f_\nu\,e^{i\alpha\cdot\nu}}{i\big(\omega_{01}\nu_1+(\omega_{02}+\epsilon)\nu_2\big)}$$ [78]

and the series in [78] (no longer a power series in $\epsilon$)
is really convergent if $\omega=(\omega_{01},\omega_{02}+\epsilon)$ is a Diophantine
vector (by [74], because analyticity implies
exponential decay of $|f_\nu|$). Remarkably, for such
values of $\epsilon$ the Hamiltonian $H_\epsilon$ is integrable and it is
integrated by the canonical map generated by [78],
in spite of the fact that [78] is obtained, from [77],
via the nonrigorous sum rule

$$\sum_{k=0}^{\infty}z^k=\frac{1}{1-z}\qquad\text{for } z\ne 1$$ [79]

(applied to cases with $|z|\ge 1$, which are certainly
realized for a dense set of $\epsilon$'s even if $\omega$ is Diophantine
because the $z$'s have values $z=-\epsilon\nu_2/(\omega_0\cdot\nu)$). In other
words, the integration of the equations is elementary
and once performed it becomes apparent that, if $\omega$ is
Diophantine, the solutions can be rigorously found
from [78]. Note that, for instance, this means that
relations like $\sum_{k=0}^{\infty}2^k=-1$ are really used to obtain
[78] from [77].
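The role of the sum rule can be checked numerically. The following is a minimal sketch, not part of the original article: it assumes a trigonometric polynomial $f$ given as a dictionary of Fourier coefficients, and the function names are hypothetical. It verifies that the coefficients in [78] solve $(\omega_0+\epsilon e_2)\cdot\partial_\alpha\Phi_\epsilon=-\epsilon f$ harmonic by harmonic, and that the partial sums of the power series in $\epsilon$ approach [78] only for harmonics with $|z|<1$.

```python
import math


def birkhoff_phi_coeffs(f_coeffs, omega0, eps):
    """Fourier coefficients of Phi_eps as in [78] (zero harmonic excluded)."""
    w1, w2 = omega0
    return {
        nu: -eps * f_nu / (1j * (w1 * nu[0] + (w2 + eps) * nu[1]))
        for nu, f_nu in f_coeffs.items()
        if nu != (0, 0)
    }


def residual(f_coeffs, phi_coeffs, omega0, eps):
    """Largest Fourier coefficient of (omega0 + eps*e2).d_alpha Phi + eps*f."""
    w1, w2 = omega0
    worst = 0.0
    for nu, f_nu in f_coeffs.items():
        if nu == (0, 0):
            continue
        lhs = 1j * (w1 * nu[0] + (w2 + eps) * nu[1]) * phi_coeffs[nu]
        worst = max(worst, abs(lhs + eps * f_nu))
    return worst


def partial_power_series(f_nu, nu, omega0, eps, order):
    """Partial sum (through eps**order) of the expansion of one coefficient.

    The expansion is a geometric series in z = -eps*nu_2/(omega0.nu).
    """
    w = omega0[0] * nu[0] + omega0[1] * nu[1]
    z = -eps * nu[1] / w
    return -eps * f_nu / (1j * w) * sum(z ** k for k in range(order))


def geometric_sum_rule(z):
    """Value assigned by the sum rule [79] to sum_k z**k, for z != 1."""
    return 1.0 / (1.0 - z)


if __name__ == "__main__":
    omega0 = (1.0, (math.sqrt(5) - 1) / 2)  # assumed sample frequencies
    f = {(1, 1): 0.5, (-1, -1): 0.5, (3, -5): 0.1, (-3, 5): 0.1}
    phi = birkhoff_phi_coeffs(f, omega0, 0.05)
    print(residual(f, phi, omega0, 0.05))
```

With these sample values the harmonic $(1,1)$ has $|z|<1$ and its partial sums settle on the coefficient in [78], while the harmonic $(3,-5)$ has $|z|>1$ and its partial sums grow without bound, although the corresponding coefficient in [78] is finite.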

Another extension of Lindstedt series arises in a
perturbation of an anisochronous system when
asking the question as to what happens to the
unperturbed invariant tori $\mathcal{T}_{\omega_0}$ on which the spectrum
is resonant, that is, $\omega_0\cdot\nu=0$ for some $\nu\ne 0$,
$\nu\in Z^\ell$. The result is that even in such a case there is a
formal power series solution showing that at least
a few of the (infinitely many) invariant tori into
which $\mathcal{T}_{\omega_0}$ is in turn foliated in the unperturbed case
can be formally continued at $\epsilon\ne 0$ (see the section
"Resonances and their stability").


For more details, we refer the reader to Poincaré 
(1987). 
Quasiperiodicity and KAM Stability


To discuss more advanced results, it is convenient
to restrict attention to a special (nontrivial) paradigmatic case

$$H_\epsilon(A,\alpha)=\tfrac{1}{2}A^2+\epsilon f(\alpha)$$ [80]

In this simple case (called Thirring model: representing
$\ell$ particles on a circle interacting via a potential
$\epsilon f(\alpha)$) the equations for the maximal tori [75]
reduce to equations for the only functions $h_\epsilon$:

$$(\omega_0\cdot\partial_\psi)^2\,h_\epsilon(\psi)=-\epsilon\,\partial_\alpha f(\psi+h_\epsilon(\psi)),\qquad \psi\in T^\ell$$ [81]

as the second of [75] simply becomes the definition
of $H_\epsilon$ because the RHS does not involve $H_\epsilon$.

The real problem is therefore whether the formal
series considered in the last section converge at least
for small $\epsilon$, and the example [76] on the Birkhoff
series shows that sometimes sum rules might be
needed in order to give a meaning to the series. In
fact, whenever a problem (of physical interest)
admits a formal power series solution which is not
convergent, or which is such that it is not known
whether it is convergent, then one should look for
sum rules for it.

The modern theory of perturbations starts with
the proof of the convergence for $\epsilon$ small enough of
the Lindstedt series (KOLMOGOROV). The general
"KAM" result is:


Theorem 3 (KAM) Consider the Hamiltonian
$H_\epsilon(A,\alpha)=h(A)+\epsilon f(A,\alpha)$, defined in $U=V\times T^\ell$
with $V\subset R^\ell$ open and bounded and with $f(A,\alpha)$,
$h(A)$ analytic in the closure $\bar V\times T^\ell$, where $h(A)$ is also
anisochronous; let $\omega_0\stackrel{def}{=}\omega(A_0)=\partial_A h(A_0)$ and assume
that $\omega_0$ satisfies [74]. Then

(i) there is $\epsilon_{C,\tau}>0$ such that the Lindstedt series
converges for $|\epsilon|<\epsilon_{C,\tau}$;

(ii) its sum yields two functions $H_\epsilon(\psi)$, $h_\epsilon(\psi)$ on $T^\ell$
which parametrize an invariant torus
$\mathcal{T}_{C,\tau}(A_0,\epsilon)$;

(iii) on $\mathcal{T}_{C,\tau}(A_0,\epsilon)$ the motion is $\psi\to\psi+\omega_0 t$, see
[73]; and

(iv) the set of data in $U$ which belong to invariant
tori $\mathcal{T}_{C,\tau}(A_0,\epsilon)$ with $\omega(A_0)$ satisfying [74]
with prefixed $C,\tau$ has complement with volume
$<\mathrm{const}\,C^{-a}$ for a suitable $a>0$ and with area
also $<\mathrm{const}\,C^{-a}$ on each nontrivial surface of
constant energy $H_\epsilon=E$.


In other words, for small $\epsilon$ the spectra of most
unperturbed quasiperiodic motions can still be found
as spectra of perturbed quasiperiodic motions developing
on tori which are close to the corresponding
unperturbed ones (i.e., with the same spectrum).


This is a stability result: for instance, in systems
with two degrees of freedom the invariant tori of
dimension two which lie on a given three-dimensional
energy surface separate the points on the energy
surface into the set which is "inside" the torus and the
set which is "outside." Hence, an initial datum
starting (say) inside cannot reach the outside. Likewise,
a point starting between two tori has to stay in
between forever. Further, if the two tori are close, this
means that motion will stay very localized in action
space, with a trajectory accessing only points close to
the tori and coming close to all such points, within a
distance of the order of the distance between the
confining tori. The case of three or more degrees of
freedom is quite different (see sections "Diffusion in
phase space" and "The three-body problem").

In the simple case of the rotators system [80] the
equations for the parametric representation of the
tori are given by [81]. The latter bear some analogy
with the easier problem in [66]: but [81] are $\ell$
equations instead of one and they are differential
equations rather than ordinary equations. Furthermore,
the function $f(\alpha)$ which plays here the role of
$c(\lambda)$ in [66] has Fourier coefficients $f_\nu$ with no
restrictions on $\nu$, while the Fourier coefficients $c_\nu$
for $c$ in [66] do not vanish only for $\nu=\pm 1$.

The above differences are, to some extent,
"minor" and the power series solution to [81] can
be constructed by the same algorithm as used in the
case of [66]: namely one forms trees as in Figure 5
with the harmonic labels $\nu_v\in Z$ replaced by $\nu_v\in Z^\ell$
(still to be thought of as possible harmonic indices in
the Fourier expansion of the perturbing function $f$).
All other labels affixed to the trees in the section
"Generic nonintegrability" will be the same. In
particular, the current flowing on a branch $l=v'v$
will be defined as the sum of the harmonics of the
nodes $w\le v$ preceding $v$:

$$\nu(l)\stackrel{def}{=}\sum_{w\le v}\nu_w$$ [82]

and we call $\nu(\theta)$ the current flowing in the root
branch.
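The definition [82] can be sketched in code. This is only an illustration under assumed conventions (not taken from the article): a tree is stored as a hypothetical `parent` map sending each node to the next node towards the root, with `None` for the first node, whose outgoing line is the root branch; each node carries a harmonic in $Z^\ell$.

```python
def line_currents(parent, harmonics):
    """Current nu(l) on the line leaving each node v towards the root.

    parent[v]    -- next node on the path to the root (None for the first node)
    harmonics[v] -- harmonic label nu_v, a tuple of ell integers
    Following [82], the current is the sum of nu_w over v and all nodes
    that precede v, i.e. the subtree hanging from v.
    """
    ell = len(next(iter(harmonics.values())))
    currents = {v: [0] * ell for v in harmonics}
    for w, nu_w in harmonics.items():
        v = w
        # add nu_w to the line of w and of every node between w and the root
        while v is not None:
            currents[v] = [c + n for c, n in zip(currents[v], nu_w)]
            v = parent[v]
    return {v: tuple(c) for v, c in currents.items()}


def has_zero_current(parent, harmonics):
    """True if some line carries zero current (the tree value is then undefined)."""
    return any(all(c == 0 for c in cur)
               for cur in line_currents(parent, harmonics).values())


if __name__ == "__main__":
    parent = {0: None, 1: 0, 2: 0, 3: 1}  # node 0 is attached to the root
    harmonics = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (2, -1)}
    print(line_currents(parent, harmonics))
```

The current of the first node is the root current $\nu(\theta)$, that is, the sum of all node harmonics of the tree.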

Here the value $\mathrm{Val}(\theta)$ of a tree has to be defined
differently because the equation to be solved ([81])
contains the differential operator $(\omega_0\cdot\partial_\psi)^2$ which,
when Fourier transformed, becomes multiplication
of the Fourier component with harmonic $\nu$ by
$(i\,\omega_0\cdot\nu)^2$.

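The variation due to the presence of the operator
$(\omega_0\cdot\partial_\psi)^2$ and the necessity of its inversion in the
evaluation of $u\cdot h^{(k)}_\nu$, that is, of the component of
$h^{(k)}_\nu$ along an arbitrary unit vector $u$, is nevertheless
quite simple: the value of a tree graph $\theta$ of order $k$
(i.e., with $k$ nodes and $k$ branches) has to be defined
by (cf. [68])

$$\mathrm{Val}(\theta)=\frac{-i\,(-1)^k}{k!}\Big(\prod_{\text{lines } l=v'v}\frac{\nu_{v'}\cdot\nu_v}{(\omega_0\cdot\nu(l))^2}\Big)\Big(\prod_{\text{nodes } v}f_{\nu_v}\Big)$$ [83]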


where the $\nu_{v'}$ appearing in the factor relative to the
root line $rv$ from the first node $v$ to the root $r$ (see
Figure 5) is interpreted as a unit vector $u$ (it was
interpreted as 1 in the one-dimensional case [66]).
Equation [83] makes sense only for trees in which
no line carries zero current. Then the component
along $u$ (the harmonic label attached to the root of a
tree) of $h^{(k)}_\nu$ is given (see also [69]) by

$$u\cdot h^{(k)}_\nu=\sum^{*}_{\theta,\ \nu(\theta)=\nu,\ \mathrm{order}(\theta)=k}\mathrm{Val}(\theta)$$ [84]

where the $*$ means that the sum is only over trees in
which a nonzero current $\nu(l)$ flows on the lines $l\in\theta$.
The quantity $u\cdot h^{(k)}_0$ will be defined to be 0 (see the
previous section).

In the case of [66] zero-current lines could appear:
but the contributions from tree graphs containing at
least one zero-current line would cancel. In the
present case, the statement that the above algorithm
actually gives $h^{(k)}_\nu$ by simply ignoring trees with lines
with zero current is nontrivial. It was Poincaré's
contribution to the theory of Lindstedt series to show
that even in the general case (cf. [75]) the equations
for the invariant tori can be solved by a formal power
series. Equation [84] is proved by induction on $k$ after
checking it for the first few orders.

The algorithm just described leading to [83] can 
be extended to the case of the general Hamiltonian 
considered in the KAM theorem. 

The convergence proof is more delicate than the
(elementary) one for eqn [66]. In fact, the values of
trees of order $k$ can give large contributions to $h^{(k)}_\nu$,
because the "new" factors $(\omega_0\cdot\nu(l))^2$, although not
zero, can be quite small and their small size can
overwhelm the smallness of the factors $f_\nu$ and $\epsilon$. In
fact, even if $f$ is a trigonometric polynomial (so that $f_\nu$
vanishes identically for $|\nu|$ large enough) the currents
flowing in the branches can be very large, of the
order of the number $k$ of nodes in the tree; see [82].

This is called the small-divisors problem. The key
to its solution goes back to a related work (SIEGEL)
which shows that

Theorem 4 Consider the contribution to the sum
in [84] from graphs $\theta$ in which no pairs of lines
which lie on the same path to the root carry the
same current and, furthermore, the node harmonics
are bounded by $|\nu|\le N$ for some $N$. Then the
number of lines $l$ in $\theta$ with divisor $\omega_0\cdot\nu_l$ satisfying
$2^{-n}<C|\omega_0\cdot\nu_l|\le 2^{-n+1}$ does not exceed $4Nk2^{-n/\tau}$.

Hence, setting $F\stackrel{def}{=}C^2\max_{|\nu|\le N}|f_\nu|$,
the corresponding $\mathrm{Val}(\theta)$ can be bounded by

$$\frac{1}{k!}\,F^kN^{2k}\prod_{n=0}^{\infty}2^{2n\,(4Nk2^{-n/\tau})}\stackrel{def}{=}\frac{1}{k!}B^k,\qquad B=FN^2\,2^{\sum_n 8nN2^{-n/\tau}}$$ [85]

since the product is convergent. In the case in which
$f$ is a trigonometric polynomial of degree $N$, the
above restricted contributions to $u\cdot h^{(k)}_\nu$ would
generate a convergent series for $\epsilon$ small enough. In
fact, the number of trees is bounded (as in the
section "Perturbing functions") by $k!\,4^k(2N+1)^{\ell k}$ so
that the series $\sum_k|\epsilon|^k\,|u\cdot h^{(k)}_\nu|$ would converge for
small $\epsilon$ (i.e., $|\epsilon|<(B\cdot 4(2N+1)^\ell)^{-1}$).
Given this comment, the analysis of the “remain- 
ing contributions" becomes the real problem, and it 
requires new ideas because among the excluded trees 
there are some simple kth order trees whose value 
alone, if considered separately from the other 
contributions, would generate a factorially divergent 
power series in $\epsilon$.

However, the contributions of all large-valued 
trees of order k can be shown to cancel: although 
not exactly (unlike the case of the elementary 
problem in the section “Perturbing functions," 
where the cancellation is not necessary for the 
proof, in spite of its exact occurrence), but enough 
so that in spite of the existence of exceedingly large 
values of individual tree graphs their total sum can 
still be bounded by a constant to the power k so that 
the power series actually converges for $\epsilon$ small
enough. The idea is discussed in Appendix 1. 

For more details, the reader is referred to Poincaré 
(1987), Kolmogorov (1954), Moser (1962), and Arnol'd 
(1989). 


Resonances and their Stability 


A quasiperiodic motion with $r$ rationally independent
frequencies is called resonant if $r$ is strictly less
than the number of degrees of freedom, $\ell$. The
difference $s=\ell-r$ is the degree of the resonance.

Of particular interest are the cases of a perturba- 
tion of an integrable system in which resonant 
motions take place. 
A typical example is the $n$-body problem which
studies the mutual perturbations of the motions of
$n-1$ particles gravitating around a more massive
particle. If the particle masses can be considered to
be negligible, the system will consist of $n-1$ central
Keplerian motions: it will therefore have $\ell=3(n-1)$
degrees of freedom. In general, only one frequency
per body occurs in the absence of the perturbations
(the period of the Keplerian orbit). Hence, $r\le n-1$
and $s\ge 2(n-1)$ (or in the planar case $s\ge n-1$)
with equality holding when the periods are rationally
independent.

Another example is the rigid body with a fixed 
point perturbed by a conservative force: in this case, 
the unperturbed system has three degrees of freedom 
but, in general, only two frequencies (see the 
discussion following [52]). 

Furthermore, in the above examples there is the 
possibility that the independent frequencies assume, 
for special initial data, values which are rationally 
related, giving rise to resonances of even higher
order (i.e., with smaller values of $r$).

In an integrable anisochronous system, resonant
motions will be dense in phase space because the
frequencies $\omega(A)$ will vary as much as the actions
and therefore resonances of any order (i.e., any
$r<\ell$) will be dense in phase space: in particular, the
periodic motions (i.e., the highest-order resonances)
will be dense.

Resonances, in integrable systems, can arise in
a priori stable integrable systems and in a priori
unstable systems: the former are systems whose
Hamiltonian admits canonical action-angle coordinates
$(A,\alpha)\in U\times T^\ell$ with $U\subset R^\ell$ open, while the
latter are systems whose Hamiltonian has, in
suitable local canonical coordinates, the form

$$H_0(A)+\sum_{i=1}^{s_1}\frac{1}{2}\big(p_i^2-\lambda_i^2q_i^2\big)+\sum_{j=1}^{s_2}\frac{1}{2}\big(\pi_j^2+\mu_j^2\kappa_j^2\big),\qquad \lambda_i,\mu_j>0$$ [86]

where $(A,\alpha)\in U\times T^r$, $U\subset R^r$, $(p,q)\in V\subset R^{2s_1}$,
$(\pi,\kappa)\in V'\subset R^{2s_2}$ with $V,V'$ neighborhoods of the
origin and $\ell=r+s_1+s_2$, $s_i\ge 0$, $s_1+s_2>0$, and
$\pm\lambda_i$, $\pm i\mu_j$ are called Lyapunov coefficients of
the resonance. The perturbations considered are
supposed to have the form $\epsilon f(A,\alpha,p,q,\pi,\kappa)$. The
denomination of a priori stable or unstable refers to
the properties of the "a priori given unperturbed
Hamiltonian." The label "a priori unstable" is
certainly appropriate if $s_1>0$: here also $s_1=0$ is
allowed for notational convenience, implying that the
Lyapunov coefficients in a priori unstable cases are all
of order 1 (whether real, $\lambda_i$, or imaginary, $i\mu_j$). In
other words, the a priori stable case, $s_1=s_2=0$ in
[86], is the only excluded case. Of course, the stability
properties of the motions when a perturbation acts
will depend on the perturbation in both cases.

The a priori stable systems usually have a great 
variety of resonances (e.g. in the anisochronous 
case, resonances of any dimension are dense). The 
a priori unstable systems have (among possible other
resonances) some very special $r$-dimensional
resonances occurring when the unstable coordinates
$(p,q)$ and $(\pi,\kappa)$ are zero and the frequencies of the $r$
action-angle coordinates are rationally independent.

In the first case (a priori stable), the general
question is whether the resonant motions, which
form invariant tori of dimension $r$ arranged into
families that fill $\ell$-dimensional invariant tori, continue
to exist, in presence of small enough perturbations
$\epsilon f(A,\alpha)$, on slightly deformed invariant tori.
Similar questions can be asked in the a priori 
unstable cases. To examine the matter more closely 
consider the formulation of the simplest problems. 

A priori stable resonances: more precisely, suppose
$H_0=\frac{1}{2}A^2$ and let $\{A_0\}\times T^\ell$ be the unperturbed
invariant torus $\mathcal{T}_{A_0}$ with spectrum $\omega_0=\omega(A_0)=\partial_AH_0(A_0)$
with only $r$ rationally independent components.
For simplicity, suppose that
$\omega_0=(\omega_1,\dots,\omega_r,0,\dots,0)\stackrel{def}{=}(\omega,0)$ with $\omega\in R^r$. The more general
case in which $\omega_0$ has only $r$ rationally independent
components can be reduced to the special case above
by a canonical linear change of coordinates at the price
of changing $H_0$ to a new one, still quadratic in the
actions but containing mixed products $A_iB_j$: the proofs
of the results that are discussed here would not be
really affected by such a more general form of $H_0$.

It is convenient to distinguish between the "fast"
angles $\alpha_1,\dots,\alpha_r$ and the "resonant" angles
$\alpha_{r+1},\dots,\alpha_\ell$ (also called "slow" or "secular") and
call $\alpha=(\alpha',\beta)$ with $\alpha'\in T^r$ and $\beta\in T^s$. Likewise,
we distinguish the fast actions $A'=(A_1,\dots,A_r)$ and
the resonant ones $A_{r+1},\dots,A_\ell$ and set $A=(A',B)$
with $A'\in R^r$ and $B\in R^s$.

Therefore, the torus $\mathcal{T}_{A_0}$, $A_0=(A'_0,B_0)$, is in turn a
continuum of invariant tori $\mathcal{T}_{A_0,\beta}$ with trivial
parametric equations: $\beta$ fixed, $\alpha'=\psi$, $\psi\in T^r$, and
$A'=A'_0$, $B=B_0$. On each of them the motion is:
$A',B,\beta$ constant and $\alpha'\to\alpha'+\omega t$, with rationally
independent $\omega\in R^r$.

Then the natural question is whether there exist
functions $h_\epsilon,k_\epsilon,H_\epsilon,K_\epsilon$ smooth in $\epsilon$ near $\epsilon=0$ and in
$\psi\in T^r$, vanishing for $\epsilon=0$, and such that the torus
$\mathcal{T}_{A_0,\beta_0,\epsilon}$ with parametric equations

$$A'=A'_0+H_\epsilon(\psi),\quad \alpha'=\psi+h_\epsilon(\psi),\quad B=B_0+K_\epsilon(\psi),\quad \beta=\beta_0+k_\epsilon(\psi),\qquad \psi\in T^r$$ [87]

is invariant for the motions with Hamiltonian

$$H_\epsilon(A,\alpha)=\tfrac{1}{2}A'^2+\tfrac{1}{2}B^2+\epsilon f(\alpha',\beta)$$

and the motions on it are $\psi\to\psi+\omega t$. The above
property, when satisfied, is summarized by saying
that the unperturbed resonant motions
$A=(A'_0,B_0)$, $\alpha=(\alpha'_0+\omega t,\beta_0)$ can be continued in
presence of perturbation $\epsilon f$, for small $\epsilon$, to quasiperiodic
motions with the same spectrum and on a
slightly deformed torus $\mathcal{T}_{A'_0,\beta_0,\epsilon}$.

A priori unstable resonances: here the question is
whether the special invariant tori continue to exist
in presence of small enough perturbations, of
course slightly deformed. This means asking
whether, given $A_0$ such that $\omega(A_0)=\partial_AH_0(A_0)$ has
rationally independent components, there are functions
$(H_\epsilon(\psi),h_\epsilon(\psi))$, $(P_\epsilon(\psi),Q_\epsilon(\psi))$ and $(\Pi_\epsilon(\psi),K_\epsilon(\psi))$
smooth in $\epsilon$ near $\epsilon=0$, vanishing for $\epsilon=0$,
analytic in $\psi\in T^r$ and such that the $r$-dimensional
surface

$$A=A_0+H_\epsilon(\psi),\quad \alpha=\psi+h_\epsilon(\psi),\quad p=P_\epsilon(\psi),\quad q=Q_\epsilon(\psi),\quad \pi=\Pi_\epsilon(\psi),\quad \kappa=K_\epsilon(\psi),\qquad \psi\in T^r$$ [88]

is an invariant torus $\mathcal{T}_{A_0,\epsilon}$ on which the motion is
$\psi\to\psi+\omega(A_0)t$. Again, the above property is
summarized by saying that the unperturbed special
resonant motions can be continued in presence of
perturbation $\epsilon f$ for small $\epsilon$ to quasiperiodic motions
with the same spectrum and on a slightly deformed
torus $\mathcal{T}_{A_0,\epsilon}$.

Some answers to the above questions are presented
in the following section. For more details, the
reader is referred to Gallavotti et al. (2004).


Resonances and Lindstedt Series 


We discuss eqns [87] in the paradigmatic case in
which the Hamiltonian $H_0(A)$ is $\frac{1}{2}A^2$ (cf. [80]). It
will be $\omega(A')=A'$ so that $A'_0=\omega$, $B_0=0$ and the
perturbation $f(\alpha)$ can be considered as a function
of $\alpha=(\alpha',\beta)$: let $\bar f(\beta)$ be defined as its average over
$\alpha'$. The determination of the invariant torus of
dimension $r$ which can be continued in the sense
discussed in the last section is easily understood in
this case.

A resonant invariant torus which, among the tori
$\mathcal{T}_{A_0,\beta}$, has parametric equations that can be continued
as a formal power series in $\epsilon$ is the torus
$\mathcal{T}_{A_0,\beta_0}$ with $\beta_0$ a stationarity point for $\bar f(\beta)$, that is,
an equilibrium point for the average perturbation:
$\partial_\beta\bar f(\beta_0)=0$. In fact, the following theorem holds:
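Theorem 5 If $\omega\in R^r$ satisfies a Diophantine
property and if $\beta_0$ is a nondegenerate stationarity
point for the "fast angle average" $\bar f(\beta)$ (i.e., such
that $\det\partial^2_{\beta\beta}\bar f(\beta_0)\ne 0$), then the following equations
for the functions $h_\epsilon$, $k_\epsilon$,

$$(\omega\cdot\partial_\psi)^2\,h_\epsilon(\psi)=-\epsilon\,\partial_{\alpha'}f\big(\psi+h_\epsilon(\psi),\,\beta_0+k_\epsilon(\psi)\big),\qquad (\omega\cdot\partial_\psi)^2\,k_\epsilon(\psi)=-\epsilon\,\partial_{\beta}f\big(\psi+h_\epsilon(\psi),\,\beta_0+k_\epsilon(\psi)\big)$$ [89]

can be formally solved in powers of $\epsilon$.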
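The hypothesis on $\beta_0$ can be illustrated with a small numerical sketch. It assumes $r=s=1$, a sample perturbation chosen only for illustration, and hypothetical helper names; the fast-angle average is computed on a uniform grid (which is exact for low-degree trigonometric polynomials) and the derivatives by finite differences.

```python
import math


def fast_angle_average(f, beta, samples=64):
    """Average of f(alpha', beta) over alpha' in T^1 on a uniform grid."""
    return sum(f(2 * math.pi * j / samples, beta) for j in range(samples)) / samples


def first_derivative(g, x, h=1e-5):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


def second_derivative(g, x, h=1e-4):
    """Central finite-difference approximation of g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / h ** 2


def sample_f(alpha, beta):
    """Assumed sample perturbation; its fast-angle average is cos(beta)."""
    return math.cos(alpha) * math.cos(beta) + math.cos(beta) + 0.5 * math.cos(alpha + 2 * beta)


def f_bar(beta):
    """Fast-angle average of the sample perturbation."""
    return fast_angle_average(sample_f, beta)


if __name__ == "__main__":
    for beta0 in (0.0, math.pi):
        print(beta0, first_derivative(f_bar, beta0), second_derivative(f_bar, beta0))
```

For this sample $f$ the average is $\cos\beta$, so $\beta_0=0$ is a nondegenerate maximum and $\beta_0=\pi$ a nondegenerate minimum; both satisfy the stationarity and nondegeneracy hypotheses of the theorem.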


Given the simplicity of the Hamiltonian [80] that
we are considering, it is not necessary to discuss the
functions $H_\epsilon$, $K_\epsilon$ because the equations that they
should obey reduce to their definitions as in the
section "Quasiperiodicity and KAM stability," and
for the same reason.

In other words, also the resonant tori admit a 
Lindstedt series representation. It is however very 
unlikely that the series are, in general, convergent. 

Physically, this new aspect is due to the fact that
the linearization of the motion near the torus $\mathcal{T}_{A'_0,\beta_0}$
introduces oscillatory motions around $\mathcal{T}_{A'_0,\beta_0}$ with
frequencies proportional to the square roots of the
positive eigenvalues of the matrix $\epsilon\,\partial^2_\beta\bar f(\beta_0)$: therefore,
it is naively expected that a Diophantine property
has to be required on the
vector $(\omega,\sqrt{\epsilon\mu_1},\dots)$, where $\epsilon\mu_j$ are the positive
eigenvalues. Hence, some values of $\epsilon$, namely those
for which $(\omega,\sqrt{\epsilon\mu_1},\dots)$ is not a Diophantine vector
or is too close to a non-Diophantine vector, should
be excluded or at least should be expected to
generate difficulties. Note that the problem arises
irrespective of the assumptions about the nondegenerate
matrix $\partial^2_\beta\bar f(\beta_0)$ (since $\epsilon$ can have either
sign), and no matter how small $|\epsilon|$ is supposed to be.
But we can expect that if the matrix $\partial^2_\beta\bar f(\beta_0)$ is
(say) positive definite (i.e., $\beta_0$ is a minimum point
for $\bar f(\beta)$) then the problem should be easier for $\epsilon<0$
and vice versa, if $\beta_0$ is a maximum, it should be
easier for $\epsilon>0$ (i.e., in the cases in which the
eigenvalues of $\epsilon\,\partial^2_\beta\bar f(\beta_0)$ are negative and their roots
do not have the interpretation of frequencies).

Technically, the sums of the formal series can be
given (so far) a meaning only via summation rules
involving divergent series: typically, one has to
identify in the formal expressions (denumerably
many) geometric series which, although divergent,
can be given a meaning by applying the rule [79].
Since the rule can only be applied if $z\ne 1$, this leads
to conditions on the parameter $\epsilon$, in order to exclude
that the various $z$ that have to be considered are very
close to 1. Hence, this stability result turns out to be
rather different from the KAM result for the
maximal tori. Namely the series can be given a
meaning via summation rules provided $f$ and $\beta_0$
satisfy certain additional conditions and provided
certain values of $\epsilon$ are excluded. An example of a
theorem is the following:


Theorem 6 Given the Hamiltonian [80] and a
resonant torus $\mathcal{T}_{A'_0,\beta_0}$ with $\omega=A'_0\in R^r$ satisfying a
Diophantine property, let $\beta_0$ be a nondegenerate
maximum point for the average potential
$\bar f(\beta)\stackrel{def}{=}(2\pi)^{-r}\int_{T^r}f(\alpha',\beta)\,d^r\alpha'$. Consider the Lindstedt series
solution for eqns [89] of the perturbed resonant
torus with spectrum $(\omega,0)$. It is possible to express
the single $n$th-order term of the series as a sum of
many terms and then rearrange the series thus
obtained so that the resummed series converges for
$\epsilon$ in a domain $E$ which contains a segment $[0,\epsilon_0]$ and
also a subset of $[-\epsilon_0,0]$ which, although with open
dense complement, is so large that it has 0 as a
Lebesgue density point. Furthermore, the resummed
series for $h_\epsilon$, $k_\epsilon$ define an invariant $r$-dimensional
analytic torus with spectrum $\omega$.


More generally, if $\beta_0$ is only a nondegenerate
stationarity point for $\bar f(\beta)$, the domain of definition
of the resummed series is a set $E\subset[-\epsilon_0,\epsilon_0]$ which
on both sides of the origin has an open dense
complement although it has 0 as a Lebesgue density
point.

Theorem 6 can be naturally extended to the
general case in which the Hamiltonian is the most
general perturbation of an anisochronous integrable
system $H_\epsilon(A,\alpha)=h(A)+\epsilon f(A,\alpha)$ if $\partial^2_Ah$ is a nonsingular
matrix and the resonance arises from a
spectrum $\omega(A_0)$ which has $r$ independent components
(while the remaining are not necessarily zero).

We see that the convergence is a delicate problem 
for the Lindstedt series for nearly integrable reso- 
nant motions. They might even be divergent 
(mathematically, a proof of divergence is an open 
problem but it is a very reasonable conjecture in 
view of the above physical interpretation); never- 
theless, Theorem 6 shows that sum rules can be 
given that sometimes (i.e., for $\epsilon$ in a large set near
$\epsilon=0$) yield a true solution to the problem.

This is reminiscent of the phenomenon met in 
discussing perturbations of isochronous systems in 
[76], but it is a much more complex situation. It 
leaves many open problems: foremost among them 
is the question of uniqueness. The sum rules of 
divergent series always contain some arbitrary 
choices, which lead to doubts about the uniqueness 
of the functions parametrizing the invariant tori 
constructed in this way. It might even be that the
convergence set $E$ may depend upon the arbitrary
choices, and that considering several of them no $\epsilon$
with $|\epsilon|<\epsilon_0$ is left out.


The case of a priori unstable systems has also
been widely studied. In this case too resonances
with Diophantine $r$-dimensional spectrum $\omega$ are
considered. However, in the case $s_2=0$ (called a
priori unstable hyperbolic resonance) the Lindstedt
series can be shown to be convergent, while in the
case $s_1=0$ (called a priori unstable elliptic resonance)
or in the mixed cases $s_1,s_2>0$ extra
conditions are needed. They involve $\omega$ and
$\mu=(\mu_1,\dots,\mu_{s_2})$ (cf. [86]) and properties of the
perturbations as well. It is also possible to study a
slightly different problem: namely to look for 
conditions on @,f,f which imply that, for small 
$\epsilon$, invariant tori with spectrum $\epsilon$-dependent but
close, in a suitable sense, to $\omega$ exist.

The literature is vast, but it seems fair to say that, 
given the above comments, particularly those con- 
cerning uniqueness and analyticity, the situation is still 
quite unsatisfactory. We refer the reader to Gallavotti 
et al. (2004) for more details. 


Diffusion in Phase Space 


The KAM theorem implies that a perturbation of an
analytic anisochronous integrable system, i.e., with
an analytic Hamiltonian $H_\epsilon(A,\alpha)=H_0(A)+\epsilon f(A,\alpha)$
and nondegenerate Hessian matrix
$\partial^2_AH_0(A)$, generates large families of maximal invariant
tori. Such tori lie on the energy surfaces but do
not have codimension 1 on them, i.e., they do not
split the $(2\ell-1)$-dimensional energy surfaces into
disconnected regions except, of course, in the case of
systems with two degrees of freedom (see the section
"Quasiperiodicity and KAM stability").

Therefore, there might exist trajectories with
initial data close to $A^i$ in action space which reach
phase space points close to $A^f\ne A^i$ in action space
for $\epsilon\ne 0$, no matter how small. Such a diffusion
phenomenon would occur in spite of the fact that
the corresponding trajectory has to move in a space
in which very close to each $\{A\}\times T^\ell$ there is an
invariant surface on which points move keeping
$A$ constant within $O(\epsilon)$, which for $\epsilon$ small can be
$\ll|A^f-A^i|$.

In a priori unstable systems (cf. the section
"Resonances and their stability") with $s_1=1$,
$s_2=0$, it is not difficult to see that the corresponding
phenomenon can actually occur: the paradigmatic
example (ARNOLD) is the a priori unstable
system

$$H_\epsilon=\frac{A_1^2}{2}+A_2+\frac{p^2}{2}+g(\cos q-1)+\epsilon(\cos\alpha_1+\sin\alpha_2)(\cos q-1)$$ [90]

This is a system describing a motion of a "pendulum"
($(p,q)$ coordinates) interacting with a "rotating
wheel" ($(A_1,\alpha_1)$ coordinates) and a "clock"
($(A_2,\alpha_2)$ coordinates), a priori unstable near the
points $p=0$, $q=0,2\pi$ ($s_1=1$, $s_2=0$, $\lambda=\sqrt{g}$,
cf. [86]). It can be proved that on the energy surface
of energy $E$ and for each $\epsilon\ne 0$ small enough (no
matter how small) there are initial data with action
coordinates close to $A^i=(A^i_1,A^i_2)$ with $\frac{1}{2}(A^i_1)^2+A^i_2$
close to $E$ eventually evolving to a datum
$A'=(A'_1,A'_2)$ with $A'_1$ at a distance from $A^f_1$ smaller
than an arbitrarily prefixed distance (of course with
energy $E$). Furthermore, during the whole process
the pendulum energy stays close to zero within $o(\epsilon)$
(i.e., the pendulum swings following closely the
unperturbed separatrices).

In other words, [90] describes a machine (the
pendulum) which, working approximately in a
cycle, extracts energy from a reservoir (the clock)
to transfer it to a mechanical device (the wheel). The
statement that diffusion is possible means that the
machine can work as soon as $\epsilon\ne 0$, if the initial
actions and the initial phases (i.e., $\alpha_1,\alpha_2,p,q$) are
suitably tuned (as functions of $\epsilon$).

The peculiarity of the system [90] is that the fixed
points $P_\pm$ of the unperturbed pendulum (i.e., the
equilibria $p=0$, $q=0,2\pi$) remain unstable equilibria
even when $\epsilon\ne 0$ and this is an important simplifying
feature.

It is a peculiarity that permits bypassing the
obstacle, arising in the analysis of more general
cases, represented by the resonance surfaces consisting
of the $A$'s with $A_1\nu_1+\nu_2=0$: the latter
correspond to harmonics $(\nu_1,\nu_2)$ present in the
perturbing function, i.e., the harmonics which
would lead to division by zero in an attempt to
construct (as necessary in studying [90] by Arnol'd's
method) the parametric equations of the perturbed
invariant tori with action close to such $A$'s. In the
case of [90] the problem arises only on the
resonance marked in Figure 6 by a heavy line, i.e.,
$A_1=0$, corresponding to $\cos\alpha_1$ in [90].
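The persistence of the pendulum equilibria for $\epsilon\ne 0$ can be read off Hamilton's equations for [90]. The following sketch writes them out; the ordering of the state variables and the default $g=1$ are choices made here for illustration, and the function name is hypothetical.

```python
import math


def arnold_vector_field(state, eps, g=1.0):
    """Hamilton's equations for [90].

    state = (A1, A2, p, alpha1, alpha2, q); returns the time derivatives
    in the same order. Actions evolve by -dH/d(angle), angles by +dH/d(action).
    """
    A1, A2, p, alpha1, alpha2, q = state
    coupling = math.cos(alpha1) + math.sin(alpha2)
    dA1 = eps * math.sin(alpha1) * (math.cos(q) - 1.0)
    dA2 = -eps * math.cos(alpha2) * (math.cos(q) - 1.0)
    dp = (g + eps * coupling) * math.sin(q)
    dalpha1 = A1
    dalpha2 = 1.0
    dq = p
    return (dA1, dA2, dp, dalpha1, dalpha2, dq)


if __name__ == "__main__":
    # at p = 0, q = 0 the pendulum and the actions do not move, whatever eps is
    print(arnold_vector_field((0.3, -0.2, 0.0, 1.1, 2.5, 0.0), eps=0.01))
```

Because the perturbation carries the factor $\cos q-1$, both it and its $q$-derivative vanish at $q=0$ and $q=2\pi$: there $\dot p=\dot q=0$ and the actions are constant for every $\epsilon$. The linearized pendulum equation is $\ddot q=(g+\epsilon(\cos\alpha_1+\sin\alpha_2))\,q$, so the equilibria stay unstable at least as long as $|\epsilon|<g/2$.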

If $\epsilon=0$, the point $P_-$ with $p=0$, $q=0$ and the
point $P_+$ with $p=0$, $q=2\pi$ are both unstable
equilibria (and they are, of course, the same point,
if $q$ is an angular variable). The unstable manifold
(it is a curve) of $P_+$ coincides with the stable
manifold of $P_-$ and vice versa, so that the
unperturbed system admits nontrivial motions leading
from $P_+$ to $P_-$ and from $P_-$ to $P_+$, both in a
bi-infinite time interval $(-\infty,\infty)$: the $p,q$ variables
describe a pendulum and $P_\pm$ are its unstable
equilibria which are connected by the separatrices
(which constitute the zero-energy surfaces for the
pendulum).
Figure 6 (a) The $\epsilon=0$ geometry: the "partial energy" lines are
parabolas, $\frac{1}{2}A_1^2+A_2=\mathrm{const}$. The vertical lines are the
resonances $A_1$ = rational (i.e., $\nu_1A_1+\nu_2=0$). The disks are
neighborhoods of the points $A^i$ and $A^f$ (the dots at their centers).
(b) $\epsilon\ne 0$: an artist's rendering of a trajectory in $A$ space, driven
by the pendulum swings to accelerate the wheel from $A^i_1$ to $A^f_1$ at
the expense of the clock energy, sneaking through invariant tori
not represented and (approximately) located "away" from the
intersections between resonances and partial energy lines (a
dense set, however). The pendulum coordinates are not shown:
its energy stays close to zero, within a power of $\epsilon$. Hence the
pendulum swings, staying close to the separatrix. The oscillations
symbolize the wiggly behavior of the partial energy
$\frac{1}{2}A_1^2+A_2$ in the process of sneaking between invariant tori
which, because of their invariance, would be impossible without
the pendulum. The energy $\frac{1}{2}A_1^2$ of the wheel increases
slightly at each pendulum swing: accurate estimates yield an
increase of the wheel speed $A_1$ of the order of $\epsilon/\log\epsilon^{-1}$ at
each swing of the pendulum, implying a transition time of the
order of $g^{-1/2}\epsilon^{-1}\log\epsilon^{-1}$.


The latter property remains true for more general
a priori unstable Hamiltonians

$$H_\epsilon=H_0(A)+H_u(p,q)+\epsilon f(A,\alpha,p,q)\qquad\text{in } (U\times T^\ell)\times(R^2)$$ [91]

where $H_u$ is a one-dimensional Hamiltonian which
has two unstable equilibrium points $P_+$ and $P_-$,
linearly repulsive in one direction and linearly
attractive in another, which are connected by two
heteroclinic trajectories which, as time tends to $\pm\infty$,
approach $P_-$ and $P_+$ and vice versa.

Actually, the points need not be different but, if
coinciding, the trajectories linking them must be
nontrivial: in the case [90] the variable $q$ can be
considered an angle and then $P_+$ and $P_-$ would
coincide (but are connected by nontrivial trajectories,
i.e., by trajectories that also visit points
different from $P_\pm$). Such trajectories are called
heteroclinic if $P_+\ne P_-$ and homoclinic if $P_+=P_-$.

In the general case, besides the homoclinicity (or
heteroclinicity) condition, certain weak genericity
conditions, automatically satisfied in the example
[90], have to be imposed in order to show that,
given $A^i$ and $A^f$ with the same unperturbed energy
$E$, one can find, for all $\epsilon$ small enough but not equal
to zero, initial data ($\epsilon$-dependent) with actions
arbitrarily close to $A^i$ which evolve to data with
actions arbitrarily close to $A^f$. This is a phenomenon
called the Arnold diffusion. Simple sufficient conditions
for a transition from near $A^i$ to near $A^f$ are
expressed by the following result:


Theorem 7 Given the Hamiltonian [91] with $H_u$
admitting two hyperbolic fixed points $P_\pm$ with
heteroclinic connections, $t \to (p_a(t), q_a(t))$, $a = 1, 2$,
suppose that:

(i) On the unperturbed energy surface of energy
$E = H_0(A^i) + H_u(P_\pm)$ there is a regular curve
$\gamma: s \to A(s)$ joining $A^i$ to $A^f$ such that the
unperturbed tori $\{A(s)\}\times\mathbb{T}^{\ell}$ can be continued
at $\varepsilon \neq 0$ into invariant tori $\mathcal{T}_{A(s),\varepsilon}$ for a set of
values of $s$ which fills the curve $\gamma$ leaving only
gaps of size of order $o(\varepsilon)$.

(ii) The $\ell\times\ell$ matrix $D_{ij}$ of the second derivatives of
the integral of $f$ over the heteroclinic motions is
not degenerate, that is,

$$|\det D| = \Big|\det\Big(\int_{-\infty}^{\infty} dt\,\partial_{\alpha_i\alpha_j} f\big(A, \alpha + \omega(A)t, p_a(t), q_a(t)\big)\Big)\Big| > c > 0$$ [92]

for all $A$'s on the curve $\gamma$ and all $\alpha \in \mathbb{T}^{\ell}$.

Given arbitrary $\rho > 0$, for $\varepsilon \neq 0$ small enough
there are initial data with action and energy closer
than $\rho$ to $A^i$ and $E$, respectively, which after a long
enough time acquire an action closer than $\rho$ to $A^f$
(keeping the initial energy).


The above two conditions can be shown to hold
generically for many pairs $A^i \neq A^f$ (and many
choices of the curves $\gamma$ connecting them) if the
number of degrees of freedom is $\geq 3$. Thus, the result,
obtained by a simple extension of the argument
originally outlined by Arnol'd to discuss the paradigmatic
example [90], proves the existence of
diffusion in a priori unstable systems. The integral
in [92] is called the Melnikov integral.

The real difficulty is to estimate the time needed
for the transition: it is a time that obviously has to
diverge as $\varepsilon \to 0$. Assuming $g$ fixed (i.e., $\varepsilon$-independent),
a naive approach easily leads to estimates
which can even be worse than $O(\exp(a\varepsilon^{-b}))$ with
some $a, b > 0$. It has finally been shown that in such
cases the minimum time can be, for rather general
perturbations $\varepsilon f(\alpha, q)$, estimated above by
$O(\varepsilon^{-1}\log\varepsilon^{-1})$, which is the best that can be hoped
for under generic assumptions.

The reader is referred to Arnol'd (1989) and 
Chierchia and Valdinoci (2000) for more details. 


Long-Time Stability of Quasiperiodic 
Motions 


A more difficult problem is whether the same
phenomenon of migration in action space occurs in
a priori stable systems. The root of the difficulty is a
remarkable stability property of quasiperiodic
motions. Consider Hamiltonians $H_\varepsilon(A,\alpha) = h(A) + \varepsilon f(A,\alpha)$
with $H_0(A) = h(A)$ strictly convex, analytic,
and anisochronous on the closure $\bar U$ of an open
bounded region $U \subset \mathbb{R}^{\ell}$ and a perturbation $\varepsilon f(A,\alpha)$
analytic in $\bar U\times\mathbb{T}^{\ell}$.

Then a priori bounds are available on how long it
can possibly take to migrate from an action close to
$A_1$ to one close to $A_2$: and the bound is of
"exponential type" as $\varepsilon \to 0$ (i.e., it admits a lower
bound which behaves as the exponential of an
inverse power of $\varepsilon$). The simplest theorem is
(NEKHOROSSEV):


Theorem 7 There are constants $0 < a, b, d, g, \tau$
such that any initial datum $(A, \alpha)$ evolves so that $A$
will not change by more than $a\varepsilon^{g}$ before a long time
bounded below by $\tau\exp(b\varepsilon^{-d})$.


Thus, this puts an exponential bound, i.e., a
bound exponential in an inverse power of $\varepsilon$, to the
diffusion time: before a time $\tau\exp(b\varepsilon^{-d})$ actions can
only change by $O(\varepsilon^{g})$ so that their variation cannot
be large no matter how small $\varepsilon \neq 0$ is chosen. This
places a (long) lower bound to the time of diffusion
in a priori stable systems.
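
The gap between the two kinds of time scale met so far can be made
concrete with a few numbers. The sketch below compares an
exponential-type lower bound $\tau\exp(b\varepsilon^{-d})$ with the
estimate $\varepsilon^{-1}\log\varepsilon^{-1}$ quoted for a priori
unstable systems. The values of $\tau$, $b$, $d$ are purely
illustrative assumptions (the theorem only asserts that suitable
positive constants exist), and the function names are hypothetical.

```python
import math

def nekhoroshev_time(eps, tau=1.0, b=1.0, d=0.5):
    """Exponential-type lower bound tau*exp(b*eps**(-d)).

    tau, b, d are illustrative placeholders, not values from the theorem.
    """
    return tau * math.exp(b * eps ** (-d))

def fast_diffusion_time(eps):
    """Order of magnitude eps**(-1) * log(eps**(-1)), for 0 < eps < 1."""
    return math.log(1.0 / eps) / eps

if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        print(f"eps={eps:.0e}  "
              f"exponential bound={nekhoroshev_time(eps):.3e}  "
              f"eps^-1 log eps^-1={fast_diffusion_time(eps):.3e}")
```

With these placeholder constants the exponential bound overtakes the
polynomial estimate by many orders of magnitude as soon as
$\varepsilon$ drops below about $10^{-2}$.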

The proof of the theorem provides, actually, an 
interesting and detailed picture of the variations in 
actions showing that some actions may vary more 
slowly than others. 

The theorem is constructive, i.e., all constants
$0 < a, b, d, g, \tau$ can be explicitly chosen and depend
on $\ell, H_0, f$ although some of them can be fixed to
depend only on $\ell$ and on the minimum curvature of
the convex graph of $H_0$. Its proof can be adapted
to cover many cases which do not fall in the class of
systems with strictly convex unperturbed Hamiltonian,
and even to cases with a resonant unperturbed
Hamiltonian.

However, in important problems (e.g., in the
three-body problems met in celestial mechanics)
there is empirical evidence that diffusion takes
place at a fast pace (i.e., not exponentially slow in
the above sense), while the above results would
forbid a rapid migration in phase space if they
applied. In such problems, however, the assumptions
of the theorem are not satisfied, because the
unperturbed system is strongly resonant (as in the
celestial mechanics problems, where the number of
independent frequencies is a fraction of the number
of degrees of freedom and $h(A)$ is far from strictly
convex), leaving wide open the possibility of observing
rapid diffusion.

Further, changing the assumptions can dramatically
change the results. For instance, rapid diffusion
can sometimes be proved even though it might be
feared that it should require exponentially long
times: an example that has been proposed is the
case of a three-timescales system, with Hamiltonian

$$\omega_1A_1 + \omega_2A_2 + \frac{p^2}{2} + g(1+\cos q) + \varepsilon f(\alpha_1,\alpha_2,p,q)$$ [93]

with $\omega_\varepsilon = (\omega_1,\omega_2)$, where $\omega_1 = \varepsilon^{-1/2}\bar\omega$, $\omega_2 = \varepsilon^{1/2}\tilde\omega$
and $\bar\omega, \tilde\omega > 0$ constants. The three scales are
$\omega_1^{-1}$, $\sqrt{g^{-1}}$, $\omega_2^{-1}$. In this case, there are many
(although by no means all) pairs $A_1, A_2$ which can
be connected within a time that can be estimated to
be of order $O(\varepsilon^{-1}\log\varepsilon^{-1})$.

This is a rapid-diffusion case in an a priori
unstable system in which condition [92] is not
satisfied: because the $\varepsilon$-dependence of $\omega(A)$ implies
that the lower bound $c$ in [92] must depend on $\varepsilon$
(and be exponentially small with an inverse power
of $\varepsilon$ as $\varepsilon \to 0$).

The unperturbed system in [93] is nonresonant in
the $H_0$ part for $\varepsilon > 0$ outside a set of zero measure
(i.e., where the vector $\omega_\varepsilon$ satisfies a suitable
Diophantine property) and, furthermore, it is
a priori unstable: cases met in applications can be
a priori stable and resonant (and often not anisochronous)
in the $H_0$ part. In such systems, not only
is the speed of diffusion not understood, but
proposals to prove its existence, if present (as
expected), have so far not given really satisfactory
results.

For more details, the reader is referred
to Nekhorossev (1977).


The Three-Body Problem 


Mechanics and the three-body problem can be
almost identified with each other, in the sense that
the motion of three gravitating masses has long been
a key astronomical problem and at the same time
the source of inspiration for many techniques:
foremost among them the theory of perturbations.
As an introduction, consider a special case. Let
three masses $m_S = m_0$, $m_J = m_1$, $m_M = m_2$ interact
via gravity, that is, with interaction potential
$-k m_i m_j |x_i - x_j|^{-1}$: the simplest problem arises
when the third body has a negligible mass compared
to the two others and the latter are supposed to be
on a circular orbit; furthermore, the mass $m_J$ is $\varepsilon m_S$
with $\varepsilon$ small and the mass $m_M$ moves in the plane of
the circular orbit. This will be called the "circular
restricted three-body problem."

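In a reference system with center $S$ and rotating at
the angular speed of $J$ around $S$, inertial forces
(centrifugal and Coriolis) act. Supposing that the
body $J$ is located on the axis with unit vector $i$ at
distance $R$ from the origin $S$, the acceleration of the
point $M$ is

$$\ddot\varrho = F + \omega_0^2\Big(\varrho - \frac{\varepsilon R}{1+\varepsilon}\,i\Big) - 2\,\boldsymbol{\omega}_0\wedge\dot\varrho$$

if $F$ is the force of attraction and $\boldsymbol{\omega}_0\wedge\dot\varrho \equiv \omega_0\dot\varrho^{\perp}$,
where $\boldsymbol{\omega}_0$ is a vector with $|\boldsymbol{\omega}_0| = \omega_0$ and perpendicular
to the orbital plane and $\varrho^{\perp} = (-\rho_2, \rho_1)$ if
$\varrho = (\rho_1, \rho_2)$. Here, taking into account that the origin
$S$ rotates around the fixed center of mass,
$\omega_0^2(\varrho - \frac{\varepsilon R}{1+\varepsilon}\,i)$ is the centrifugal force while
$-2\,\boldsymbol{\omega}_0\wedge\dot\varrho$ is the Coriolis force. The equations of motion can
therefore be derived from a Lagrangian

$$L = \tfrac12\dot\varrho^2 - W + \omega_0\,\varrho^{\perp}\cdot\dot\varrho + \tfrac12\omega_0^2\varrho^2 - \omega_0^2\frac{\varepsilon R}{1+\varepsilon}\,\varrho\cdot i$$ [94]

with

$$\omega_0^2R^3 = k m_S(1+\varepsilon) \equiv g_0, \qquad W = -\frac{k m_S}{|\varrho|} - \frac{k m_S\,\varepsilon}{|\varrho - R\,i|}$$

where $k$ is the gravitational constant, $R$ the distance
between $S$ and $J$, and finally the last three terms in [94]
come from the Coriolis force (the first) and from the
centrifugal force (the other two, taking into account that
the origin $S$ rotates around the fixed center of mass).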

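Setting $g = g_0/(1+\varepsilon) \equiv k m_S$, the Hamiltonian of
the system is

$$H = \tfrac12(p - \omega_0\varrho^{\perp})^2 - \frac{g}{|\varrho|} - \tfrac12\omega_0^2\varrho^2 - \varepsilon\frac{g}{R}\Big(\Big|\frac{\varrho}{R} - i\Big|^{-1} - \frac{\varrho}{R}\cdot i\Big)$$ [95]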
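
The equations of motion in the rotating frame can be integrated
numerically as a sanity check. The following is a minimal sketch, not
part of the original treatment: it assumes units with $k m_S = 1$ and
$R = 1$, uses a plain fourth-order Runge-Kutta step, and monitors the
conserved energy function of the Lagrangian [94] (the value of the
Hamiltonian written in terms of the velocity). All function names and
the initial datum are hypothetical choices.

```python
import math

def accel(pos, vel, eps, k_mS=1.0, R=1.0):
    """Acceleration of M in the frame centered at S and rotating with J."""
    x, y = pos
    vx, vy = vel
    w0sq = k_mS * (1.0 + eps) / R**3      # omega_0^2 R^3 = k m_S (1 + eps)
    w0 = math.sqrt(w0sq)
    r1 = math.hypot(x, y)                 # distance of M from S
    r2 = math.hypot(x - R, y)             # distance of M from J, placed at R*i
    # attraction F = -grad W
    fx = -k_mS * x / r1**3 - k_mS * eps * (x - R) / r2**3
    fy = -k_mS * y / r1**3 - k_mS * eps * y / r2**3
    # centrifugal term omega_0^2 (rho - eps R/(1+eps) i)
    cx = w0sq * (x - eps * R / (1.0 + eps))
    cy = w0sq * y
    # Coriolis term -2 omega_0 v_perp, with v_perp = (-vy, vx)
    return fx + cx + 2.0 * w0 * vy, fy + cy - 2.0 * w0 * vx

def energy(pos, vel, eps, k_mS=1.0, R=1.0):
    """Conserved energy function associated with the Lagrangian [94]."""
    x, y = pos
    vx, vy = vel
    w0sq = k_mS * (1.0 + eps) / R**3
    W = -k_mS / math.hypot(x, y) - k_mS * eps / math.hypot(x - R, y)
    return (0.5 * (vx * vx + vy * vy) + W - 0.5 * w0sq * (x * x + y * y)
            + w0sq * eps * R / (1.0 + eps) * x)

def rk4_step(state, dt, eps):
    """One Runge-Kutta step for state = (x, y, vx, vy)."""
    def deriv(s):
        ax, ay = accel((s[0], s[1]), (s[2], s[3]), eps)
        return (s[2], s[3], ax, ay)
    def shifted(k, c):
        return tuple(si + c * ki for si, ki in zip(state, k))
    k1 = deriv(state)
    k2 = deriv(shifted(k1, 0.5 * dt))
    k3 = deriv(shifted(k2, 0.5 * dt))
    k4 = deriv(shifted(k3, dt))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(eps=1e-3, r0=0.4, dt=1e-3, steps=5000):
    """Start M on an inertially circular orbit of radius r0 around S."""
    w0 = math.sqrt(1.0 + eps)
    state = (r0, 0.0, 0.0, math.sqrt(1.0 / r0) - w0 * r0)
    e0 = energy(state[:2], state[2:], eps)
    for _ in range(steps):
        state = rk4_step(state, dt, eps)
    return state, abs(energy(state[:2], state[2:], eps) - e0)
```

Calling `simulate()` returns the final state and the absolute energy
drift, which should stay at the level of the integration error. The
interior datum `r0 = 0.4` keeps $|\varrho| < R$, the regime considered
in the text.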


The first part can be expressed immediately in the
action-angle coordinates for the two-body problem
(cf. the section "Newtonian potential and Kepler's
laws"). Calling such coordinates $(L_0, \lambda_0, G_0, \gamma_0)$ and
$\theta_0$ the polar angle of $M$ with respect to the major axis
of the ellipse and $\lambda_0$ the mean anomaly of $M$ on its
ellipse, the Hamiltonian becomes, taking into account
that for $\varepsilon = 0$ the ellipse axis rotates at speed $-\omega_0$,

$$H = -\frac{g^2}{2L_0^2} - \omega_0G_0 - \varepsilon\frac{g}{R}\Big(\Big|\frac{\varrho}{R} - i\Big|^{-1} - \frac{\varrho}{R}\cdot i\Big)$$ [96]

which is convenient if we study the interior problem,
i.e., $|\varrho| < R$. This can be expressed in the action-angle
coordinates via [41], [42]:

$$\theta_0 = \lambda_0 + f_{\lambda_0}, \qquad \theta_0 + \gamma_0 = \lambda_0 + \gamma_0 + f_{\lambda_0},$$
$$\frac{|\varrho|}{R} = \frac{G_0^2}{gR}\,\frac{1}{1 + e\cos(\lambda_0 + f_{\lambda_0})}$$ [97]

where (see [42]) $f_\lambda = f(e\sin\lambda, e\cos\lambda)$ and

$$f(x,y) = 2x\big(1 + \tfrac54 y + \cdots\big)$$

with the ellipsis denoting higher orders in $x, y$ even
in $x$. The Hamiltonian takes the form, if $\omega^2 = gR^{-3}$,

$$H_\varepsilon = -\frac{g^2}{2L_0^2} - \omega G_0 + \varepsilon\frac{g}{R}F(G_0, L_0, \lambda_0, \lambda_0+\gamma_0)$$ [98]

where the only important feature (for our purposes) is
that $F(L,G,\alpha,\beta)$ is an analytic function of $L, G, \alpha, \beta$
near a datum with $|G| < L$ (i.e., $e > 0$) and $|\varrho| < R$.
However, the domain of analyticity in $G$ is rather
small as it is constrained by $|G| < L$, excluding in
particular the circular orbit case $G = \pm L$.

Note that apparently the KAM theorem fails to be
applicable to [98] because the matrix of the second
derivatives of $H_0(L,G)$ has vanishing determinant.
Nevertheless, the proof of the theorem also goes
through in this case, with minor changes. This can
be checked by studying the proof or, following a
remark by Poincaré, by simply noting that the
"squared" Hamiltonian $H'_\varepsilon := (H_\varepsilon)^2$ has the form

$$H'_\varepsilon = \Big(-\frac{g^2}{2L_0^2} - \omega G_0\Big)^2 + \varepsilon F'(G_0, L_0, \lambda_0, \lambda_0+\gamma_0)$$ [99]

with $F'$ still analytic. But this time

$$\det\frac{\partial^2H'_0}{\partial(G_0,L_0)} = -12\,g^2L_0^{-4}\omega^2h \neq 0 \quad\text{if } h = -\tfrac12 g^2L_0^{-2} - \omega G_0 \neq 0$$

Therefore, the KAM theorem applies to $H'_\varepsilon$ and
the key observation is that the orbits generated by
the Hamiltonian $(H_\varepsilon)^2$ are geometrically the same as
those generated by the Hamiltonian $H_\varepsilon$: they are
only run at a different speed because of the need of a
time rescaling by the constant factor $2H_\varepsilon$.
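
The degeneracy of the unperturbed Hamiltonian and its removal by
squaring can be checked numerically. The sketch below is only an
illustration: it evaluates the determinant of the Hessian of
$h(L,G) = -g^2/(2L^2) - \omega G$ and of $h^2$ by central finite
differences, and compares the latter with the product-rule expression
$4h\,\partial_L^2h\,(\partial_Gh)^2$, which holds because $h$ is linear
in $G$. The numerical values of $g$, $\omega$, $L$, $G$ are arbitrary
assumptions.

```python
def h(L, G, g=1.0, w=0.3):
    """Unperturbed Hamiltonian -g^2/(2 L^2) - w G (g and w are sample values)."""
    return -g**2 / (2.0 * L**2) - w * G

def hessian_det(f, L, G, d=1e-3):
    """Determinant of the 2x2 Hessian of f(L, G) by central differences."""
    fLL = (f(L + d, G) - 2.0 * f(L, G) + f(L - d, G)) / d**2
    fGG = (f(L, G + d) - 2.0 * f(L, G) + f(L, G - d)) / d**2
    fLG = (f(L + d, G + d) - f(L + d, G - d)
           - f(L - d, G + d) + f(L - d, G - d)) / (4.0 * d**2)
    return fLL * fGG - fLG**2

def squared_det_closed_form(L, G, g=1.0, w=0.3):
    """4 h h_LL (h_G)^2 with h_LL = -3 g^2 / L^4 and h_G = -w."""
    return 4.0 * h(L, G, g, w) * (-3.0 * g**2 / L**4) * w**2

if __name__ == "__main__":
    L0, G0 = 1.2, 0.8
    print("det Hessian of h   :", hessian_det(h, L0, G0))
    print("det Hessian of h^2 :", hessian_det(lambda L, G: h(L, G)**2, L0, G0))
    print("closed form        :", squared_det_closed_form(L0, G0))
```

The first determinant vanishes up to rounding, while the second agrees
with the closed form and is nonzero whenever $h \neq 0$.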

This shows that, given an unperturbed ellipse of
parameters $(L_0, G_0)$ such that $\boldsymbol{\omega} = (g^2/L_0^3, -\omega)$,
$G_0 > 0$, with $\omega_1/\omega_2$ Diophantine, the perturbed
system admits a motion which is quasiperiodic with
spectrum proportional to $\boldsymbol{\omega}$ and takes place on an orbit
which wraps around a torus remaining forever close to
the unperturbed torus (which can be visualized as
described by a point moving, according to the area law,
on an ellipse rotating at a rate $-\omega_0$) with actions
$(L_0, G_0)$, provided $\varepsilon$ is small enough. Hence,


The KAM theorem answers, at least conceptually, the 
classical question: can a solution of the three-body 
problem remain close to an unperturbed one forever? 
That is, is it possible that a solar system is stable 
forever? 


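Assuming $e, |\varrho|/R \ll 1$ and retaining only the lowest
orders in $e$ and $|\varrho|/R \ll 1$, the Hamiltonian [98]
simplifies into

$$H = -\frac{g^2}{2L_0^2} - \omega G_0 + \delta_\varepsilon(G_0) - \frac{\varepsilon g}{2R}\,\frac{G_0^4}{g^2R^2}\Big(3\cos2(\lambda_0+\gamma_0) - e\cos\lambda_0 - \tfrac92\,e\cos(\lambda_0+2\gamma_0) + \tfrac32\,e\cos(3\lambda_0+2\gamma_0)\Big)$$ [100]

where

$$\delta_\varepsilon(G_0) = -\big((1+\varepsilon)^{1/2} - 1\big)\omega G_0 - \frac{\varepsilon g}{2R}\,\frac{G_0^4}{g^2R^2}$$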

It is an interesting exercise to estimate, assuming
as model the Hamiltonian [100] and following the
proof of the KAM theorem, how small $\varepsilon$ has to be if
a planet with the data of Mercury is to be stable
forever on a (slowly precessing) orbit with actions
close to the present-day values under the influence
of a mass $\varepsilon$ times the solar mass orbiting on a circle,
at a distance from the Sun equal to that of Jupiter. It
is possible to follow either the above reduction to
the ordinary KAM theorem or to apply directly to
[100] the Lindstedt series expansion, proceeding
along the lines of the section "Quasiperiodicity and
KAM stability." The first approach is easy but the
second is more efficient: in both cases, unless the
estimates are done in a particularly careful manner,
the value found for $\varepsilon m_S$ is not interesting from the
viewpoint of astronomy.

The reader is referred to Arnol'd (1989) for more
details.


Rationalization and Regularization of 
Singularities 


Often integrable systems have interesting data which
lie on the boundary of the integrability domain. For
instance, the central motion when $L = G$ (circular
orbits) or the rigid body in a rotation around one of
the principal axes or the two-body problem when
$G = 0$ (collisional data). In such cases, perturbation
theory cannot be applied as discussed above.
Typically, the perturbation depends on quantities
like $\sqrt{L - G}$ and is not analytic at $L = G$. Nevertheless,
it is sometimes possible to enlarge phase space
and introduce new coordinates in the vicinity of the
data which in the initial phase space are singular.

A notable example is the failure of the analysis of
the circular restricted three-body problem: it apparently
fails when the orbit that we want to perturb is
circular.

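It is convenient to introduce the canonical
coordinates $L, \lambda$ and $G, \gamma$:

$$L = L_0, \qquad G = L_0 - G_0, \qquad \lambda = \lambda_0 + \gamma_0, \qquad \gamma = -\gamma_0$$ [101]

so that $e = \sqrt{2GL^{-1}}\sqrt{1 - G(2L)^{-1}}$ and $\lambda_0 = \lambda + \gamma$
and $\theta_0 = \lambda_0 + f_{\lambda_0}$, where $f_{\lambda_0}$ is defined in [42] (see
also [97]). Hence,

$$\theta_0 = \lambda + \gamma + f_{\lambda+\gamma}, \qquad \theta_0 + \gamma_0 = \lambda + f_{\lambda+\gamma},$$
$$\frac{|\varrho|}{R} = \frac{L^2(1-e^2)}{gR}\,\frac{1}{1 + e\cos(\lambda+\gamma+f_{\lambda+\gamma})}$$ [102]

and the Hamiltonian [98] takes the form

$$H_\varepsilon = -\frac{g^2}{2L^2} - \omega L + \omega G + \varepsilon\frac{g}{R}F(L - G, L, \lambda+\gamma, \lambda)$$ [103]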


In the coordinates $L, G$ of [101] the unperturbed
circular case corresponds to $G = 0$ and [96], once
expressed in the action-angle variables $G, L, \gamma, \lambda$, is
analytic in a domain whose size is controlled by
$\sqrt{G}$. Nevertheless, very often problems of perturbation
theory can be "regularized."

This is done by “enlarging the integrability" 
domain by adding to it points (one or more) around 
the singularity (a boundary point of the domain of 
the coordinates) and introducing new coordinates to 
describe simultaneously the data close to the 
singularity and the newly added points: in many 
interesting cases, the equations of motion are no 
longer singular (i.e., become analytic) in the new 
coordinates and are therefore apt to describe the 
motions that reach the singularity in a finite time. 
One can say that the singularity was only apparent. 

Perhaps this is best illustrated precisely in the
above circular restricted three-body problem, with
the singularity occurring where $G = 0$, that is, at a
circular unperturbed orbit. If we describe the points
with $G$ small in a new system of coordinates
obtained from the one in [101] by letting alone
$L, \lambda$ and setting

$$p = \sqrt{2G}\cos\gamma, \qquad q = \sqrt{2G}\sin\gamma$$ [104]

then $p, q$ vary in a neighborhood of the origin with
the origin itself excluded.

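If the origin of the $p$-$q$ plane is added then, in a full
neighborhood of the origin, the Hamiltonian [96] is
analytic in $L, \lambda, p, q$. This is because it is analytic
(cf. [96], [97]) as a function of $L, \lambda$ and $e\cos\theta_0$
and of $\cos(\lambda_0+\theta_0)$. Since $\theta_0 = \lambda + \gamma + f_{\lambda+\gamma}$ and
$\theta_0 + \gamma_0 = \lambda + f_{\lambda+\gamma}$ by [97], the Hamiltonian [96] is
analytic in $L, \lambda, e\cos(\lambda+\gamma+f_{\lambda+\gamma}), \cos(\lambda+f_{\lambda+\gamma})$
for $e$ small (i.e., for $G$ small) and, by [42], $f_{\lambda+\gamma}$ is
analytic in $e\sin(\lambda+\gamma)$ and $e\cos(\lambda+\gamma)$. Hence the
trigonometric identities

$$e\sin(\lambda+\gamma) = \frac{p\sin\lambda + q\cos\lambda}{\sqrt{L}}\sqrt{1 - \frac{G}{2L}}, \qquad e\cos(\lambda+\gamma) = \frac{p\cos\lambda - q\sin\lambda}{\sqrt{L}}\sqrt{1 - \frac{G}{2L}}$$ [105]

together with $G = \tfrac12(p^2+q^2)$ imply that [103] is
analytic near $p = q = 0$ and $L > 0$, $\lambda \in [0, 2\pi]$. The
Hamiltonian becomes analytic and the new coordinates
are suitable to describe motions crossing the
origin: for example, by setting

$$C := \frac{1}{2\sqrt{L}}\Big(1 - \frac{p^2+q^2}{4L}\Big)^{1/2}$$

and writing $G_0 = L - \tfrac12(p^2+q^2)$, [100] becomes

$$H = -\frac{g^2}{2L^2} - \omega L + \frac{\omega}{2}(p^2+q^2) + \delta_\varepsilon(G_0) - \frac{\varepsilon g}{2R}\,\frac{G_0^4}{g^2R^2}\Big(3\cos2\lambda + \big((-11\cos\lambda + 3\cos3\lambda)\,p - (7\sin\lambda + 3\sin3\lambda)\,q\big)C\Big)$$ [106]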
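
The change of variables from $(G, \gamma)$ to $(p, q)$ and the
trigonometric identities for $e\sin(\lambda+\gamma)$ and
$e\cos(\lambda+\gamma)$ lend themselves to a quick numerical check. The
sketch below assumes the relations $G = L_0 - G_0$,
$p = \sqrt{2G}\cos\gamma$, $q = \sqrt{2G}\sin\gamma$ and
$e^2 = 1 - G_0^2/L_0^2$; the function names and sample values are
hypothetical.

```python
import math

def eccentricity(L, G):
    """e = sqrt(2G/L) * sqrt(1 - G/(2L)), with G = L0 - G0 and L = L0."""
    return math.sqrt(2.0 * G / L) * math.sqrt(1.0 - G / (2.0 * L))

def to_pq(G, gamma):
    """Cartesian-like coordinates p = sqrt(2G) cos(gamma), q = sqrt(2G) sin(gamma)."""
    return math.sqrt(2.0 * G) * math.cos(gamma), math.sqrt(2.0 * G) * math.sin(gamma)

def e_sin_cos(L, lam, p, q):
    """(e sin(lam+gamma), e cos(lam+gamma)) written through p, q only.

    The expression is regular at p = q = 0, where it simply vanishes.
    """
    G = 0.5 * (p * p + q * q)
    s = math.sqrt(1.0 - G / (2.0 * L)) / math.sqrt(L)
    return ((p * math.sin(lam) + q * math.cos(lam)) * s,
            (p * math.cos(lam) - q * math.sin(lam)) * s)

if __name__ == "__main__":
    L, G, gamma, lam = 1.3, 0.05, 0.7, 2.1
    e = eccentricity(L, G)
    p, q = to_pq(G, gamma)
    print(e_sin_cos(L, lam, p, q))
    print(e * math.sin(lam + gamma), e * math.cos(lam + gamma))
```

The two printed pairs coincide, and `e_sin_cos` remains well defined at
the origin of the $p$-$q$ plane, which is the point added by the
regularization.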


The KAM theorem does not apply in the form 
discussed above to “Cartesian coordinates," that is, 
when, as in [106], the unperturbed system is not 
assigned in action-angle variables. However, there 
are versions of the theorem (actually its corollaries) 
which do apply and therefore it becomes possible to 
obtain some results even for the perturbations of 
circular motions by the techniques that have been 
illustrated here. 

Likewise, the Hamiltonian of the rigid body with
a fixed point $O$ and subject to analytic external
forces becomes singular, if expressed in the action-angle
coordinates of Deprit, when the body motion
nears a rotation around a principal axis or, more
generally, nears a configuration in which any two of
the axes $i_3$, $z$, or $z_0$ coincide (i.e., any two among the
principal axis, the angular momentum axis and the
inertial $z$-axis coincide; see the section "Rigid
body"). Nevertheless, by imitating the procedure
just described in the simpler cases of the circular
three-body problem, it is possible to enlarge the
phase space so that in the new coordinates the
Hamiltonian is analytic near the singular
configurations.

A regularization also arises when considering
collisional orbits in the unrestricted planar three-body
problem, and it leads to a very remarkable
result. After proving that if the
total angular momentum does not vanish, simultaneous
collisions of the three masses cannot occur
within any finite time interval, the question is
reduced to the regularization of two-body collisions,
under the assumption that the total angular momentum
does not vanish.

The local change of coordinates, which changes the
relative position coordinates $(x, y)$ of two colliding
bodies as $(x, y) \to (\xi, \eta)$, with $x + iy = (\xi + i\eta)^2$, is not
one to one, hence it has to be regarded as an
enlargement of the positions space, if points with
different $(\xi, \eta)$ are considered different. However, the
equations of motion written in the variables $\xi, \eta$ have
no singularity at $\xi, \eta = 0$ (LEVI-CIVITA).
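
A minimal sketch of how the squaring map removes the collision
singularity in the simplest setting, the planar Kepler problem with
unit mass. It relies on two standard facts that are not spelled out
above and are therefore assumptions of the example: with
$u = \xi + i\eta$, $x + iy = u^2$ and a fictitious time $s$ defined by
$dt = |u|^2\,ds$, the motion at energy $h$ obeys the linear equation
$u'' = (h/2)\,u$, and $2|u'|^2 - h|u|^2$ equals the gravitational
parameter $\mu$ along solutions. Function names are hypothetical.

```python
import math

def levi_civita_solution(u0, du0, h, s):
    """Solution of u'' = (h/2) u for bound motion (h < 0) at fictitious time s.

    u0 and du0 may be real or complex; returns (u(s), u'(s)).
    """
    w = math.sqrt(-h / 2.0)
    c, sn = math.cos(w * s), math.sin(w * s)
    return u0 * c + du0 / w * sn, -u0 * w * sn + du0 * c

def to_physical(u):
    """Relative position x + i y = u**2 (u and -u give the same point)."""
    return u * u

def energy_constraint(u, du, h):
    """2|u'|^2 - h|u|^2, equal to mu along a solution of energy h."""
    return 2.0 * abs(du) ** 2 - h * abs(u) ** 2

if __name__ == "__main__":
    mu, h = 1.0, -1.0                 # body released at rest at distance 1
    u0, du0 = 1.0 + 0.0j, 0.0j        # zero angular momentum: collision orbit
    s_coll = math.pi / (2.0 * math.sqrt(-h / 2.0))
    for s in (0.0, 0.5 * s_coll, s_coll, 1.5 * s_coll):
        u, du = levi_civita_solution(u0, du0, h, s)
        print(s, to_physical(u), energy_constraint(u, du, h))
```

At `s_coll` the two bodies collide ($u = 0$), yet $u$ and $u'$ remain
finite and the solution continues smoothly, while in the physical
variables the velocity diverges there.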

Another celebrated regularization is the regularization
of the Schwarzschild metric, i.e., of the
general relativity version of the two-body problem:
it is, however, somewhat out of the scope of this
review (SYNGE, KRUSKAL).

For more details, the reader is referred to Levi-Civita
(1956).


Appendix 1: KAM Resummation Scheme 


The idea to control the "remaining contributions" is to
reduce the problem to the case in which there are no
pairs of lines that follow each other in the tree order
and which have the same current. Mark by a scale
label "0" the lines, see [74], [83], of a tree whose
divisors $C|\omega_0\cdot\nu(l)|$ are $> 1$: these are lines which give
no problems in the estimates. Then mark by a scale
label "$\geq 1$" the lines with current $\nu(l)$ such that
$C|\omega_0\cdot\nu(l)| \leq 2^{-n+1}$ for $n = 1$ (i.e., the remaining lines).

The lines labeled 0 are said to be on scale 0, while
those labeled $\geq 1$ are said to be on scale $\geq 1$. A cluster
of scale 0 will be a maximal collection of lines of
scale 0 forming a connected subgraph of a tree $\theta$.

Consider only trees $\theta_0 \in \Theta_0$ of the family $\Theta_0$ of
trees containing no clusters of lines with scale label
0 which have only one line entering the cluster and
one exiting it with equal current.


It is useful to introduce the notion of a line $l_1$
situated "between" two lines $l, l'$ with $l' > l$: this
will mean that $l_1$ precedes $l'$ but not $l$.

All trees $\theta$ in which there are some pairs $l' > l$ of
consecutive lines of scale label $\geq 1$ which have equal
current and such that all lines between them bear
scale label 0 are obtained by "inserting" on the lines
of trees in $\Theta_0$ with label $\geq 1$ any number of clusters
of lines and nodes, with lines of scale 0 and with the
property that the sum of the harmonics of the nodes
inserted vanishes.

Consider a line $l_0 \in \theta_0 \in \Theta_0$ linking nodes $v_1 < v_2$
and labeled $\geq 1$ and imagine inserting on it a cluster
$\gamma$ of lines of scale 0 with sum of the node harmonics
vanishing and out of which emerges one line
connecting a node $v_{out}$ in $\gamma$ to $v_2$ and into which
enters one line linking $v_1$ to a node $v_{in} \in \gamma$. The
insertion of a $k$-lines, $|\gamma| = (k+1)$-nodes, cluster
changes the tree value by replacing the line factor,
that will be briefly called "value of the cluster $\gamma$", as

$$\frac{\nu_{v_1}\cdot\nu_{v_2}}{(\omega_0\cdot\nu(l_0))^2} \;\to\; \frac{\nu_{v_1}\cdot M(\gamma;\nu(l_0))\,\nu_{v_2}}{(\omega_0\cdot\nu(l_0))^2}\,\frac{1}{(\omega_0\cdot\nu(l_0))^2}$$ [107]

where $M$ is an $\ell\times\ell$ matrix

$$M_{rs}(\gamma;\nu(l_0)) = \frac{\varepsilon^{|\gamma|}}{k!}\,\nu_{out,r}\,\nu_{in,s}\prod_{v\in\gamma}(-f_{\nu_v})\prod_{l\in\gamma}\frac{\nu_v\cdot\nu_{v'}}{(\omega_0\cdot\nu(l))^2}$$

if $l = v'v$ denotes a line linking $v'$ and $v$. Therefore, if
all possible connected clusters are inserted and the
resulting values are added up, the result can be taken
into account by attributing to the original line $l_0$ a
factor like [107] with $M^{(0)}(\nu(l_0)) := \sum_\gamma M(\gamma;\nu(l_0))$
replacing $M(\gamma;\nu(l_0))$.

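If several connected clusters $\gamma$ are inserted on the
same line and their values are summed, the result is
a modification of the factor associated with the line
$l_0$ into

$$\frac{\nu_{v_1}\cdot\nu_{v_2}}{(\omega_0\cdot\nu(l_0))^2} \;\to\; \nu_{v_1}\cdot\sum_{k=0}^{\infty}\Big(\frac{M^{(0)}(\nu(l_0))}{(\omega_0\cdot\nu(l_0))^2}\Big)^{k}\frac{1}{(\omega_0\cdot\nu(l_0))^2}\,\nu_{v_2} = \nu_{v_1}\cdot\Big((\omega_0\cdot\nu(l_0))^2 - M^{(0)}(\nu(l_0))\Big)^{-1}\nu_{v_2}$$ [108]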
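
The resummation just described is a matrix geometric series. As a
hedged illustration, the sketch below uses a hypothetical $2\times2$
matrix `M` and a hypothetical squared divisor `d2` in place of
$M^{(0)}(\nu(l_0))$ and $(\omega_0\cdot\nu(l_0))^2$, and compares
partial sums of the series with the inverse matrix
$\big((\omega_0\cdot\nu(l_0))^2 - M^{(0)}\big)^{-1}$. The agreement
requires the ratio $M/d2$ to be small, which is exactly the point
discussed in the text.

```python
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_scale(a, c):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def resummed_propagator(M, d2):
    """Closed form (d2 * 1 - M)^(-1), where d2 stands for the squared divisor."""
    return mat_inv(mat_add(mat_scale(IDENTITY, d2), mat_scale(M, -1.0)))

def partial_sum(M, d2, terms):
    """Sum over k < terms of (M/d2)^k / d2."""
    ratio = mat_scale(M, 1.0 / d2)
    power = IDENTITY
    total = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(terms):
        total = mat_add(total, mat_scale(power, 1.0 / d2))
        power = mat_mul(power, ratio)
    return total

if __name__ == "__main__":
    M = [[0.02, 0.01], [0.01, 0.03]]   # illustrative small symmetric matrix
    d2 = 0.25                          # illustrative squared divisor
    print(partial_sum(M, d2, 80))
    print(resummed_propagator(M, d2))
```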


The series defining $M^{(0)}$ involves, by construction, only
trees with lines of scale 0, hence with large divisors, so
that it converges to a matrix of small size of order $\varepsilon$
(actually $\varepsilon^2$, more precisely) if $\varepsilon$ is small enough.
Convergence can be established by simply remarking
that the series defining $M^{(1)}$ is built with lines
with values $> 1/2$ of the propagator, so that it
certainly converges for $\varepsilon$ small enough (by the
estimates in the section "Perturbing functions,"
where the propagators were identically 1) and the
sum is of order $\varepsilon$ (actually $\varepsilon^2$), hence $< 1$. However,
such an argument cannot be repeated when dealing
with lines with smaller propagators (which still have
to be discussed). Therefore, a method not relying on
so trivial a remark on the size of the propagators has
eventually to be used when considering lines of scale
higher than 1, as it will soon become necessary.

The advantage of the collection of terms achieved
with [108] is that we can represent $h$ as a sum of
values of trees which are simpler because they
contain no pair of lines of scale $\geq 1$ with in between
lines of scale 0 with total sum of the node harmonics
vanishing. The price is that the divisors are now more
involved and we even have a problem due to the fact
that we have not proved that the series in [108]
converges. In fact, it is a geometric series whose value
is the RHS of [108] obtained by the sum rule [79]
unless we can prove that the ratio of the geometric
series is $< 1$. This is trivial in this case by the previous
remark: but it is better to note that there is another
reason for convergence, whose use is not really
necessary here but will become essential later.

The property that the ratio of the geometric series
is $< 1$ can be regarded as a consequence of
the cancellation mentioned in the section "Quasiperiodicity
and KAM stability" which can be
shown to imply that the ratio is $< 1$ because
$M^{(0)}(\nu) = \varepsilon^2(\omega_0\cdot\nu)^2\,m^{(0)}(\nu)$ with $C|m^{(0)}(\nu)| < D_0$
for some $D_0 > 0$ and for all $|\varepsilon| < \varepsilon_0$ for some $\varepsilon_0$.
Then for small $\varepsilon$ the divisor in [108] is essentially
still what it was before starting the resummation.

At this point, an induction can be started. Consider
trees evaluated with the new rule and place a scale
level "$\geq 2$" on the lines with $C|\omega_0\cdot\nu(l)| \leq 2^{-n+1}$ for
$n = 2$: leave the label "0" on the lines already marked
so and label by "1" the other lines. The lines of scale
"1" will satisfy $2^{-n} < C|\omega_0\cdot\nu(l)| \leq 2^{-n+1}$ for $n = 1$.
The graphs will now possibly contain lines of scale 0,
1 or $\geq 2$ while lines with label "$\geq 1$" no longer can
appear, by construction.

A cluster of scale 1 will be a maximal collection of
lines of scales 0, 1 forming a connected subgraph of
a tree $\theta$ and containing at least one line of scale 1.

The construction carried out by considering clusters
of scale 0 can be repeated by considering trees $\theta_1 \in \Theta_1$,
with $\Theta_1$ the collection of trees with lines marked 0, 1,
or $\geq 2$ and in which no pairs of lines with equal
momentum appear to follow each other if between
them there are only lines marked 0 or 1.

Insertion of connected clusters $\gamma$ of such lines on a
line $l_0$ of $\theta_1$ leads to define a matrix $M^{(1)}$ formed by
summing tree values of clusters $\gamma$ with lines of scales
0 or 1 evaluated with the line factors defined in
[107] and with the restriction that in $\gamma$ there are no
pairs of lines $l < l'$ with the same current and which
follow each other while any line between them has
lower scale (i.e., 0); here "between" means "preceding
$l'$ but not preceding $l$," as above.

Therefore, a scale-independent method has to be 
devised to check the convergence for M! and for the 
matrices to be introduced later to deal with even 
smaller propagators. This is achieved by the following 
extension of Siegel's theorem mentioned in the section 
“Quasiperiodicity and KAM stability”: 


Theorem 8 Let $\omega_0$ satisfy [74] and set $\omega = C\omega_0$.
Consider the contribution to the sum in [82] from
graphs $\theta$ in which

(i) no pairs $l' > l$ of lines which lie on the same
path to the root carry the same current $\nu$ if all
lines $l_1$ between them have current $\nu(l_1)$ such
that $|\omega\cdot\nu(l_1)| > 2|\omega\cdot\nu|$;

(ii) the node harmonics are bounded by $|\nu| \leq N$ for
some $N$.

Then the number of lines $l$ in $\theta$ with divisor $\omega\cdot\nu_l$
satisfying $2^{-n} < |\omega\cdot\nu_l| \leq 2^{-n+1}$ does not exceed
$4Nk2^{-n/\tau}$, $n = 1, 2, \ldots$.


This implies, by the same estimates in [85], that
the series defining $M^{(1)}$ converges. Again, it must be
checked that there are cancellations implying that
$M^{(1)}(\nu) = \varepsilon^2(\omega_0\cdot\nu)^2\,m^{(1)}(\nu)$ with $|m^{(1)}(\nu)| < D_0$ for
the same $D_0 > 0$ and the same $\varepsilon_0$.
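
The bookkeeping of scales used in this construction is easy to make
explicit. The sketch below assigns to a normalized divisor
$x = C|\omega_0\cdot\nu|$ its scale label ($0$ if $x > 1$, otherwise
the $n \geq 1$ with $2^{-n} < x \leq 2^{-n+1}$) and evaluates the
counting bound $4Nk2^{-n/\tau}$ of Theorem 8. The function names are
hypothetical, and $k$ is taken here to be the number of lines of the
tree, an assumption since the statement above does not define it.

```python
import math

def scale_label(x):
    """Scale of a line with normalized divisor x = C*|omega_0 . nu| > 0.

    Returns 0 if x > 1, otherwise the integer n >= 1 such that
    2**(-n) < x <= 2**(-n + 1).
    """
    if x <= 0.0:
        raise ValueError("the divisor must be nonzero")
    if x > 1.0:
        return 0
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    # An exact power of two sits at the upper end of its interval.
    return 2 - e if m == 0.5 else 1 - e

def siegel_bound(N, k, n, tau):
    """Upper bound 4 N k 2**(-n/tau) on the number of lines of scale n."""
    return 4.0 * N * k * 2.0 ** (-n / tau)

if __name__ == "__main__":
    for x in (2.0, 1.0, 0.75, 0.5, 0.3, 0.2):
        print(x, scale_label(x))
    print(siegel_bound(N=3, k=10, n=4, tau=2.0))
```

Using `math.frexp` avoids rounding problems of a logarithm when the
divisor is an exact power of two.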

At this point, one deals with trees containing only
lines carrying labels 0, 1, $\geq 2$, and the line factors for
the lines $l = v'v$ of scale 0 are $\nu_{v'}\cdot\nu_v/(\omega_0\cdot\nu(l))^2$,
those of the lines $l = v'v$ of scale 1 have line factors
$\nu_{v'}\cdot\big((\omega_0\cdot\nu(l))^2 - M^{(0)}(\nu(l))\big)^{-1}\nu_v$, and those of the
lines $l = v'v$ of scale $\geq 2$ have line factors

$$\nu_{v'}\cdot\big((\omega_0\cdot\nu(l))^2 - M^{(1)}(\nu(l))\big)^{-1}\nu_v$$

Furthermore, no pair of lines of scale "1" or of scale
"$\geq 2$" with the same momentum and with only lines
of lower scale (i.e., of scale "0" in the first case or of
scale "0", "1" in the second) between them can
follow each other.

This procedure can be iterated until, after infinitely
many steps, the problem is reduced to the
evaluation of tree values in which each line carries a
scale label $n$ and there are no pairs of lines which
follow each other and which have only lines of
lower scale in between. Then the Siegel argument
applies once more and the series so resummed is an
absolutely convergent series of functions analytic in
$\varepsilon$: hence the original series is convergent.

Although at each step there is a lower bound on the
denominators, it would not be possible to avoid using
Siegel's theorem. In fact, the lower bound would become
worse and worse as the scale increases. In order to check
the estimates of the constants $D_0, \varepsilon_0$ which control the
scale independence of the convergence of the various
series, it is necessary to take advantage of the theorem,
and of the absence (at each step) of the necessity of
considering trees with pairs of consecutive lines with
equal momentum and intermediate lines of higher scale.

One could also perform the analysis by bounding
$h^{(k)}$ order by order with no resummations (i.e.,
without changing the line factors) and exhibiting the
necessary cancellations. Alternatively, the paths that
Kolmogorov, Arnol'd and Moser used to prove
the first three (somewhat different) versions of the
theorem, by successive approximations of the
equations for the tori, can be followed.

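The invariant tori are Lagrangian manifolds just
as the unperturbed ones (cf. comments after [31])
and, in the case of the Hamiltonian [80], the
generating function $A\cdot\psi + \Phi(A,\psi)$ can be
expressed in terms of their parametric equations

$$\Phi(A,\psi) = G(\psi) + a\cdot\psi + h(\psi)\cdot\big(A - \omega - \Delta h(\psi)\big)$$
$$\partial_\psi G(\psi) = -\Delta h(\psi) + h(\psi)\,\partial_\psi\Delta h(\psi) - a$$ [109]
$$a := \int\big(-\Delta h(\psi) + h(\psi)\,\partial_\psi\Delta h(\psi)\big)\frac{d\psi}{(2\pi)^{\ell}} = \int h(\psi)\,\partial_\psi\Delta h(\psi)\,\frac{d\psi}{(2\pi)^{\ell}}$$

where $\Delta = (\omega\cdot\partial_\psi)$ and the invariant torus corresponds
to $A' = \omega$ in the map $\alpha = \psi + \partial_A\Phi(A,\psi)$ and
$A' = A + \partial_\psi\Phi(A,\psi)$. In fact, by [109] the latter
becomes $A' = A - \Delta h$ and, from the second of [75]
written for $f$ depending only on the angles $\alpha$, it is
$A = \omega + \Delta h$ when $A, \alpha$ are on the invariant torus.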

Note that if $a$ exists it is necessarily determined by the
third relation in [109] but the check that the second
equation in [109] is soluble (i.e., that the RHS is an exact
gradient up to a constant) is nontrivial. The canonical
map generated by $A\cdot\psi + \Phi(A,\psi)$ is also defined for $A'$
close to $\omega$ and foliates the neighborhood of the invariant
torus with other tori: of course, for $A' \neq \omega$ the tori
defined in this way are, in general, not invariant.

The reader is referred to Gallavotti et al. (2004) 
for more details. 


Appendix 2: Coriolis and Lorentz 
Forces - Larmor Precession 


Larmor precession refers to the motion of an
electrically charged particle in a magnetic field $H$
(in an inertial frame of reference). It is due to the
Lorentz force which, on a unit mass with unit
charge, produces an acceleration $\ddot\varrho = v\wedge H$ if the
speed of light is $c = 1$.


Therefore, if $H = Hk$ is directed along the $k$-axis,
the acceleration it produces is the same that the
Coriolis force would impress on a unit mass located
in a reference frame which rotates with angular
velocity $\omega_0k$ around the $k$-axis if $H = 2\omega_0k$.
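
The identification of the two accelerations is a one-line vector
identity, checked below for an arbitrary velocity. The sketch assumes
unit mass, unit charge and $c = 1$, as in the text; the helper names
are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lorentz_acceleration(v, H):
    """Acceleration v ^ H of a unit mass with unit charge (c = 1)."""
    return cross(v, H)

def coriolis_acceleration(v, omega):
    """Coriolis acceleration -2 omega ^ v in a frame rotating with velocity omega."""
    return tuple(-2.0 * c for c in cross(omega, v))

if __name__ == "__main__":
    w0 = 0.7
    omega = (0.0, 0.0, w0)            # rotation around the k-axis
    H = (0.0, 0.0, 2.0 * w0)          # field H = 2 omega_0 k
    v = (0.3, -1.1, 0.4)
    print(lorentz_acceleration(v, H))
    print(coriolis_acceleration(v, omega))
```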

The above remarks imply that a homogeneous
sphere electrically charged uniformly with a unit
charge and freely pivoting about its center in a
constant magnetic field $H$ directed along the $k$-axis
undergoes the same motion as it would follow if not
subject to the magnetic field but seen in a
noninertial reference frame rotating at constant
angular velocity $\omega_0$ around the $k$-axis if $H$ and $\omega_0$
are related by $H = 2\omega_0$: in this frame, the Coriolis
force is interpreted as a magnetic field.

This holds, however, only if the centrifugal force 
has zero moment with respect to the center: true in 
the spherical symmetry case only. In spherically 
nonsymmetric cases, the centrifugal forces have in 
general nonzero moment, so the equivalence 
between Coriolis force and the Lorentz force is 
only approximate. 

The Larmor theorem makes this more precise. It 
gives a quantitative estimate of the difference between 
the motion of a general system of particles of mass m 
in a magnetic field and the motion of the same 
particles in a rotating frame of reference but in the 
absence of a magnetic field. The approximation is 
estimated in terms of the size of the Larmor frequency 
eH/2mc, which should be small compared to the 
other characteristic frequencies of the motion of the 
system: the physical meaning is that the centrifugal 
force should be small compared to the other forces. 

The vector potential $A$ for a constant magnetic
field in the $k$-direction, $H = 2\omega_0k$, is $A = \omega_0\,k\wedge\varrho \equiv
\omega_0\varrho^{\perp}$. Therefore, from the treatment of the Coriolis
force in the section "Three-body problem" (see
[95]), the motion of a charge $e$ with mass $m$ in a
magnetic field $H$ with vector potential $A$ and subject
to other forces with potential $W$ can be described, in
an inertial frame and in generic units, in which the
speed of light is $c$, by a Hamiltonian

$$\mathcal{H} = \frac{1}{2m}\Big(p - \frac{e}{c}A\Big)^2 + W(\varrho)$$ [110]

where $p = m\dot\varrho + (e/c)A$ and $\varrho$ are canonically conjugate
variables.

Differential geometry is the study of differential
properties of geometric objects such as curves, 
surfaces and higher-dimensional manifolds endowed 
with additional structures such as metrics and 
connections. One of the main ideas of differential 
geometry is to apply the tools of analysis to 
investigate geometric problems; in particular, it
studies their “infinitesimal parts,” thereby linearizing
the problem. However, historically, geometric
concepts often anticipated the analytic tools 
required to define them from a differential geometric 
point of view; the notion of tangent to a curve, for 
example, arose well before the notion of derivative. 

In its barely more than two centuries of existence, 
differential geometry has always had strong (often 
two-way) interactions with physics. Just to name a 
few examples, the theory of curves is used in 
kinematics, symplectic manifolds arise in Hamilto- 
nian mechanics, pseudo-Riemannian manifolds in 
general relativity, spinors in quantum mechanics, Lie 
groups and principal bundles in gauge theory, and 
infinite-dimensional manifolds in the path-integral 
approach to quantum field theory. 


Curves and Surfaces 


The study of differential properties of curves and 
surfaces resulted from a combination of the coordi- 
nate method (or analytic geometry) developed by 
Descartes and Fermat during the first half of the 
seventeenth century and infinitesimal calculus devel- 
oped by Leibniz and Newton during the second half 
of the seventeenth and beginning of the eighteenth 
century. 
Differential geometry appeared later in the eighteenth
century with the works of Euler, Recherches
sur la courbure des surfaces (1760) (Investigations
on the curvature of surfaces), and Monge, Une
application de l'analyse à la géométrie (1795) (An
application of analysis to geometry). Until Gauss'
fundamental article Disquisitiones generales circa
superficies curvas (General investigations of curved
surfaces), published in Latin in 1827 (of which one
can find a partial translation to English in Spivak
(1979)), surfaces embedded in $\mathbb{R}^3$ were either
described by an equation, $W(x,y,z)=0$, or by
expressing one variable in terms of the others.
Although Euler had already noticed that the
coordinates of a point on a surface could be
expressed as functions of two independent variables,
it was Gauss who first made a systematic use of such
a parametric representation, thereby initiating the
concept of “local chart” which underlies differential
geometry.


Differentiable Manifolds 


The actual notion of $n$-manifold independent of a
particular embedding in a Euclidean space goes back
to a lecture Über die Hypothesen, welche der
Geometrie zu Grunde liegen (On the hypotheses
which lie at the foundations of geometry) (of which
one can find a translation to English and comments
in Spivak (1979)) delivered by Riemann at Göttingen
University in 1854, in which he makes clear the
fact that $n$-manifolds are locally like $n$-dimensional
Euclidean space. In his work, Riemann mentions
the existence of infinite-dimensional manifolds, 
such as function spaces, which today play an 
important role since they naturally arise as config- 
uration spaces in quantum field theories. 

In modern language, a differentiable manifold
modeled on a topological vector space $V$ (which can be
finite dimensional, Fréchet, Banach, or Hilbert, for
example) is a topological space $M$ equipped with a
family of local coordinate charts $(U_i, \phi_i)_{i \in I}$
such that the open subsets $U_i \subset M$ cover $M$ and
where $\phi_i : U_i \to V$, $i \in I$, are homeomorphisms
onto open subsets of $V$ which give rise to smooth
transition maps
$\phi_i \circ \phi_j^{-1} : \phi_j(U_i \cap U_j) \to \phi_i(U_i \cap U_j)$.
An $n$-dimensional differentiable manifold is a
differentiable manifold modeled on $\mathbb{R}^n$. The sphere
$S^{n-1} := \{(x_1,\dots,x_n) \in \mathbb{R}^n,\ \sum_{i=1}^n x_i^2 = 1\}$
is a differentiable manifold of dimension $n-1$.
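
As a concrete illustration of charts and transition maps, the following minimal sketch covers the sphere $S^2$ by two stereographic charts and checks numerically that the transition map is the smooth inversion $u \mapsto u/|u|^2$. The choice of stereographic projection and all function names are assumptions made for this example; they are not taken from the text.

```python
import math

def chart_north(p):
    """Stereographic projection from the north pole (0, 0, 1); defined away from it."""
    x, y, z = p
    return (x / (1.0 - z), y / (1.0 - z))

def chart_south(p):
    """Stereographic projection from the south pole (0, 0, -1); defined away from it."""
    x, y, z = p
    return (x / (1.0 + z), y / (1.0 + z))

def chart_north_inverse(u):
    """Inverse of chart_north: sends a point of the plane back to the unit sphere."""
    a, b = u
    s = a * a + b * b
    return (2.0 * a / (s + 1.0), 2.0 * b / (s + 1.0), (s - 1.0) / (s + 1.0))

def transition(u):
    """Transition map chart_south o chart_north^{-1}, defined for u != (0, 0)."""
    return chart_south(chart_north_inverse(u))

u = (0.3, -1.2)
s = u[0] ** 2 + u[1] ** 2
expected = (u[0] / s, u[1] / s)  # closed form of the transition map: u / |u|^2
print(transition(u), expected)
```

The two charts overlap on the sphere minus both poles, and the transition map is smooth there, which is exactly the compatibility condition in the definition.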

Simple differentiable curves in $\mathbb{R}^n$ are
one-dimensional differentiable manifolds locally specified
by coordinates $x(t) = (x_1(t),\dots,x_n(t)) \in \mathbb{R}^n$,
where $t \mapsto x_i(t)$ is of class $C^k$. The tangent at point
$x(t_0)$ to such a curve, which is a straight line passing
through this point with direction given by the vector
$x'(t_0)$, generalizes to the concept of tangent space
$T_mM$ at point $m \in M$ of a smooth manifold $M$
modeled on $V$, which is a vector space isomorphic to
$V$ spanned by tangent vectors at point $m$ to curves
$\gamma(t)$ of class $C^1$ on $M$ such that $\gamma(t_0) = m$.

In order to make this more precise, one needs the
notion of differentiable mapping. Given two
differentiable manifolds $M$ and $N$, a mapping $f : M \to N$
is differentiable at point $m$ if, for every chart $(U, \phi)$
of $M$ containing $m$ and every chart $(V, \psi)$ of $N$ such
that $f(U) \subset V$, the mapping
$\psi \circ f \circ \phi^{-1} : \phi(U) \to \psi(V)$
is differentiable at point $\phi(m)$. In particular, smooth
mappings $f : M \to \mathbb{R}$ form the algebra $C^\infty(M, \mathbb{R})$
of smooth real-valued functions on $M$. Differentiable
mappings $\gamma : [a,b] \to M$ from an interval $[a,b] \subset \mathbb{R}$ to
a differentiable manifold $M$ are called “differentiable
curves” on $M$. A differentiable mapping $f : M \to N$
which is invertible and with differentiable inverse
$f^{-1} : N \to M$ is called a diffeomorphism.

The derivative of a function $f \in C^\infty(M, \mathbb{R})$ along
a curve $\gamma : [a,b] \to M$ at point $\gamma(t_0) \in M$ with
$t_0 \in [a,b]$ is given by

$$ Xf := \frac{d}{dt}\Big|_{t=t_0} f \circ \gamma(t) $$

and the map $f \mapsto Xf$ is called the tangent vector to
the curve $\gamma$ at point $\gamma(t_0)$. Tangent vectors to some
curve $\gamma : [a,b] \to M$ at a given point $m \in \gamma([a,b])$
form a vector space $T_mM$ called the “tangent space”
to $M$ at point $m$.
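
The action of a tangent vector on a function, $Xf = \frac{d}{dt} f(\gamma(t))$ at $t = t_0$, can be approximated numerically. The sketch below is only an illustration: the curve (the equator of the unit sphere), the test function, and the finite-difference step are all assumptions chosen for this example.

```python
import math

def gamma(t):
    """A curve on the unit sphere: the equator traversed at unit speed."""
    return (math.cos(t), math.sin(t), 0.0)

def f(p):
    """A smooth function on R^3, restricted to the sphere."""
    x, y, z = p
    return x * y + z

def derivative_along_curve(f, gamma, t0, h=1e-5):
    """Central-difference approximation of d/dt f(gamma(t)) at t = t0."""
    return (f(gamma(t0 + h)) - f(gamma(t0 - h))) / (2.0 * h)

t0 = 0.4
Xf = derivative_along_curve(f, gamma, t0)
exact = math.cos(2.0 * t0)  # since f(gamma(t)) = cos(t) sin(t) = sin(2t)/2
print(Xf, exact)
```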

A (smooth) map which, to a point $m \in M$, assigns
a tangent vector $X(m) \in T_mM$ is called a (smooth)
vector field. It can also be seen as a derivation
$X : f \mapsto Xf$ on $C^\infty(M, \mathbb{R})$ defined by
$(Xf)(m) := X(m)f$ for any $m \in M$, and the bracket of vector
fields is thereby defined from the operator bracket
$[X, Y] := X \circ Y - Y \circ X$. The linear operations on
tangent vectors carry over to vector fields,
$(X + Y)(m) := X(m) + Y(m)$, $(\lambda X)(m) := \lambda X(m)$ for any
$m \in M$, any $X(m), Y(m) \in T_mM$, and any $\lambda \in \mathbb{R}$,
so that vector fields on $M$ form a linear space.
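
In local coordinates the operator bracket reduces to the standard component expression $[X,Y]^i = \sum_j (X^j \partial_j Y^i - Y^j \partial_j X^i)$. The following sketch evaluates it with finite differences on $\mathbb{R}^2$; the example fields and the helper names are assumptions made for illustration only.

```python
def partial(F, p, j, h=1e-5):
    """Central-difference partial derivative of the vector-valued map F along coordinate j."""
    plus, minus = list(p), list(p)
    plus[j] += h
    minus[j] -= h
    Fp, Fm = F(tuple(plus)), F(tuple(minus))
    return [(a - b) / (2.0 * h) for a, b in zip(Fp, Fm)]

def lie_bracket(X, Y, p):
    """Components of [X, Y] at p: sum_j (X^j d_j Y^i - Y^j d_j X^i)."""
    n = len(p)
    Xp, Yp = X(p), Y(p)
    result = [0.0] * n
    for j in range(n):
        dY = partial(Y, p, j)
        dX = partial(X, p, j)
        for i in range(n):
            result[i] += Xp[j] * dY[i] - Yp[j] * dX[i]
    return result

def X(p):
    """The coordinate field d/dx."""
    return (1.0, 0.0)

def Y(p):
    """The field x d/dy."""
    return (0.0, p[0])

bracket = lie_bracket(X, Y, (0.7, -0.2))  # analytically [d/dx, x d/dy] = d/dy
print(bracket)
```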

One can generate tangent vectors to $M$ via local
one-parameter groups of differentiable transformations
of $M$, that is, mappings $(t, m) \mapsto \phi_t(m)$ from
$]-\epsilon, \epsilon[ \times U$ to $M$ (with $\epsilon > 0$ and $U \subset M$ an
open subset of $M$) such that $\phi_0 = \mathrm{Id}$,
$\phi_{t+s} = \phi_t \circ \phi_s$
$\forall s, t \in ]-\epsilon, \epsilon[$ with $t + s \in ]-\epsilon, \epsilon[$, and
$m \mapsto \phi_t(m)$ is a diffeomorphism of $U$ onto an open
subset $\phi_t(U)$. The tangent vector at $t = 0$ to the curve
$\gamma(t) = \phi_t(m)$ yields a tangent vector to $M$ at point
$m = \gamma(0)$. Conversely, when $M$ is finite dimensional, the
fundamental theorem for systems of ordinary differential
equations yields, for any vector field $X$ on $M$, the
existence (around any point $m \in M$) of a
local one-parameter group of local transformations
$\phi : ]-\epsilon, \epsilon[ \times U \to M$ (with $U$ an open subset
containing $m$) which induces the tangent vector
$X(m) \in T_mM$.
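
A simple instance of a one-parameter group is the flow of the rotation field $X(x,y) = (-y, x)$ on the plane, which is rotation by the angle $t$ and happens to be defined for all $t$. The sketch below checks the group law and that the flow induces $X$ at $t = 0$; the example field and names are assumptions for illustration.

```python
import math

def X(m):
    """Rotation vector field on the plane."""
    x, y = m
    return (-y, x)

def phi(t, m):
    """Flow of X: rotation of the plane by the angle t."""
    x, y = m
    return (math.cos(t) * x - math.sin(t) * y,
            math.sin(t) * x + math.cos(t) * y)

m = (1.5, -0.5)
t, s = 0.3, 0.8

lhs = phi(t + s, m)          # phi_{t+s}(m)
rhs = phi(t, phi(s, m))      # (phi_t o phi_s)(m)

h = 1e-6                     # finite-difference step for the tangent vector at t = 0
velocity = tuple((a - b) / (2.0 * h) for a, b in zip(phi(h, m), phi(-h, m)))
print(lhs, rhs, velocity, X(m))
```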

A differentiable mapping $\phi : M \to N$ induces a map
$\phi_*(m) : T_mM \to T_{\phi(m)}N$ defined by
$(\phi_* X) f = X(f \circ \phi)$.
An “immersion” of a manifold $M$ in a manifold $N$ is a
differentiable mapping $\phi : M \to N$ such that the maps
$\phi_*(m)$ are injective at any point $m \in M$. Such a map is
an embedding if it is moreover injective, in which case
$\phi(M) \subset N$ is a submanifold of $N$. The unit sphere $S^n$
is a submanifold of $\mathbb{R}^{n+1}$. Whitney showed that every
smooth real $n$-dimensional manifold can be embedded
in $\mathbb{R}^{2n+1}$.

A differentiable manifold whose coordinate charts 
take values in a complex vector space V and whose 
transition maps are holomorphic is called a complex 
manifold, which is complex $n$-dimensional if $V = \mathbb{C}^n$.
The complex projective space $\mathbb{C}P^n$, the set of
complex straight lines through 0 in $\mathbb{C}^{n+1}$, is a
compact complex manifold of dimension $n$. Similarly
to the notion of differentiable mapping between 
differentiable manifolds, we have the notion of 
holomorphic mapping between complex manifolds. 

A smooth family $m \mapsto J_m$ of endomorphisms of the
tangent spaces $T_mM$ to a differentiable manifold $M$ such
that $J_m^2 = -\mathrm{Id}$ gives rise to an almost-complex manifold.
The prototype is the almost-complex structure on $\mathbb{C}^n$
defined by $J(\partial_{x_i}) = \partial_{y_i}$, $J(\partial_{y_i}) = -\partial_{x_i}$
with $z = (x_1 + i y_1, \dots, x_n + i y_n) \in \mathbb{C}^n$, which can
be transferred to a complex manifold $M$ by means of local
charts. An almost-complex structure $J$ on a manifold $M$ is
called complex if $M$ is the underlying differentiable manifold
of a complex manifold which induces $J$ in this way.
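
For $n = 1$ the prototype structure is a $2 \times 2$ matrix in the basis $(\partial_x, \partial_y)$, and the condition $J^2 = -\mathrm{Id}$ can be verified directly. The sketch below is a minimal check under that assumption; the matrix layout (columns are images of basis vectors) is a convention chosen for this example.

```python
def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Columns are the images of the basis vectors:
# J(d/dx) = d/dy  -> first column (0, 1)
# J(d/dy) = -d/dx -> second column (-1, 0)
J = [[0, -1],
     [1, 0]]

J_squared = mat_mul(J, J)
print(J_squared)  # expected to be minus the identity
```

Under the identification $(x, y) \leftrightarrow x + iy$, this $J$ is multiplication by $i$.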

Studying smooth functions on a differentiable
manifold can provide information on the topology
of the manifold: for example, the behavior of a
smooth function on a compact manifold at its
critical points is strongly restricted by the topological
properties of the manifold. This leads to the Morse
critical point theory, which extends to
infinite-dimensional manifolds and, among other
consequences, leads to conclusions on extremals or closed
extremals of variational problems. Rather than
privileging points on a manifold, one can study 
instead the geometry of manifolds from the point of 
view of spaces of functions, which leads to an 
algebraic approach to differential geometry. The 
initial concept there is a commutative ring (which 
becomes a possibly noncommutative algebra in the 
framework of noncommutative geometry), namely 
the ring of smooth functions on the manifold, while 
the manifold itself is defined in terms of the ring as the 
space of maximal ideals. In particular, this point of 
view proves to be fruitful to understand supermani- 
folds, a generalization of manifolds which is impor- 
tant for supersymmetric field theories. 

One can further consider the sheaf of smooth 
functions on an open subset of the manifold; this 
point of view leads to sheaf theory which provides a 
unified approach to establishing connections between 
local and global properties of topological spaces. 


Metric Properties 


Riemann focused on the metric properties of manifolds,
but the first clear formulation of the concept of a
manifold equipped with a metric was given by Weyl in
Die Idee der Riemannschen Fläche. A Riemannian
metric on a differentiable manifold $M$ is a
positive-definite scalar product $g_m$ on $T_mM$ for every point
$m \in M$ depending smoothly on the point $m$. A manifold
equipped with a Riemannian metric is called a
Riemannian manifold. A Weyl transformation, which
multiplies the metric by a smooth positive function,
yields a new Riemannian metric with the same angle
measurement as the original one, and hence leaves the
“conformal” structure on $M$ unchanged.

Riemann also suggested considering metrics on
the tangent spaces that are not induced from scalar
products; metrics on the manifold built this way
were first systematically investigated by Finsler and
are therefore called Finsler metrics. Geodesics on a
Riemannian manifold $M$, which correspond to
smooth curves $\gamma : [a,b] \to M$ that minimize the
length functional

$$ L(\gamma) := \int_a^b \sqrt{g_{\gamma(t)}\Bigl(\frac{d\gamma}{dt}, \frac{d\gamma}{dt}\Bigr)}\, dt, $$

then generalize to curves which realize the shortest
distance between two points chosen sufficiently close.


Euclid's axioms, which naturally lead to Riemannian
geometry, are also satisfied up to the axiom
of parallelism by a geometry developed by
Lobachevsky in 1829 and Bolyai in 1832.
Non-Euclidean geometries actually played a major role in
the development of differential geometry, and
Lobachevsky's work inspired Riemann and later Klein.

Dropping the positivity assumption for the
bilinear forms $g_m$ on $T_mM$ leads to Lorentzian
manifolds, which are $(n+1)$-dimensional smooth
manifolds equipped with bilinear forms on the
tangent spaces with signature $(1, n)$. These occur in
general relativity and tangent vectors with negative, 
positive, or vanishing squared length are called 
timelike, spacelike, and lightlike, respectively. 
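
The classification of tangent vectors by the sign of their squared length is easy to make concrete in the flat model case. The sketch below assumes the diagonal metric $\mathrm{diag}(-1, +1, \dots, +1)$ on $\mathbb{R}^{n+1}$, with the negative sign carried by the first coordinate; that sign convention, the tolerance, and the function names are assumptions for this example.

```python
def squared_length(v):
    """Squared length for the flat metric diag(-1, +1, ..., +1) (assumed convention)."""
    return -v[0] ** 2 + sum(c * c for c in v[1:])

def causal_type(v, tol=1e-12):
    """Classify a vector as timelike, spacelike, or lightlike."""
    q = squared_length(v)
    if q < -tol:
        return "timelike"
    if q > tol:
        return "spacelike"
    return "lightlike"

print(causal_type((1.0, 0.0, 0.0, 0.0)))
print(causal_type((0.0, 1.0, 0.0, 0.0)))
print(causal_type((1.0, 1.0, 0.0, 0.0)))
```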

Just as complex vector spaces can be equipped with 
positive-definite Hermitian products, a complex 
manifold M can come equipped with a Hermitian 
metric, namely a positive-definite Hermitian product 
$h_m$ on $T_mM$ for every point $m \in M$ depending
smoothly on the point $m$; every Hermitian metric
induces a Riemannian one given by its real part. The
complex projective space $\mathbb{C}P^n$ comes naturally
equipped with the Fubini-Study Hermitian metric. 


Transformation Groups 


Metric properties can be seen from the point of view 
of transformation groups. Poncelet in his Traité des
propriétés projectives des figures (1822) had investigated
classical Euclidean geometry from a projective geometric
point of view, but it was not until Cayley (1858)
that metric properties were interpreted as those
stable under any “projective” transformation which
leaves “cyclic points” (points at infinity on the
imaginary axis of the complex plane) invariant.
Transformation groups were further investigated by 
Lie, leading to the modern concept of Lie group, a 
smooth manifold endowed with a group structure 
such that the group operations are smooth. 

A vector field $X$ on a Lie group $G$ is called left-
(resp. right-) invariant if it is invariant under left
translations $L_g : h \mapsto gh$ (resp. right translations
$R_g : h \mapsto hg$) for every $g \in G$, that is, if
$(L_g)_* X(h) = X(gh)$ $\forall (g,h) \in G^2$
(resp. $(R_g)_* X(h) = X(hg)$ $\forall (g,h) \in G^2$).
The set of all left-invariant vector fields
equipped with the sum, scalar multiplication, and
the bracket operation on vector fields forms an
algebra called the Lie algebra of $G$.

The group $GL_n(\mathbb{R})$ (resp. $GL_n(\mathbb{C})$) of all real (resp.
complex) invertible $n \times n$ matrices is a Lie group
with Lie algebra the algebra $gl_n(\mathbb{R})$ (resp. $gl_n(\mathbb{C})$) of
all real (resp. complex) $n \times n$ matrices, and the
bracket operation reads $[A, B] = AB - BA$.

The orthogonal (resp. unitary) group
$O_n(\mathbb{R}) := \{A \in GL_n(\mathbb{R}),\ A^t A = 1\}$, where $A^t$ denotes
the transposed matrix (resp.
$U_n(\mathbb{C}) := \{A \in GL_n(\mathbb{C}),\ A^* A = 1\}$,
where $A^* = \bar{A}^t$), is a compact Lie group with Lie
algebra $o_n(\mathbb{R}) := \{A \in gl_n(\mathbb{R}),\ A^t = -A\}$
(resp. $u_n(\mathbb{C}) := \{A \in gl_n(\mathbb{C}),\ A^* = -A\}$).

A left-invariant vector field X on a finite-dimen- 
sional Lie group G (or equivalently an element X of 
the Lie algebra of $G$) generates a global one-parameter
group of transformations $\phi_X(t)$, $t \in \mathbb{R}$.
The mapping from the Lie algebra of $G$ into $G$
defined by $\exp(X) := \phi_X(1)$ is called the exponential
mapping. The exponential mapping on $GL_n(\mathbb{R})$ (resp.
$GL_n(\mathbb{C})$) is given by the series
$\exp(A) = \sum_{i=0}^{\infty} A^i / i!$.
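
The series for the matrix exponential can be summed directly. The sketch below truncates it after a fixed number of terms (an assumption that is adequate for small matrices of moderate norm, not a general-purpose algorithm) and checks that a skew-symmetric matrix, an element of $o_2(\mathbb{R})$, exponentiates to a rotation in $O_2(\mathbb{R})$.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=30):
    """Truncated series exp(A) = sum_{i >= 0} A^i / i! (first `terms` terms)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term A^i / i!, starting at i = 0
    for i in range(1, terms):
        term = [[x / i for x in row] for row in mat_mul(term, A)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

theta = 0.5
A = [[0.0, -theta],
     [theta, 0.0]]          # skew-symmetric: A^t = -A
R = mat_exp(A)
print(R)
print(math.cos(theta), math.sin(theta))  # entries of the rotation by theta, for comparison
```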

As symmetry groups of physical systems, Lie 
groups play an important role in physics, in 
particular in quantum mechanics and Yang-Mills 
theory. Infinite-dimensional Lie groups arise as 
symmetry groups, such as the group of diffeomorph- 
isms of a manifold in general relativity, the group of 
gauge transformations in Yang-Mills theory, and 
the group of Weyl transformations of metrics on a 
surface in string theory. The principle “the physics
should not depend on how it is described” translates
to an invariance under the action of the (possibly
infinite-dimensional) group of symmetries of the
theory. Anomalies arise when such an invariance
holds for the classical action of a physical theory but 
“breaks” at the quantized level. 

In his Erlangen program (1872), Klein puts the 
concept of transformation group in the foreground 
introducing a novel idea by which one should 
consider a space endowed with some properties 
as a set of objects invariant under a given group of 
transformations. One thereby reaches a classification
of geometric results according to which group is
relevant in a particular problem as, for example, the
projective linear group for projective geometry, 
the orthogonal group for Riemannian geometry, or 
the symplectic group for “symplectic” geometry. 


Fiber Bundles 


Transformation groups give rise to principal fiber 
bundles which play a major role in Yang-Mills 
theory. The notion of fiber bundle first arose out of 
questions posed in the 1930s on the topology and the 
geometry of manifolds, and by 1950 the definition of 
fiber bundle had been clearly formulated by Steenrod. 

A smooth fiber bundle with typical fiber a
manifold $F$ is a triple $(E, \pi, B)$, where $E$ and $B$ are
smooth manifolds called the total space and the base
space, and $\pi : E \to B$ is a smooth surjective map
called the projection of the bundle, such that the
preimage $\pi^{-1}(b)$ of a point $b \in B$, called the fiber of
the bundle over $b$, is isomorphic to $F$ and any base
point $b$ has a neighborhood $U \subset B$ with preimage
$\pi^{-1}(U)$ diffeomorphic to $U \times F$, where the
diffeomorphisms commute with the projection on the base
space. Smooth sections of $E$ are smooth maps
$\sigma : B \to E$ such that $\pi \circ \sigma = I_B$.

When $F$ is a vector space and when, given open
subsets $U_i \subset B$ that cover $B$ with corresponding
coordinate charts $(U_i, \phi_i)_{i \in I}$, the local
diffeomorphisms $\tau_i : \pi^{-1}(U_i) \to \phi_i(U_i) \times F$ give rise
to transition maps
$\tau_i \circ \tau_j^{-1} : \phi_j(U_i \cap U_j) \times F \to \phi_i(U_i \cap U_j) \times F$
that are linear in the fiber, the bundle is called a “vector
bundle.” The tangent bundle $TM = \bigcup_{m \in M} T_mM$ to a
differentiable manifold $M$ modeled on a vector space
$V$ is a vector bundle with typical fiber $V$ and
transition maps
$\tau_{ij} = (\phi_i \circ \phi_j^{-1}, d(\phi_i \circ \phi_j^{-1}))$ expressed
in terms of the differentials of the transition maps on
the manifold $M$. So are the cotangent bundle, the
dual of the tangent bundle, and tensor products of
the tangent and cotangent vector bundles with
typical fiber the dual $V^*$ and tensor products of $V$
and $V^*$. Vector fields defined previously are sections
of the tangent bundle, 1-forms on $M$ are sections of
the cotangent bundle, and contravariant tensors,
resp. covariant tensors, are sections of tensor
products of the tangent, resp. cotangent bundles. A
differentiable mapping $\phi : M \to N$ takes covariant
$p$-tensor fields on $N$ to their pullbacks by $\phi$,
covariant $p$-tensors on $M$ given by

$$ (\phi^* T)(X_1, \dots, X_p) = T(\phi_* X_1, \dots, \phi_* X_p) $$

for any vector fields $X_1, \dots, X_p$ on $M$.

Differentiating a smooth function f on M gives 
rise to a 1-form $df$ on $M$. More generally, exterior
$p$-forms are antisymmetric smooth covariant $p$-tensors
so that
$\omega(X_{\sigma(1)}, \dots, X_{\sigma(p)}) = \epsilon(\sigma)\, \omega(X_1, \dots, X_p)$ for
any vector fields $X_1, \dots, X_p$ on $M$ and any
permutation $\sigma \in \Sigma_p$ with signature $\epsilon(\sigma)$.

Riemannian metrics are covariant 2-tensors and 
the space of Riemannian metrics on a manifold M is 
an infinite-dimensional manifold which arises as a 
configuration space in string theory and general 
relativity. 

A principal bundle is a fiber bundle $(P, \pi, B)$ with
typical fiber a Lie group $G$ acting freely and properly
on the total space $P$ via a right action
$(p, g) \in P \times G \mapsto pg = R_g(p) \in P$ and such that the local
diffeomorphisms $\pi^{-1}(U) \to U \times G$ are $G$-equivariant.
Given a principal fiber bundle $(P, \pi, B)$ with structure
group a finite-dimensional Lie group $G$, the action of
$G$ on $P$ induces a homomorphism which to an
element $X$ of the Lie algebra of $G$ assigns a vector
field $X^*$ on $P$ called the “fundamental vector field”
generated by $X$. It is defined at $p \in P$ by

$$ X^*(p) := \frac{d}{dt}\Big|_{t=0} R_{\exp(tX)}(p) $$

where exp is the exponential map on $G$.


Given an action of G on a vector space V, one 
builds from a principal bundle with typical fiber G an 
associated vector bundle with typical fiber V. 
Principal bundles are essential in gauge theory; U(1)-principal
bundles arise in electromagnetism and
nonabelian structure groups arise in Yang-Mills 
theory. There the fields are connections on the 
principal bundle, and the action of gauge transforma- 
tions on (irreducible) connections gives rise to an 
infinite-dimensional principal bundle over the moduli 
space with structure group given by gauge transfor- 
mations. Infinite-dimensional bundles arise in other 
field theories such as string theory where the moduli 
space corresponds to inequivalent complex structures 
on a Riemann surface and the infinite-dimensional 
structure group is built up from Weyl transformations 
of the metric and diffeomorphisms of the surface. 


Connections 


On a manifold there is no canonical method to
identify tangent spaces at different points. Such an
identification, which is needed in order to differentiate
vector fields, can be achieved on a Riemannian
manifold via “parallel transport” of the vector fields.
The basic concepts of the theory of covariant
differentiation on a Riemannian manifold were given
at the end of the nineteenth century by Ricci and, in a
more complete form, in 1901 in collaboration with
Levi-Civita in Méthodes de calcul différentiel absolu et
leurs applications; on a Riemannian manifold, it is
possible to define in a canonical manner a parallel
displacement of tangent vectors and thereby to
differentiate vector fields covariantly using what has
since been called the Levi-Civita connection.

More generally, a (linear) connection (or equivalently
a covariant derivation) on a vector bundle $E$
over a manifold $M$ provides a way to identify fibers
of the vector bundle at different points; it is a map $\nabla$
taking sections $\sigma$ of $E$ to $E$-valued 1-forms on $M$
which satisfies a Leibniz rule,
$\nabla(f\sigma) = df \otimes \sigma + f \nabla\sigma$,
for any smooth function $f$ on $M$. When $E$ is the
tangent bundle over $M$, curves $\gamma$ on the manifold
with covariantly constant velocity
$\nabla_{\dot\gamma(t)} \dot\gamma(t) = 0$ give rise
to geodesics. Given an initial velocity
$\dot\gamma(0) = X \in T_mM$ and provided $X$ has small enough
norm, $\gamma_X(1)$ defines a point on the corresponding
geodesic, and the map $\exp : X \mapsto \gamma_X(1)$ is a
diffeomorphism from a neighborhood of 0 in $T_mM$ to a
neighborhood of $m \in M$ called the “exponential map” of $\nabla$.
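
On the unit sphere with its round metric, geodesics are great circles and the exponential map has the well-known closed form $\exp_m(X) = \cos(|X|)\, m + \sin(|X|)\, X/|X|$ for $X$ tangent at $m$. The sketch below evaluates it; the restriction to the unit sphere and the function name are assumptions for this example, and the input $X$ is assumed to be orthogonal to $m$.

```python
import math

def sphere_exp(m, X):
    """Exponential map of the unit sphere at m, applied to a tangent vector X (X orthogonal to m)."""
    norm = math.sqrt(sum(c * c for c in X))
    if norm == 0.0:
        return tuple(m)
    return tuple(math.cos(norm) * a + math.sin(norm) * b / norm
                 for a, b in zip(m, X))

m = (0.0, 0.0, 1.0)              # north pole
X = (math.pi / 2, 0.0, 0.0)      # tangent vector of length pi/2
q = sphere_exp(m, X)             # end of a quarter great circle, on the equator
print(q)
```

The length of $X$ is the distance travelled along the geodesic, so a vector of length $\pi/2$ carries the pole to the equator.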

The concept of connection extends to principal 
bundles where it was developed by Ehresmann 
building on the work of Cartan. A connection on a
principal bundle $(P, \pi, B)$ with structure group $G$,
which is a smooth equivariant (under the action of
the group $G$) decomposition of the tangent space
$T_pP = H_pP \oplus V_pP$ at each point $p$ into a horizontal
space $H_pP$ and the vertical space $V_pP = \mathrm{Ker}\, d\pi_p$,
gives rise to a linear connection on the associated 
vector bundle. 

A connection on $P$ gives rise to a 1-form $\omega$ on $P$
with values in the Lie algebra of the structure group
$G$ called the connection 1-form and defined as
follows. For each $X \in T_pP$, $\omega(X)$ is the unique
element $U$ of the Lie algebra of $G$ such that the
corresponding fundamental vector field $U^*(p)$ at
point $p$ coincides with the vertical component of $X$.
In particular, $\omega(U^*) = U$ for any element $U$ of the Lie
algebra of $G$.

The space of connections which is an infinite- 
dimensional manifold arises as a configuration space 
in Yang-Mills theory and also comes into play in the 
Seiberg-Witten theory. 


Geometric Differential Operators 


From connections one defines a number of differential
operators on a Riemannian manifold, among
them second-order Laplacians. In particular, the
Laplace-Beltrami operator
$f \mapsto -\mathrm{tr}(\nabla^{T^*M} df)$ on
smooth functions, where $\nabla^{T^*M}$ is the connection on
the cotangent bundle induced by the Levi-Civita
connection on $M$, generalizes the ordinary Laplace
operator on Euclidean space. This in turn generalizes
to second-order operators
$\Delta^E := -\mathrm{tr}(\nabla^{T^*M \otimes E} \nabla^E)$
acting on smooth sections of a vector bundle $E$ over
a Riemannian manifold $M$, where $\nabla^E$ is a connection
on $E$ and $\nabla^{T^*M \otimes E}$ the connection on $T^*M \otimes E$
induced by $\nabla^E$ and the Levi-Civita connection on $M$.

The Dirac operator on a spin Riemannian
manifold, a first-order differential operator whose
square coincides with the Laplace-Beltrami operator
up to zeroth-order terms, can be best understood
going back to the initial idea of Dirac. A
first-order differential operator with constant
matrix coefficients $\sum_{i=1}^n \gamma_i\, \partial/\partial x_i$ has square
given by the Laplace operator $-\sum_{i=1}^n \partial^2/\partial x_i^2$ on
$\mathbb{R}^n$ if and only if its coefficients satisfy the
Clifford relations

$$ \gamma_i^2 = -1 \quad \forall i = 1, \dots, n, \qquad \gamma_i \gamma_j + \gamma_j \gamma_i = 0 \quad \forall i \neq j. $$
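
For $n = 2$ the relations $\gamma_i^2 = -1$ and $\gamma_i \gamma_j + \gamma_j \gamma_i = 0$ ($i \neq j$) can be checked on explicit $2 \times 2$ complex matrices. The representation used below, $\gamma_1 = i\sigma_1$ and $\gamma_2 = i\sigma_2$ in terms of Pauli matrices, is one possible choice made for this example and is not taken from the text.

```python
def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

gamma1 = [[0, 1j], [1j, 0]]   # i * sigma_1
gamma2 = [[0, 1], [-1, 0]]    # i * sigma_2

square1 = mat_mul(gamma1, gamma1)
square2 = mat_mul(gamma2, gamma2)
anticommutator = mat_add(mat_mul(gamma1, gamma2), mat_mul(gamma2, gamma1))

print(square1, square2, anticommutator)
```

With these coefficients, the operator $\gamma_1 \partial_{x_1} + \gamma_2 \partial_{x_2}$ squares to $-(\partial_{x_1}^2 + \partial_{x_2}^2)$ times the identity matrix.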


The resulting Clifford algebra, once complexified, is
isomorphic in even dimensions $n = 2k$ to the space
$\mathrm{End}(S_n)$ (and $\mathrm{End}(S_n) \oplus \mathrm{End}(S_n)$ in odd dimensions
$n = 2k + 1$) of endomorphisms of the space
$S_n = \mathbb{C}^{2^k}$ of complex $n$-spinors. When instead of the
canonical metric on $\mathbb{R}^n$ one starts from the metric on
the tangent bundle $TM$ induced by the Riemannian
metric on $M$, and provided the corresponding spinor
spaces patch up to a “spinor bundle” over $M$, $M$ is
called a spin manifold. The Dirac operator on a
spin Riemannian manifold $M$ is a first-order
differential operator acting on spinors given by
$D_g = \sum_{i=1}^n \gamma_i \nabla_{e_i}$, where $\nabla$ is the connection
on spinors (sections of the spinor bundle $S$) induced
by the Levi-Civita connection and $e_1, \dots, e_n$ is
an orthonormal frame of the tangent bundle $TM$.
This is a particular case of more general twisted
Dirac operators $D_g^W$ on a twisted spinor bundle
$S \otimes W$ equipped with the connection $\nabla^{S \otimes W}$ which
combines the connection $\nabla$ with a connection $\nabla^W$
on an auxiliary vector bundle $W$. Their square
$(D_g^W)^2$ relates to the Laplacian $\Delta^{S \otimes W}$ built from this
twisted connection via the Lichnerowicz formula,
which is useful for estimates on the spectrum of the
Dirac operator in terms of the underlying geometric
data.

When there is no spin structure on $M$, one can still
hope for a $\mathrm{Spin}^c$ structure and a Dirac operator $D^c$
associated with a connection compatible with that
structure. In particular, every compact orientable
4-manifold can be equipped with a $\mathrm{Spin}^c$ structure,
and one can build invariants of the differentiable
manifold called Seiberg-Witten invariants from
solutions of a system of two partial differential
equations, one of which is the Dirac equation
$D^c \psi = 0$ associated with a connection compatible
with the $\mathrm{Spin}^c$ structure and the other a nonlinear
equation involving the curvature.


Curvature 



The concept of “curvature,” which is now understood
in terms of connections (the curvature of a
connection $\nabla$ is defined by $\Omega = \nabla^2$), historically
arose prior to that of connection. In its modern
form, the concept of curvature dates back to Gauss.
Using a spherical representation of surfaces (the
Gauss map $\nu$, which sends a point $m$ of an oriented
surface $\Sigma \subset \mathbb{R}^3$ to the outward pointing unit normal
vector $\nu_m$), Gauss defined what is since then called
the Gaussian curvature $K_m$ at point $m \in U \subset \Sigma$ as
the limit when the area of $U$ tends to zero of the
ratio $\mathrm{area}(\nu(U))/\mathrm{area}(U)$. It measures the
obstruction to finding a distance-preserving map from a
piece of the surface around $m$ to a region in the
standard plane. Gauss' Theorema Egregium says that
the Gaussian curvature of a smooth surface in $\mathbb{R}^3$ is
determined by the metric on the surface, so that
it agrees for two isometric surfaces.

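From the curvature $\Omega$ of a connection on a
Riemannian manifold $(M, g)$, one builds the
Riemannian curvature tensor, a 4-tensor which in
local coordinates reads

$$ R_{ijkl} := g\Bigl(\Omega\Bigl(\frac{\partial}{\partial x_i}, \frac{\partial}{\partial x_j}\Bigr) \frac{\partial}{\partial x_k}, \frac{\partial}{\partial x_l}\Bigr); $$

further taking a partial trace leads to the Ricci
curvature given by the 2-tensor
$\mathrm{Ric}_{ij} = \sum_{k,l} g^{kl} R_{kijl}$,
the trace of which gives in turn the scalar
curvature $R = \mathrm{tr}_g(\mathrm{Ric})$. Sectional curvature at a point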
$m$ in the direction of a two-dimensional plane
spanned by two vectors $U$ and $V$ corresponds to
$K(U, V) = g(\Omega(U, V)V, U)$. A manifold has constant
sectional curvature whenever $K(U, V)/\|U \wedge V\|^2$ is a
constant $K$ for all linearly independent vectors $U, V$.
A Riemannian manifold with constant sectional
curvature is said to be of spherical, flat, or hyperbolic
type depending on whether $K > 0$, $K = 0$, or $K < 0$,
respectively. One owes to Cartan the discovery of an
important class of Riemannian manifolds, symmetric
spaces, which contains the spheres, the Euclidean
spaces, the hyperbolic spaces, and compact Lie
groups. A connected Riemannian manifold $M$
equipped at every point $m$ with an isometry $\sigma_m$
such that $\sigma_m(m) = m$ and the tangent map $T_m\sigma_m$
equals $-\mathrm{Id}$ on the tangent space (it therefore reverses
the geodesics through $m$) is called symmetric. $\mathbb{C}P^n$
equipped with the Fubini-Study metric is a symmetric
space with the isometry given by the reflection with
respect to a line in $\mathbb{C}^{n+1}$. A compact symmetric space
has non-negative sectional curvature $K$.

Constraints on the curvature can have topological 
consequences. Spheres are the only simply connected 
manifolds with constant positive sectional curvature; 
if a simply connected complete Riemannian mani- 
fold of dimension >1 has non-positive sectional 
curvature along every plane, then it is homeo- 
morphic to the Euclidean space. 

A manifold with Ricci curvature tensor propor- 
tional to the metric tensor is called an Einstein 
manifold. Since Einstein, curvature is a cornerstone 
of general relativity with gravitational force being 
interpreted in terms of curvature. For example, the
vacuum Einstein equation reads $\mathrm{Ric}_g = (1/2) R_g\, g$ with
$\mathrm{Ric}_g$ the Ricci curvature of a metric $g$ and $R_g$ its scalar
curvature. In addition, Kaluza-Klein supergravity is a
unified theory modeled on a direct product of the
Minkowski four-dimensional space and an Einstein
manifold with positive scalar curvature. 
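
As a short worked step, taking the trace of the vacuum equation $\mathrm{Ric}_g = \tfrac12 R_g\, g$ with respect to $g$ shows what it implies in dimension $n \neq 2$. The symbol $n$ for the dimension of the manifold is introduced here only for the purpose of this computation.

```latex
\begin{align*}
\mathrm{tr}_g(\mathrm{Ric}_g) &= \tfrac12 R_g \,\mathrm{tr}_g(g)
  && \text{(trace of both sides)} \\
R_g &= \tfrac{n}{2}\, R_g
  && \text{(since } \mathrm{tr}_g(g) = n \text{)} \\
\bigl(1 - \tfrac{n}{2}\bigr) R_g &= 0
  \;\Longrightarrow\; R_g = 0 \ \text{ for } n \neq 2 \\
\mathrm{Ric}_g &= 0
  && \text{(substituting back)}
\end{align*}
```

So, outside dimension 2, solutions of the vacuum equation are Ricci-flat, a special case of Einstein manifolds with proportionality constant zero.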

The Ricci flow $dg(t)/dt = -2\, \mathrm{Ric}_{g(t)}$, which is
related to the Einstein equation in general
relativity, was only fairly recently introduced in the 
mathematical literature. Hopes are strong to get a 
classification of closed 3-manifolds using the Ricci 
flow as an essential ingredient. 
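
A standard explicit solution of the Ricci flow $dg(t)/dt = -2\,\mathrm{Ric}_{g(t)}$ is the shrinking round sphere. The sketch below assumes the round metric $g(t) = r(t)^2 g_1$ on $S^n$ ($n \geq 2$), where $g_1$ denotes the unit round metric and $r_0$ the initial radius; these symbols are introduced here for illustration.

```latex
\begin{align*}
\mathrm{Ric}_{g(t)} &= (n-1)\, g_1
  && \text{(Ricci tensor of the round sphere; invariant under constant rescaling)} \\
\frac{d}{dt}\bigl(r(t)^2\bigr)\, g_1 &= -2\,(n-1)\, g_1
  && \text{(inserting } g(t) = r(t)^2 g_1 \text{ into the flow)} \\
r(t)^2 &= r_0^2 - 2\,(n-1)\, t
  && \text{(integrating from } t = 0 \text{)}
\end{align*}
```

Under these assumptions the sphere shrinks homothetically and collapses to a point at the finite time $T = r_0^2 / (2(n-1))$.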


Cohomology 


Differentiation of functions $f \mapsto df$ on a
differentiable manifold $M$ generalizes to exterior
differentiation $\alpha \mapsto d\alpha$ of differential forms.
A form $\alpha$ is closed whenever it is in the kernel of $d$
and it is exact whenever it lies in the range of $d$.
Since $d^2 = 0$, exact forms are closed.

Cartan's structure equations
$d\omega = -(1/2)[\omega, \omega] + \Omega$
relate the exterior differential of the connection 1-form
$\omega$ on a principal bundle to its curvature $\Omega$ given by
the exterior covariant derivative $D\omega := d\omega \circ h$, where
$h : T_pP \to H_pP$ is the projection onto the horizontal
space.

On a complex manifold, forms split into sums
of $(p,q)$-forms, those with $p$ holomorphic and
$q$ antiholomorphic components, and exterior
differentiation splits as $d = \partial + \bar\partial$ into holomorphic and
antiholomorphic derivatives, with $\partial^2 = \bar\partial^2 = 0$.

Geometric data are often expressed in terms of 
closedness conditions on certain differential forms. 
For example, a “symplectic manifold” is a manifold
$M$ equipped with a closed nondegenerate differential
2-form called the “symplectic form.” The theory of
$J$-holomorphic curves on a manifold equipped with
an almost-complex structure $J$ has proved fruitful in
building invariants on symplectic manifolds. A
Kähler manifold is a complex manifold equipped
with a Hermitian metric $h$ whose imaginary part
$\mathrm{Im}\, h$ yields a closed $(1,1)$-form. The complex
projective space $\mathbb{C}P^n$ is Kähler.

The exterior differentiation $d$ gives rise to de Rham
cohomology as $\mathrm{Ker}\, d/\mathrm{Im}\, d$, and de Rham's theorem
establishes an isomorphism between de Rham
cohomology and the real singular cohomology of a
manifold. Chern (or characteristic) classes are
topological invariants associated to fiber bundles and play
a crucial role in index theory. Chern-Weil theory
builds representatives of these de Rham cohomology
classes of the form $\mathrm{tr}(f(\nabla^2))$ from a connection $\nabla$,
where $f$ is some analytic function.

When the manifold is Riemannian, the
Laplace-Beltrami operator on functions generalizes to
differential forms in two different ways, namely to the
Bochner Laplacian $\Delta^{\Lambda T^*M}$ on forms (i.e., sections of
$\Lambda T^*M$), where the cotangent bundle $T^*M$ is
equipped with a connection induced by the Levi-Civita
connection, and to the Laplace-Beltrami operator on
forms $(d + d^*)^2 = d^*d + d\,d^*$, where $d^*$ is the (formal)
adjoint of the exterior differential $d$. These are related
via Weitzenböck's formula, which in the particular case
of 1-forms states that the difference of those two
operators is measured by the Ricci curvature.

When the manifold is compact, Hodge's theorem
asserts that the de Rham cohomology groups are
isomorphic to the space of harmonic (i.e., annihilated
by the Laplace-Beltrami operator) differential
forms. Thus, the dimension of the space of harmonic
$k$-forms equals the $k$th Betti number, from which
one can define the Euler characteristic $\chi(M)$ of the
manifold $M$ by taking their alternating sum. Hodge
theory plays an important role in mirror symmetry,
which posits a duality between different manifolds
on the geometric side and between different field
theories via their correlation functions on the
physics side. Calabi-Yau manifolds, which are
Ricci-flat Kähler manifolds, are studied extensively
in the context of duality.


Index Theory 


While the Gaussian curvature is the solution to a 
local problem, it has strong influence on the global 
topology of a surface. The Gauss-Bonnet formula 
(1850) relates the Euler characteristic on a closed 
surface to the Gaussian curvature $K_g$ by

$$\chi(M) = \frac{1}{2\pi} \int_M K_g \, dA_g$$

where $dA_g$ is the volume element on M. This is the
first result relating curvature to global properties 
and can be seen as one of the starting points for 
index theory. It generalizes to the Chern-Gauss- 
Bonnet theorem (1944) on an even-dimensional 
closed manifold and can be interpreted as an 
example of the Atiyah-Singer index theorem (1963) 


$$\mathrm{ind}(D_g^W) = \int_M \hat{A}(\Omega_g)\, e^{-\mathrm{tr}(\Omega^W)}$$

where g denotes a Riemannian metric on a spin
manifold M, $D_g^W$ a Dirac operator acting on sections
of some twisted bundle $S \otimes W$ with S the spinor
bundle on M and W an auxiliary vector bundle over
M, $\mathrm{ind}(D_g^W)$ the “index” of the Dirac operator,
$\Omega_g$, $\Omega^W$ respectively the curvatures of the Levi-Civita
connection and a connection on W, and $\hat{A}(\Omega_g)$ a
particular Chern form called the $\hat{A}$-genus. Index
theorems are useful to compute anomalies in gauge
theories arising from functional quantisation of
classical actions.

Given an even-dimensional closed spin manifold
(M, g) and a Hermitian vector bundle W over M, the
index of the associated Dirac operator $D_g^W$ yields the
so-called Atiyah map $K^0(M) \to \mathbb{Z}$ defined by
$W \mapsto \mathrm{ind}(D_g^W)$, where $K^0(M)$ is the group of formal
differences of stable homotopy classes of smooth
vector bundles over M. This is the starting point for
the noncommutative geometry approach to index
theory, in which the space of smooth functions on a
manifold, which arises here in a disguised form since
$K^0(M) \simeq K_0(C^\infty(M))$ (which consists of formal
differences of smooth homotopy classes of idempotents
in the inductive limit of spaces of matrices
$gl_\infty(C^\infty(M))$), is generalized to any noncommutative
smooth algebra.


Introduction


The modern theory of electromagnetism is built on 
the foundations of Maxwell's equations: 


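$$\operatorname{div} \mathbf{E} = \frac{\rho}{\epsilon_0} \qquad [1]$$

$$\operatorname{div} \mathbf{B} = 0 \qquad [2]$$

$$\operatorname{curl} \mathbf{B} - \frac{1}{c^2}\frac{\partial \mathbf{E}}{\partial t} = \mu_0 \mathbf{J} \qquad [3]$$

$$\operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [4]$$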


On the left-hand side are the electric and magnetic
fields, E and B, which are vector-valued functions
of position and time. On the right are the sources,
the charge density $\rho$, which is a scalar function of
position and time, and the current density J. The
source terms encode the distribution and velocities
of charges, and the equations, together with
boundary conditions at infinity, determine the fields
that they generate. From these equations, one can
derive the familiar predictions of electrostatics and
magnetostatics, as well as the dynamical behavior
of fields and charges, in particular, the generation
and propagation of electromagnetic waves (light
waves).

Maxwell would not have recognized the equations 
in this compact vector notation — still less in the 
tensorial form that they take in special relativity. It 
is notable that although his contribution is univer- 
sally acknowledged in the naming of the equations, 
it is rare to see references to “Maxwell’s theory.” 
This is for a good reason. In his early studies of 
electromagnetism, Maxwell worked with elaborate 
mechanical models, which he saw as analogies 
rather than as literal descriptions of the underlying 
physical reality. In his later work, the mechanical 
models, in particular the mechanical properties of
the “luminiferous ether” through which light waves
propagate, were put forward more literally as 
the foundations of his electromagnetic theory. The 
equations survive in the modern theory, but the 
mechanical models with which Maxwell, Faraday, 
and others wrestled live on only in the survival of 
archaic terminology, such as “lines of force” and 
“magnetic flux.” The luminiferous ether evaporated 
with the advent of special relativity. 

Maxwell’s legacy is not his “theory,” but his
equations: a consistent system of partial differential
equations that describe the whole range of known
interactions of electric and magnetic fields with
moving charges. They unify the treatment of
electricity and magnetism by revealing for the first
time the full duality between the electric and
magnetic fields. They have been verified over an
almost unimaginable variety of physical processes,
from the propagation of light over cosmological
distances, through the behavior of the magnetic
fields of stars and the everyday applications in
electrical engineering and laboratory experiments,
down, in their quantum version, to the exchange
of photons between individual electrons.

The history of Maxwell's equations is convoluted, 
with many false turns. Maxwell himself wrote down 
an inconsistent form of the equations, with a 
different sign for $\rho$ in the first equation, in his
1865 work “A dynamical theory of the electromagnetic
field.” The consistent form appeared later in
his Treatise on Electricity and Magnetism (1873); 
see Chalmers (1975). 

In this article, we shall not follow the historical 
route to the equations. Some of the complex story of 
the development hinted at in the remarks above can 
be found in the articles by Chalmers (1975), Siegel 
(1985), and Roche (1998). Neither shall we follow 
the traditional pedagogic route of many textbooks in 
building up to the full dynamical equations through 
the study of basic electrical and magnetic phenom- 
ena. Instead, we shall follow a path to Maxwell's 
equations that is informed by knowledge of their 
most critical feature, invariance under Lorentz 
transformations. Maxwell, of course, knew nothing 
of this. 

We shall start with a summary of basic facts 
about the behavior of charges in electric and 
magnetic fields, and then establish the full dynami- 
cal framework by considering this behavior as seen 
from moving frames of reference. It is impossible, of 
course, to do this consistently within the framework 
of classical ideas of space and time since Maxwell's 
equations are inconsistent with Galilean relativity. 
But it is at least possible to understand some of the 
key features of the equations, in particular the need 
for the term involving the time derivative of E, the 
so-called “displacement current," in the third of 
Maxwell's equations. 

We shall begin with some remarks concerning the 
role of relativity in classical dynamics. 


Relativity in Newtonian Dynamics 


Newton's laws hold in all inertial frames. The 
formalism of classical mechanics is invariant under 
Galilean transformations and it is impossible to tell 
by observing the dynamical behavior of particles 
and other bodies whether a frame of reference is at 
rest or in uniform motion. In the world of classical
mechanics, therefore: 


Principle of Relativity There is no absolute stan- 
dard of rest; only relative motion is observable. 


In his “Dialogue concerning the two chief world
systems,” Galileo illustrated the principle by arguing
that the uniform motion of a ship on a calm sea does
not affect the behavior of fish, butterflies, and other
moving objects, as observed in a cabin below deck.

Relativity theory takes the principle as fundamental,
as a statement about the nature of space and
time as much as about the properties of the 
Newtonian equations of motion. But if it is to be 
given such universal significance, then it must apply 
to all of physics, and not just to Newtonian 
dynamics. At first this seems unproblematic — it is 
hard to imagine that it holds at such a basic level, 
but not for more complex physical interactions. 
Nonetheless, deep problems emerge when we try to 
extend it to electromagnetism since Galilean invari- 
ance conflicts with Maxwell's equations. 

All appears straightforward for systems involving 
slow-moving charges and slowly varying electric and 
magnetic fields. These are governed by laws that 
appear to be invariant under transformations 
between uniformly moving frames of reference. 
One can imagine a modern version of Galileo's 
ship also carrying some magnets, batteries,
semiconductors, and other electrical components.
Salviati's argument for relativity would seem just as
compelling.

The problem arises when we include rapidly 
varying fields — in particular, when we consider the 
propagation of light. As Einstein (1905) put it, 
“Maxwell’s electrodynamics..., when applied to 
moving bodies, leads to asymmetries which do not 
appear to be inherent in the phenomena." The 
central difficulty is that Maxwell's equations give 
light, along with other electromagnetic waves, a 
definite velocity: in empty space, it travels with the 
same speed in every direction, independently of the 
motion of the source — a fact that is incompatible 
with Galilean invariance. Light traveling with speed
c in one frame should have speed c + u in a frame
moving towards the source of the light with speed u.
Thus, it should be possible for light to travel with 
any speed. Light that travels with speed c in a frame 
in which its source is at rest should have some other 
speed in a moving frame; so Galilean invariance 
would imply dependence of the velocity of light on 
the motion of the source. 

A full resolution of the conflict can only be 
achieved within the special theory of relativity: here, 
remarkably, Maxwell's equations retain exactly 
their classical form, but the transformations between
the space and time coordinates of frames of
reference in relative motion do not. The difference
appears when the velocities involved are not
insignificant when compared with the velocity of light.
So long as one can ignore terms of order $u^2/c^2$,
Maxwell’s equations are compatible with the Galilean
principle of relativity.


Charges, Fields, and the Lorentz-Force Law


The basic objects in the modern form of
electromagnetic theory are


• charged particles; and

• the electric and magnetic fields E and B, which
are vector quantities that depend on position and
time.


The charge e of a particle, which can be positive 
or negative, is an intrinsic quantity analogous 
to gravitational mass. It determines the strength 
of the particle's interaction with the electric 
and magnetic fields — as its mass determines 
the strength of its interaction with gravitational 
fields. 

The interaction is in two directions. First, electric 
and magnetic fields exert a force on a charged 
particle which depends on the value of the charge, 
the particle's velocity, and the values of E and B at 
the location of the particle. The force is given by the 
Lorentz-force law


$$\mathbf{f} = e(\mathbf{E} + \mathbf{u} \wedge \mathbf{B}) \qquad [5]$$


in which e is the charge and u is the velocity. It is 
analogous to the gravitational force 


$$\mathbf{f} = m\mathbf{g} \qquad [6]$$


on a particle of mass m in a gravitational field g. It is 
through the force law that an observer can, in 
principle, measure the electric and magnetic fields at 
a point, by measuring the force on a standard charge 
moving with known velocity.
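
As an illustration of how the force law [5] could be used to measure the fields, the following minimal Python sketch evaluates the Lorentz force and recovers E and B from the forces on a test charge at rest and moving with unit velocity along two axes. The function names and the sample field values are assumptions made for the example; they do not come from the article.

```python
# Sketch of the Lorentz-force law f = e(E + u ^ B) and of how a test charge
# could be used to read off E and B. Function names are illustrative only.

def cross(a, b):
    """Vector product a ^ b of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def lorentz_force(e, E, u, B):
    """Force on a charge e moving with velocity u in fields E and B."""
    uxB = cross(u, B)
    return tuple(e * (E[i] + uxB[i]) for i in range(3))


def measure_fields(force, e):
    """Recover (E, B) from the force on a test charge e.

    `force(u)` is assumed to return the force when the charge has velocity u.
    """
    # A charge at rest feels only the electric field.
    E = tuple(f / e for f in force((0.0, 0.0, 0.0)))
    # Unit velocities along x and y give x ^ B = (0, -Bz, By) and
    # y ^ B = (Bz, 0, -Bx) once the electric part is subtracted.
    gx = tuple(f / e - E[i] for i, f in enumerate(force((1.0, 0.0, 0.0))))
    gy = tuple(f / e - E[i] for i, f in enumerate(force((0.0, 1.0, 0.0))))
    B = (-gy[2], gx[2], gy[0])
    return E, B


# Hypothetical field values, chosen only to exercise the functions.
E_true = (1.0, 2.0, 3.0)
B_true = (0.5, -0.25, 2.0)
E_meas, B_meas = measure_fields(
    lambda u: lorentz_force(2.0, E_true, u, B_true), e=2.0)
```

Three force measurements suffice here because the magnetic part of the force is linear in the velocity; a real measurement would of course have to cope with noise and with velocities that are small compared with that of light.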

Second, moving charges generate electric and 
magnetic fields. We shall not yet consider in detail 
the way in which they do this, beyond stating the 
following basic principles. 


EM1. The fields depend linearly on the charges.


This means that if we superimpose two distributions 
of charge, then the resultant E and B fields are the 
sums of the respective fields that the two distribu- 
tions generate separately. 


EM2. A stationary point charge e generates an electric
field, but no magnetic field. The electric field is
given by

$$\mathbf{E} = \frac{k e \mathbf{r}}{r^3} \qquad [7]$$

where $\mathbf{r}$ is the position vector from the charge,
$r = |\mathbf{r}|$, and k is a positive constant, analogous
to the gravitational constant.


By combining [7] and [5], we obtain an inverse-square
law electrostatic force


$$\frac{k e e'}{r^2} \qquad [8]$$


between two stationary charges; unlike gravity, it is 
repulsive when the charges have the same sign. 


EM3. A point charge moving with velocity v generates
a magnetic field

$$\mathbf{B} = \frac{k' e \mathbf{v} \wedge \mathbf{r}}{r^3} \qquad [9]$$

where k’ is a second positive constant.


This is extrapolated from measurements of the 
magnetic field generated by currents flowing in 
electrical circuits. 

The constants k and k’ in EM2 and EM3
determine the strengths of electric and magnetic
interactions. They are usually denoted by


$$k = \frac{1}{4\pi\epsilon_0}, \qquad k' = \frac{\mu_0}{4\pi} \qquad [10]$$


Charge e is measured in coulombs, |B| in teslas, and
|E| in volts per meter. With other quantities in SI units,


$$\epsilon_0 = 8.9 \times 10^{-12}, \qquad \mu_0 = 1.3 \times 10^{-6} \qquad [11]$$


The charge of an electron is $-1.6 \times 10^{-19}$ C; the
current through an electric fire is a flow
of 5-10 C s$^{-1}$. The earth's magnetic field is about
$4 \times 10^{-5}$ T; a bar magnet's is about 1 T; there is a
field of about 50 T on the second floor of the
Clarendon Laboratory in Oxford; and the magnetic
field on the surface of a neutron star is about $10^8$ T.

Although we are more aware of gravity in everyday
life, it is very much weaker than the electrostatic
force: the electrostatic repulsion between two
protons is a factor of $1.2 \times 10^{36}$ greater than their
gravitational attraction (at any separation, since both
forces obey the inverse-square law).
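
The quoted ratio can be checked with a few lines of Python. This is only a sanity check under stated assumptions: the rounded values of the gravitational constant, the proton charge and the proton mass below are standard reference values supplied for the example, not figures taken from the article.

```python
import math

# Rounded SI values, assumed for this check (not given in the article).
EPSILON_0 = 8.854e-12   # vacuum permittivity
G_NEWTON = 6.674e-11    # gravitational constant
E_PROTON = 1.602e-19    # proton charge in coulombs
M_PROTON = 1.673e-27    # proton mass in kilograms


def force_ratio(charge, mass):
    """Electrostatic repulsion divided by gravitational attraction.

    Both forces fall off as 1/r^2, so the separation cancels.
    """
    k = 1.0 / (4.0 * math.pi * EPSILON_0)
    return k * charge ** 2 / (G_NEWTON * mass ** 2)


ratio = force_ratio(E_PROTON, M_PROTON)
print(f"electrostatic / gravitational = {ratio:.2e}")
```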

Our aim is to pass from EM1-EM3 to Maxwell's
equations, by replacing [7] and [9] by partial
differential equations that relate the field strengths
to the charge and current densities $\rho$ and J of a
continuous distribution of charge. The densities are
defined as the limits


$$\rho = \lim_{V \to 0} \frac{\sum e}{V}, \qquad \mathbf{J} = \lim_{V \to 0} \frac{\sum e\mathbf{v}}{V} \qquad [12]$$


where V is a small volume containing the point, e is
a charge within the volume, and v is its velocity; the
sums are over the charges in V and the limits are
taken as the volume is shrunk (although we shall not
worry too much about the precise details of the
limiting process).


Stationary Distributions of Charge 


We begin the task of converting the basic principles 
into partial differential equations by looking at the 
electric field of a stationary distribution of charge, 
where the passage to the continuous limit is made by 
using the Gauss theorem to restate the inverse- 
square law. 

The Gauss theorem relates the integral of the 
electric field over a closed surface to the total charge 
contained within it. For a point charge, the electric 
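field is given by EM2:


$$\mathbf{E} = \frac{e\mathbf{r}}{4\pi\epsilon_0 r^3}$$


Since $\operatorname{div}\mathbf{r} = 3$ and $\operatorname{grad} r = \mathbf{r}/r$, we have


$$\operatorname{div}\mathbf{E} = \operatorname{div}\left(\frac{e\mathbf{r}}{4\pi\epsilon_0 r^3}\right) = \frac{e}{4\pi\epsilon_0}\left(\frac{3}{r^3} - \frac{3\,\mathbf{r}\cdot\mathbf{r}}{r^5}\right) = 0$$


everywhere except at r = 0. Therefore, by the
divergence theorem,


$$\int_{\partial V} \mathbf{E}\cdot d\mathbf{S} = 0 \qquad [13]$$


for any closed surface $\partial V$ bounding a volume V that
does not contain the charge.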

What if the volume does contain the charge? 
Consider the region bounded by the sphere $S_R$ of
radius R centered on the charge; $S_R$ has outward
unit normal $\mathbf{r}/r$. Therefore,


$$\int_{S_R} \mathbf{E}\cdot d\mathbf{S} = \frac{e}{4\pi R^2 \epsilon_0}\int_{S_R} dS = \frac{e}{\epsilon_0}$$

In particular, the value of the surface integral on the 
left-hand side does not depend on R. 

Now consider an arbitrary finite volume bounded by
a closed surface S. If the charge is not inside
the volume, then the integral of E over S vanishes
by [13]. If it is, then we can apply [13] to the
volume V between S and a small sphere $S_R$ to
deduce that


$$\int_S \mathbf{E}\cdot d\mathbf{S} - \int_{S_R} \mathbf{E}\cdot d\mathbf{S} = \int_{\partial V} \mathbf{E}\cdot d\mathbf{S} = 0$$


and that the integrals of E over S and $S_R$ are the
same. Therefore,


$$\int_S \mathbf{E}\cdot d\mathbf{S} = \begin{cases} e/\epsilon_0 & \text{if the charge is in the volume bounded by } S \\ 0 & \text{otherwise} \end{cases}$$


When we sum over a distribution of charges, 
the integral on the left picks out the total charge 
within S. Therefore, we have the Gauss theorem. 


The Gauss theorem. For any closed surface $\partial V$
bounding a volume V,


$$\int_{\partial V} \mathbf{E}\cdot d\mathbf{S} = Q/\epsilon_0$$


where E is the total electric field and Q is the total
charge within V.
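
The theorem can be checked numerically for a non-spherical surface. The sketch below integrates the field of a single point charge over the faces of a cube with the midpoint rule, in units chosen so that $e/\epsilon_0 = 1$. The cube, the charge positions and the grid size are arbitrary choices made for the illustration.

```python
import math


def flux_through_cube(charge_pos, half_side=1.0, n=120):
    """Flux of E = r / (4 pi r^3), a point charge with e/epsilon_0 = 1,
    through the surface of the cube [-half_side, half_side]^3.

    Each face is split into n x n squares and the integrand is sampled
    at the centre of each square (midpoint rule).
    """
    h = 2.0 * half_side / n
    total = 0.0
    for axis in range(3):              # faces normal to x, y, z
        for sign in (-1.0, 1.0):       # the two opposite faces
            for i in range(n):
                for j in range(n):
                    a = -half_side + (i + 0.5) * h
                    b = -half_side + (j + 0.5) * h
                    p = [a, b]
                    p.insert(axis, sign * half_side)   # point on the face
                    r = [p[k] - charge_pos[k] for k in range(3)]
                    r3 = (r[0] ** 2 + r[1] ** 2 + r[2] ** 2) ** 1.5
                    # E . n dS with outward normal n = sign * e_axis
                    total += sign * r[axis] / (4.0 * math.pi * r3) * h * h
    return total


flux_inside = flux_through_cube((0.3, -0.2, 0.1))    # charge inside the cube
flux_outside = flux_through_cube((2.0, 0.5, 0.0))    # charge outside the cube
print(flux_inside, flux_outside)
```

Up to discretization error, the first value should be close to 1 (that is, $e/\epsilon_0$ in these units) and the second close to 0, whatever the position of the charge inside or outside the cube.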


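Now we can pass to the continuous limit. Suppose
that E is generated by a distribution of charges with
density $\rho$ (charge per unit volume). Then by the
Gauss theorem,


$$\int_{\partial V} \mathbf{E}\cdot d\mathbf{S} = \frac{1}{\epsilon_0}\int_V \rho\, dV$$


for any volume V. But then, by the divergence
theorem,


$$\int_V \left(\operatorname{div}\mathbf{E} - \rho/\epsilon_0\right) dV = 0$$


Since this holds for any volume V, it follows that

$$\operatorname{div}\mathbf{E} = \rho/\epsilon_0 \qquad [14]$$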


By an argument in a similar spirit, we can also 
show that the electric field of a stationary distribu- 
tion of charge is conservative in the sense that the 
total work done by the field when a charge is moved 
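around a closed loop vanishes; that is,


$$\oint \mathbf{E}\cdot d\mathbf{s} = 0$$


for any closed path. This is equivalent to

$$\operatorname{curl}\mathbf{E} = 0 \qquad [15]$$


since, by Stokes' theorem,


$$\oint \mathbf{E}\cdot d\mathbf{s} = \int_S \operatorname{curl}\mathbf{E}\cdot d\mathbf{S}$$


where S is any surface spanning the path. This vanishes
for every path and for every S if and only if [15] holds.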

The field of a single stationary charge is conservative
since

$$\mathbf{E} = -\operatorname{grad}\phi, \qquad \phi = \frac{e}{4\pi\epsilon_0 r}$$

and therefore curl E = 0 since the curl of a gradient
vanishes identically. For a continuous distribution,
$\mathbf{E} = -\operatorname{grad}\phi$, where


$$\phi(\mathbf{r}) = \frac{1}{4\pi\epsilon_0}\int \frac{\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, dV' \qquad [16]$$


In the integral, $\mathbf{r}$ (the position of the point at which
$\phi$ is evaluated) is fixed, and the integration is over
the positions $\mathbf{r}'$ of the individual charges. In spite of
the singularity at $\mathbf{r} = \mathbf{r}'$, the integral is well defined.
So, [15] also holds for a continuous distribution of
stationary charge.


The Divergence of the Magnetic Field 


We can apply the same argument that established 
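the Gauss theorem to the magnetic field of a
slow-moving charge. Here,


$$\mathbf{B} = \frac{\mu_0 e\, \mathbf{v}\wedge\mathbf{r}}{4\pi r^3}$$


where $\mathbf{r}$ is the vector from the charge to the point at
which the field is measured. Since $\mathbf{r}/r^3 = -\operatorname{grad}(1/r)$,
we have


$$\operatorname{div}\left(\mathbf{v}\wedge\frac{\mathbf{r}}{r^3}\right) = \mathbf{v}\cdot\operatorname{curl}\left(\operatorname{grad}\frac{1}{r}\right) = 0$$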

Therefore, div B = 0 except at r = 0, as in the case of
the electric field. However, in the magnetic case, the
integral of the field over a surface surrounding the
charge also vanishes, since if $S_R$ is a sphere of radius
R centered on the charge, then


$$\int_{S_R} \mathbf{B}\cdot d\mathbf{S} = \frac{\mu_0 e}{4\pi}\int_{S_R} \frac{\mathbf{v}\wedge\mathbf{r}}{r^4}\cdot\mathbf{r}\, dS = 0$$


By the divergence theorem, the same is true for any
surface surrounding the charge. We deduce that if
magnetic fields are generated only by moving
charges, then

$$\int_{\partial V} \mathbf{B}\cdot d\mathbf{S} = 0$$


for any volume V, and hence that


$$\operatorname{div}\mathbf{B} = 0 \qquad [17]$$

Of course, if there were free “magnetic poles”
generating magnetic fields in the same way that
charges generate electric fields, then this would not
hold; there would be a “magnetic pole density” on
the right-hand side, by analogy with the charge
density in [14].


Inconsistency with Galilean Relativity 


Our central concern is the compatibility of the laws 
of electromagnetism with the principle of relativity. 
As Einstein observed, simple electromagnetic inter- 
actions do indeed depend only on relative motion; 
the current induced in a conductor moving through 
the field of a magnet is the same as that generated in 
a stationary conductor when a magnet is moved past 
it with the same relative velocity (Einstein 1905). 
Unfortunately, this symmetry is not reflected in our 
basic principles. We very quickly come up against 
contradictions if we assume that they hold in every 
inertial frame of reference. 

One emerges as follows. An observer O can measure 
the values of B and E at a point by measuring the force 
on a particle of standard charge, which is related to the 
velocity v of the charge by the Lorentz-force law,


$$\mathbf{f} = e(\mathbf{E} + \mathbf{v}\wedge\mathbf{B})$$


A second observer O' moving relative to the first with
velocity v will see the same force, but now acting on a
particle at rest. He will therefore measure the electric
field to be E' = f/e. We conclude that an observer
moving with velocity v through a magnetic field B and
an electric field E should see an electric field


$$\mathbf{E}' = \mathbf{E} + \mathbf{v}\wedge\mathbf{B} \qquad [18]$$


By interchanging the roles of the two observers, we
should also have


$$\mathbf{E} = \mathbf{E}' - \mathbf{v}\wedge\mathbf{B}' \qquad [19]$$


where B' is the magnetic field measured by the
second observer. If both are to hold, then B - B'
must be a scalar multiple of v.

But this is incompatible with EM3; if the fields are
those of a point charge at rest relative to the first
observer, then E is given by [7], and


$$\mathbf{B} = 0$$


On the other hand, the second observer sees the field
of a point charge moving with velocity -v. Therefore,

$$\mathbf{B}' = -\frac{\mu_0 e\, \mathbf{v}\wedge\mathbf{r}}{4\pi r^3}$$

So B - B' is orthogonal to v, not parallel to it.
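
The conflict can be made concrete with numbers. The following sketch evaluates the EM3 field seen by the second observer for one arbitrary configuration and checks that B - B' has no component along v, while v ∧ (B - B') does not vanish, so that [18] and [19] cannot both hold. The charge, velocity and position values are hypothetical, and the value of $\mu_0$ is the conventional rounded one.

```python
import math

MU_0 = 4.0e-7 * math.pi   # conventional SI value, assumed for the example


def cross(a, b):
    """Vector product a ^ b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def b_moving_charge(e, v, r):
    """EM3: magnetic field at displacement r from a charge e with velocity v."""
    dist = math.sqrt(dot(r, r))
    return tuple(MU_0 * e * c / (4.0 * math.pi * dist ** 3) for c in cross(v, r))


# Hypothetical configuration, chosen only for illustration.
e = 1.0e-9
v = (10.0, 0.0, 0.0)
r = (0.3, 0.4, 0.0)

B = (0.0, 0.0, 0.0)                                      # charge at rest for O
B_prime = b_moving_charge(e, tuple(-x for x in v), r)   # O' sees velocity -v

diff = tuple(B[i] - B_prime[i] for i in range(3))
along_v = dot(diff, v)        # vanishes: B - B' is orthogonal to v
wedge = cross(v, diff)        # nonzero: B - B' is not a multiple of v
```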

This conspicuous paradox is resolved, in part, by
the realization that EM3 is not exact; it holds only
when the velocities are small enough for the
magnetic force between two particles to be negligible
in comparison with the electrostatic force. If v
is a typical velocity, then the condition is that $v^2\mu_0$
should be much less than $1/\epsilon_0$. That is, the velocities
involved should be much less than


$$c = \frac{1}{\sqrt{\epsilon_0\mu_0}} = 3\times 10^8\ \mathrm{m\,s^{-1}}$$


This, of course, is the velocity of light.
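
As a quick numerical check of this value, and of how small the ratio of magnetic to electrostatic force is at everyday speeds, one can evaluate the constants directly. The values of $\epsilon_0$ and $\mu_0$ below are rounded conventional SI values supplied for the example, and the sample speeds are arbitrary.

```python
import math

EPSILON_0 = 8.854e-12        # rounded SI value (assumption for the example)
MU_0 = 4.0e-7 * math.pi      # conventional SI value, about 1.257e-6


def light_speed(eps0=EPSILON_0, mu0=MU_0):
    """c = 1 / sqrt(epsilon_0 mu_0)."""
    return 1.0 / math.sqrt(eps0 * mu0)


def magnetic_to_electric_ratio(v, eps0=EPSILON_0, mu0=MU_0):
    """Order of magnitude of magnetic force / electrostatic force
    for charges moving with typical speed v: epsilon_0 mu_0 v^2 = v^2/c^2."""
    return eps0 * mu0 * v ** 2


c = light_speed()
print(f"c = {c:.4e} m/s")
for speed in (1.0, 3.0e2, 3.0e5):   # arbitrary sample speeds in m/s
    print(speed, magnetic_to_electric_ratio(speed))
```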


The Limits of Galilean Invariance 


Our basic principles EM1-EM3 must now be seen to
be approximations: they describe the interactions of
particles and fields when the particles are moving
relative to each other at speeds much less than that of
light. To emphasize that we cannot expect, in
particular, EM3 to hold for particles moving at
speeds comparable with c, we must replace it by


EM3’. A charge moving with velocity v, where $v \ll c$,
generates a magnetic field


$$\mathbf{B} = \frac{\mu_0 e\, \mathbf{v}\wedge\mathbf{r}}{4\pi r^3} + O(v^2/c^2) \qquad [20]$$

The magnetic field of a system of charges in
general motion satisfies


$$\operatorname{div}\mathbf{B} = 0 \qquad [21]$$


In the second part, we have retained [21] as a 
differential form of the statement that there are no 
free magnetic poles; the magnetic field is generated 
only by the motion of the charges. With this change, 
the theory is consistent with the principle of 
relativity, provided that we ignore terms of order
$v^2/c^2$. The substitution of EM3' for EM3 resolves the
conspicuous paradox; the symmetry noted by Ein- 
stein between the current generated by the motion of 
the conductor in a magnetic field and by the motion 
of a magnet past a conductor is explained, provided 
that the velocities are much less than that of light. 

The central problem remains however; the equa- 
tions of electromagnetism are not invariant under 
a Galilean transformation with velocity comparable 
to c. The paradox is still there, but it is more subtle 
than it appeared to be at first. There are three 
possible ways out: (1) the noninvariance is real and 
has observable effects (necessarily of order $v^2/c^2$ or
smaller); (2) Maxwell's theory is wrong; or (3) the 
Galilean transformation is wrong. Disconcertingly, 
it is the last path that physics has taken. But that is 
to jump ahead in the story. Our task is to complete 
the derivation of Maxwell's equations. 


Faraday's Law of Induction 


The magnetic field of a slow-moving charge will 
always be small in relation to its electric field (even 
when we replace B by cB to put it into the same
units as E). The magnetic fields generated by 
currents in electrical circuits are not, however, 
dominated by large electric fields. This is because 
the currents are created by the flow, at slow 
velocity, of electrons, while overall the matter in 
the wire is roughly electrically neutral, with the 
electric fields of the positively charged nuclei and 
negatively charged electrons canceling. 

This is the physical context to keep in mind in 
the following deduction of Faraday's law of 
induction from Galilean invariance for velocities 
much less than c. The law relates the electromotive 
force or “voltage” around an electrical circuit 
to the rate of change of the magnetic field B over 
a surface spanning the circuit. In its differential 
form, the law becomes one of Maxwell’s 
equations. 

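Suppose first that the fields are generated by
charges all moving relative to a given inertial
frame of reference R with the same velocity v.
Then in a second frame R’ moving relative to R
with velocity v, there is a stationary distribution of
charge. If the velocity is much less than that of
light, then the electric field E’ measured in R’ is
related to the electric and magnetic fields E and B
measured in R by


$$\mathbf{E}' = \mathbf{E} + \mathbf{v}\wedge\mathbf{B}$$


Since the field measured in R’ is that of a stationary
distribution of charge, we have


$$\operatorname{curl}\mathbf{E}' = 0$$


In R, the charges are all moving with velocity v, so
their configuration looks exactly the same from the
point $\mathbf{r}$ at time t as it does from the point $\mathbf{r} + \mathbf{v}\tau$ at
time $t + \tau$. Therefore,

$$\mathbf{B}(\mathbf{r} + \mathbf{v}\tau, t + \tau) = \mathbf{B}(\mathbf{r}, t)$$

$$\mathbf{E}(\mathbf{r} + \mathbf{v}\tau, t + \tau) = \mathbf{E}(\mathbf{r}, t)$$


and hence by taking derivatives with respect to $\tau$
at $\tau = 0$,


$$\mathbf{v}\cdot\operatorname{grad}\mathbf{B} + \frac{\partial\mathbf{B}}{\partial t} = 0, \qquad \mathbf{v}\cdot\operatorname{grad}\mathbf{E} + \frac{\partial\mathbf{E}}{\partial t} = 0 \qquad [22]$$


So we must have


$$\begin{aligned}
0 &= \operatorname{curl}\mathbf{E}' \\
  &= \operatorname{curl}\mathbf{E} + \operatorname{curl}(\mathbf{v}\wedge\mathbf{B}) \\
  &= \operatorname{curl}\mathbf{E} + \mathbf{v}\operatorname{div}\mathbf{B} - \mathbf{v}\cdot\operatorname{grad}\mathbf{B} \\
  &= \operatorname{curl}\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} \qquad [23]
\end{aligned}$$

since div B = 0. It follows that

$$\operatorname{curl}\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0 \qquad [24]$$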


Equation [24] is linear in B and E; so by adding 
the magnetic and electric fields of different streams 
of charges moving relative to R with different 
velocities, we deduce that it holds generally for the 
electric and magnetic fields generated by moving 
charges. 

Equation [24] encodes Faraday's law of electromagnetic
induction, which describes how changing
magnetic fields can generate currents. In the static case


$$\frac{\partial\mathbf{B}}{\partial t} = 0$$

and the equation reduces to curl E = 0, the
condition that the electrostatic field should be
conservative; that is, it should do no net work
when a charge is moved around a closed loop.

More generally, consider a wire loop in the shape of
a closed curve $\gamma$. Let S be a fixed surface spanning $\gamma$.
Then we can deduce from eqn [24] that


$$\begin{aligned}
\oint_\gamma \mathbf{E}\cdot d\mathbf{s} &= \int_S \operatorname{curl}\mathbf{E}\cdot d\mathbf{S} \\
  &= -\int_S \frac{\partial\mathbf{B}}{\partial t}\cdot d\mathbf{S} \\
  &= -\frac{d}{dt}\int_S \mathbf{B}\cdot d\mathbf{S} \qquad [25]
\end{aligned}$$


If the magnetic field is varying, so that the integral of B
over S is not constant, then the integral of E around the
loop will not be zero. There will be a nonzero electric
field along the wire, which will exert a force on the
electrons in the wire and cause a current to flow.


The quantity

$$\oint_\gamma \mathbf{E}\cdot d\mathbf{s}$$

which is measured in volts, is the work done by the
electric field when a unit charge makes one circuit
of the wire. It is called the electromotive force
around the circuit. The integral of B over S is the
magnetic flux linking the circuit. The relationship [25]
between electromotive force and rate of change of
magnetic flux is Faraday’s law.
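
A small worked example may help fix the sign and size of the effect. The sketch below assumes a hypothetical flat circular loop of radius a in a spatially uniform field, normal to the loop, that varies as $B_0 \sin\omega t$; the flux is then $\pi a^2 B_0 \sin\omega t$, and [25] gives the electromotive force as minus its time derivative. The loop size, field amplitude and frequency are invented values.

```python
import math


def flux(t, radius, b0, omega):
    """Magnetic flux through a flat circular loop in a uniform field
    B0 sin(omega t) normal to the loop (an assumed configuration)."""
    return math.pi * radius ** 2 * b0 * math.sin(omega * t)


def emf(t, radius, b0, omega, dt=1e-6):
    """Electromotive force from [25]: minus the rate of change of flux,
    estimated with a central difference."""
    return -(flux(t + dt, radius, b0, omega)
             - flux(t - dt, radius, b0, omega)) / (2.0 * dt)


# Hypothetical values: 10 cm loop, 0.5 T amplitude, 50 Hz.
a, b0, omega = 0.1, 0.5, 2.0 * math.pi * 50.0
emf_numeric = emf(0.0, a, b0, omega)
emf_exact = -math.pi * a ** 2 * b0 * omega * math.cos(0.0)
print(emf_numeric, emf_exact)   # both about -4.93 volts
```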


The Field of Charges in Uniform Motion 


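We can extract another of Maxwell's equations
from this argument. By EM3', a single charge e with
velocity v generates an electric field E and a
magnetic field


$$\mathbf{B} = \frac{\mu_0 e\, \mathbf{v}\wedge\mathbf{r}}{4\pi r^3} + O(v^2/c^2)$$

where $\mathbf{r}$ is the vector from the charge to the point at
which the field is measured. In the frame of reference
R' in which the charge is at rest, its electric field is


$$\mathbf{E}' = \frac{e\mathbf{r}}{4\pi\epsilon_0 r^3}$$


In the frame in which it is moving with velocity
v, E = E' + O(v/c). Therefore,

$$c\mathbf{B} = c\mu_0\epsilon_0\, \mathbf{v}\wedge\mathbf{E} + O(v^2/c^2) = \frac{\mathbf{v}\wedge\mathbf{E}}{c} + O(v^2/c^2)$$

By taking the curl of both sides, and dropping terms
of order $v^2/c^2$,


$$\operatorname{curl}(c\mathbf{B}) = \operatorname{curl}\left(\frac{\mathbf{v}\wedge\mathbf{E}}{c}\right) = \frac{1}{c}\left(\mathbf{v}\operatorname{div}\mathbf{E} - \mathbf{v}\cdot\operatorname{grad}\mathbf{E}\right)$$


But

$$\operatorname{div}\mathbf{E} = \rho/\epsilon_0, \qquad \mathbf{v}\cdot\operatorname{grad}\mathbf{E} = -\frac{\partial\mathbf{E}}{\partial t}$$

by [22]. Therefore,

$$\operatorname{curl}(c\mathbf{B}) - \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t} = \frac{1}{c\epsilon_0}\rho\mathbf{v} = c\mu_0\mathbf{J}$$


where $\mathbf{J} = \rho\mathbf{v}$. By summing over the separate particle
velocities, we conclude that

$$\operatorname{curl}\mathbf{B} - \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t} = \mu_0\mathbf{J}$$

holds for an arbitrary distribution of charges, provided
that their velocities are much less than that of light.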
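
The key step in this argument is that, to leading order, the magnetic field of a slowly moving charge is $\mathbf{v}\wedge\mathbf{E}/c^2$, with E its Coulomb field. The short check below compares the two expressions numerically for one arbitrary configuration. The constants are rounded conventional values and all sample numbers are assumptions made for the illustration.

```python
import math

EPSILON_0 = 8.854e-12        # rounded SI value (assumption)
MU_0 = 4.0e-7 * math.pi      # conventional SI value (assumption)
C_SQUARED = 1.0 / (EPSILON_0 * MU_0)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def coulomb_field(e, r):
    """E = e r / (4 pi epsilon_0 r^3) for a point charge."""
    scale = e / (4.0 * math.pi * EPSILON_0 * norm(r) ** 3)
    return tuple(scale * x for x in r)


def b_from_em3(e, v, r):
    """Leading term of EM3': B = mu_0 e v ^ r / (4 pi r^3)."""
    scale = MU_0 * e / (4.0 * math.pi * norm(r) ** 3)
    return tuple(scale * x for x in cross(v, r))


def b_from_e(v, E):
    """The same field written as v ^ E / c^2."""
    return tuple(x / C_SQUARED for x in cross(v, E))


# Arbitrary sample values.
e, v, r = 2.0e-9, (30.0, -10.0, 5.0), (0.2, 0.1, -0.4)
b1 = b_from_em3(e, v, r)
b2 = b_from_e(v, coulomb_field(e, r))
rel_err = norm(tuple(b1[i] - b2[i] for i in range(3))) / norm(b1)
```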


Maxwell’s Equations 


The basic principles, together with the assumption of 
Galilean invariance for velocities much less than that 
of light, have allowed us to deduce that the electric and 
magnetic fields generated by a continuous distribution 
of moving charges in otherwise empty space satisfy 


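$$\operatorname{div}\mathbf{E} = \frac{\rho}{\epsilon_0} \qquad [26]$$

$$\operatorname{div}\mathbf{B} = 0 \qquad [27]$$

$$\operatorname{curl}\mathbf{B} - \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t} = \mu_0\mathbf{J} \qquad [28]$$

$$\operatorname{curl}\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0 \qquad [29]$$


where $\rho$ is the charge density, J is the current
density, and $c^2 = 1/\epsilon_0\mu_0$. These are Maxwell’s
equations, the basis of modern electrodynamics.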
Together with the Lorentz-force law, they describe 
the dynamics of charges and electromagnetic fields. 

We have arrived at them by considering how basic 
electromagnetic processes appear in moving frames 
of reference — an unsatisfactory route because we 
have seen on the way that the principles on which 
we based the derivation are incompatible with 
Galilean invariance for velocities comparable with 
that of light. Maxwell derived them by analyzing an 
elaborate mechanical model of electric and magnetic 
fields — as displacements in the luminiferous ether. 
That is also unsatisfactory because the model has 
long been abandoned. The reason that they are 
accepted today as the basis of theoretical and 
practical applications of electromagnetism has little 
to do with either argument. It is first that they are 
self-consistent, and second that they describe the 
behavior of real fields with unreasonable accuracy. 


The Continuity Equation 


It is not immediately obvious that the equations are 
self-consistent. Given $\rho$ and J as functions of the
coordinates and time, Maxwell's equations are two 
scalar and two vector equations in the unknown 
components of E and B. That is, a total of eight 
equations for six unknowns — more equations than 
unknowns. Therefore, it is possible that they are in 
fact inconsistent. 

If we take the divergence of eqn [29], then we
obtain


$$\frac{\partial}{\partial t}(\operatorname{div}\mathbf{B}) = 0$$

which is consistent with eqn [27]; so no problem
arises here. However, by taking the divergence of
eqn [28] and substituting from eqn [26], we get


$$\begin{aligned}
0 &= \operatorname{div}\operatorname{curl}\mathbf{B} \\
  &= \frac{1}{c^2}\frac{\partial}{\partial t}(\operatorname{div}\mathbf{E}) + \mu_0\operatorname{div}\mathbf{J} \\
  &= \mu_0\left(\frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf{J}\right)
\end{aligned}$$

This gives a contradiction unless

$$\frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf{J} = 0 \qquad [30]$$

So the choice of $\rho$ and J is not unconstrained; they
must be related by the continuity equation [30]. This
holds for physically reasonable distributions of
charge; it is a differential form of the statement
that charges are neither created nor destroyed.
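
As a concrete instance, a rigid blob of charge carried along with constant velocity v, with $\mathbf{J} = \rho\mathbf{v}$, satisfies the continuity equation. The sketch below checks this by finite differences for a Gaussian blob; the density profile, velocity and sample point are arbitrary choices made for the illustration.

```python
import math


def rho(r, t, v):
    """Charge density of a Gaussian blob moving rigidly with velocity v."""
    d = [r[i] - v[i] * t for i in range(3)]
    return math.exp(-(d[0] ** 2 + d[1] ** 2 + d[2] ** 2))


def continuity_residual(r, t, v, h=1e-4):
    """Central-difference estimate of d(rho)/dt + div J with J = rho v."""
    drho_dt = (rho(r, t + h, v) - rho(r, t - h, v)) / (2.0 * h)
    div_j = 0.0
    for i in range(3):
        r_plus, r_minus = list(r), list(r)
        r_plus[i] += h
        r_minus[i] -= h
        # i-th component of J is rho * v[i]
        div_j += v[i] * (rho(r_plus, t, v) - rho(r_minus, t, v)) / (2.0 * h)
    return drho_dt + div_j


v = (0.5, -0.2, 0.1)            # arbitrary velocity
point, time = (0.3, 0.4, -0.2), 0.7
residual = continuity_residual(point, time, v)
# With the current switched off (J = 0) the equation fails for the same rho.
residual_no_current = continuity_residual(point, time, v) - sum(
    v[i] * (rho([point[j] + (1e-4 if j == i else 0.0) for j in range(3)], time, v)
            - rho([point[j] - (1e-4 if j == i else 0.0) for j in range(3)], time, v))
    / 2e-4 for i in range(3))
```

The first residual vanishes up to discretization error, while the second, which keeps the same moving density but sets J = 0, does not.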


Conservation of Charge 


To see the connection between the continuity
equation and charge conservation, let us look at
the total charge within a fixed volume V bounded by a
surface S. If charge is conserved, then any increase
or decrease in a short period of time must be
exactly balanced by an inflow or outflow of charge
across S.

Consider a small element dS of S with outward
unit normal n and consider all the particles that have a
particular charge e and a particular velocity v at
time t. Suppose that there are $\sigma$ of these per unit
volume ($\sigma$ is a function of position). Those that cross
the surface element between t and $t + \delta t$ are those
that at time t lie in the region of volume


$$|\mathbf{v}\cdot\mathbf{n}\, dS\, \delta t|$$


shown in Figure 1. They contribute $e\sigma\,\mathbf{v}\cdot d\mathbf{S}\,\delta t$ to the
outflow of charge through the surface element. But
the value of J at the surface element is the sum of
$e\sigma\mathbf{v}$ over all possible values of v and e. By summing
over v, e, and the elements of the surface, therefore,
and by passing to the limit of a continuous
distribution, the total rate of outflow is


$$\int_S \mathbf{J}\cdot d\mathbf{S}$$


Charge conservation implies that the rate of
outflow should be equal to the rate of decrease in
the total charge within V. That is,


$$\frac{d}{dt}\int_V \rho\, dV + \int_S \mathbf{J}\cdot d\mathbf{S} = 0 \qquad [31]$$


By differentiating the first term under the integral
sign and by applying the divergence theorem to the
second integral,


$$\int_V \left(\frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf{J}\right) dV = 0 \qquad [32]$$


If this is to hold for any choice of V, then $\rho$ and J
must satisfy the continuity equation. Conversely, the
continuity equation implies charge conservation.


Figure 1 The outflow through a surface element.


The Displacement Current


The third of Maxwell’s equations can be written as 


curl B — Ho (J + 60 =) [33] 


in which form it can be read as an equation 
for an unknown magnetic field B in terms of 
a known current distribution / and electric 
field E. When E and J are independent of t, it 
reduces to 


. curl B = pio J 


which determines the magnetic field of a steady 
current, in a way that was already familiar 
to Maxwell's contemporaries. But his second 
term on the right-hand side of [33] was new; it 
adds to J the so-called vacuum displacement 
current 


\[ \epsilon_0 \frac{\partial E}{\partial t} \]


The name comes from an analogy with the 
behavior of charges in an insulating material. 
Here no steady current can flow, but the distribu- 
tion of charges within the material is distorted 
by an external electric field. When the field 
changes, the distortion also changes, and the result 
appears as a current — the displacement current — 
which flows during the period of change. Max- 
well's central insight was that the same term 
should be present even in empty space. The 
consequence was profound; it allowed him to 
explain the propagation of light as an electromag- 
netic phenomenon. 


The Source-Free Equations 


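In a region of empty space, away from the
charges generating the electric and magnetic fields,
we have ρ = 0 = J, and Maxwell’s equations
reduce to

\[ \operatorname{div} E = 0 \qquad [34] \]
\[ \operatorname{div} B = 0 \qquad [35] \]
\[ \operatorname{curl} B - \frac{1}{c^2} \frac{\partial E}{\partial t} = 0 \qquad [36] \]
\[ \operatorname{curl} E + \frac{\partial B}{\partial t} = 0 \qquad [37] \]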


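where $c = 1/\sqrt{\epsilon_0 \mu_0}$. By taking the curl of eqn [36]
and by substituting from eqns [35] and [37], we
obtain

\[ \begin{aligned}
0 &= \operatorname{grad}(\operatorname{div} B) - \nabla^2 B - \frac{1}{c^2} \operatorname{curl}\left( \frac{\partial E}{\partial t} \right) \\
  &= -\nabla^2 B - \frac{1}{c^2} \frac{\partial}{\partial t} (\operatorname{curl} E) \\
  &= -\nabla^2 B + \frac{1}{c^2} \frac{\partial^2 B}{\partial t^2}
\end{aligned} \]

Therefore, the three components of B in empty space
satisfy the (scalar) wave equation

\[ \Box B = 0 \qquad [38] \]

Here □ is the d'Alembertian operator, defined by

\[ \Box = \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial z^2} \]

By taking the curl of eqn [37], we also obtain
□E = 0.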


Monochromatic Plane Waves 


The fact that E and B are vector-valued solutions of
the wave equation in empty space suggests that we
look for “plane wave” solutions of Maxwell's
equations in which


\[ E = \alpha \cos\Omega + \beta \sin\Omega \qquad [39] \]


where α, β are constant vectors and

\[ \Omega = \frac{\omega}{c} (ct - r \cdot e), \qquad e \cdot e = 1 \qquad [40] \]


with ω > 0, α, β, and e constant; ω is the frequency
and e is a unit vector that gives the direction of
propagation (adding τ to t and cτe to r leaves Ω
unchanged). This satisfies the wave equation, but for
a general choice of the constants, it will not be
possible to find B such that eqns [34]-[37] also hold.
By taking the divergence of eqn [39], we obtain


\[ \operatorname{div} E = \frac{\omega}{c} (e \cdot \alpha \sin\Omega - e \cdot \beta \cos\Omega) \qquad [41] \]

For eqn [34] to hold, therefore, we must choose α
and β orthogonal to e. For eqn [37] to hold, we
must find B such that

\[ \operatorname{curl} E = \frac{\omega}{c} (e \wedge \alpha \sin\Omega - e \wedge \beta \cos\Omega) = -\frac{\partial B}{\partial t} \qquad [42] \]


A possible choice is 


\[ B = \frac{e \wedge E}{c} = \frac{1}{c} (e \wedge \alpha \cos\Omega + e \wedge \beta \sin\Omega) \qquad [43] \]


and it is not hard to see that E and B then satisfy 
[35] and [36] as well. 


The solutions obtained in this way are called
“monochromatic electromagnetic plane waves.”

Note that such waves are transverse in the sense
that E and B are orthogonal to the direction of
propagation. The definition of E can be written more
concisely in the form


\[ E = \operatorname{Re}\left[ (\alpha + i\beta) e^{-i\Omega} \right] \qquad [44] \]


It is an exercise in Fourier analysis to show that every
solution in empty space is a combination of
monochromatic plane waves. A plane wave has
“plane” or “linear” polarization if α and β are
proportional. It has “circular” polarization if
α · α = β · β, α · β = 0.

At the heart of Maxwell’s theory was the idea that
a light wave with definite frequency or color is
represented by a monochromatic plane solution of
his equations.
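The claim that a plane wave with α and β orthogonal to e, together with B = (e ∧ E)/c, satisfies all four source-free equations can be spot-checked numerically. The following is only a sketch under stated assumptions: units with c = 1, an arbitrary frequency, one particular choice of e, α, β, and central finite differences in place of exact derivatives. None of the function names come from the text.

```python
import math

C = 1.0                    # speed of light; units with c = 1 are an assumption
OMEGA = 2.0                # frequency omega (arbitrary illustrative value)
E_DIR = (0.0, 0.0, 1.0)    # unit propagation vector e
ALPHA = (1.0, 0.0, 0.0)    # alpha, chosen orthogonal to e
BETA = (0.0, 0.5, 0.0)     # beta, chosen orthogonal to e

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def phase(t, r):
    # Omega = (omega / c) * (c t - r . e)
    return OMEGA / C * (C * t - dot(r, E_DIR))

def E(t, r):
    om = phase(t, r)
    return tuple(a * math.cos(om) + b * math.sin(om) for a, b in zip(ALPHA, BETA))

def B(t, r):
    return tuple(x / C for x in cross(E_DIR, E(t, r)))

def d_dt(F, t, r, h=1e-5):
    # Central difference in time, component by component.
    return tuple((p - m) / (2 * h) for p, m in zip(F(t + h, r), F(t - h, r)))

def d_dx(F, t, r, axis, h=1e-5):
    # Central difference along one spatial axis.
    rp, rm = list(r), list(r)
    rp[axis] += h
    rm[axis] -= h
    return tuple((p - m) / (2 * h)
                 for p, m in zip(F(t, tuple(rp)), F(t, tuple(rm))))

def div(F, t, r):
    return sum(d_dx(F, t, r, k)[k] for k in range(3))

def curl(F, t, r):
    dx, dy, dz = (d_dx(F, t, r, k) for k in range(3))
    return (dy[2] - dz[1], dz[0] - dx[2], dx[1] - dy[0])

def max_residual(t, r):
    """Largest absolute residual of the four source-free equations at (t, r)."""
    dE, dB = d_dt(E, t, r), d_dt(B, t, r)
    terms = [div(E, t, r), div(B, t, r)]
    terms += [cb - de / C ** 2 for cb, de in zip(curl(B, t, r), dE)]
    terms += [ce + db for ce, db in zip(curl(E, t, r), dB)]
    return max(abs(x) for x in terms)
```

Calling `max_residual` at any sample point should return a value at the level of the finite-difference error; replacing `ALPHA` by a vector with a component along `E_DIR` makes the divergence residual visibly nonzero.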


Potentials 


For every solution of Maxwell’s equations in vacuo, 
the components of E and B satisfy the three- 
dimensional wave equation; but the converse is not 
true. That is, it is not true in general that if 


\[ \Box B = 0, \qquad \Box E = 0 \]


then E and B satisfy Maxwell's equations. For this 
to happen, the divergence of both fields must vanish, 
and they must be related by [36] and [37]. These 
additional constraints are somewhat simpler to 
handle if we work not with the fields themselves, 
but with auxiliary quantities called “potentials.” 

The definition of the potentials depends on
standard integrability conditions from vector calculus.
Suppose that v is a vector field, which may
depend on time. If curl v = 0, then there exists a
function φ such that


\[ v = \operatorname{grad} \phi \qquad [45] \]


If div v = 0, then there exists a second vector field a
such that


\[ v = \operatorname{curl} a \qquad [46] \]


Neither φ nor a is uniquely determined by v. In the
first case, if [45] holds, then it also holds when φ is
replaced by φ' = φ + f, where f is a function of time
alone; in the second, if [46] holds, then it also holds
when a is replaced by


\[ a' = a + \operatorname{grad} u \]


for any scalar function u of position and time. It
should be kept in mind that the existence statements
are local. If v is defined on a region U with
nontrivial topology, then it may not be possible to
find a suitable φ or a throughout the whole of U.
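Suppose now that we are given fields E and B
satisfying Maxwell's equations [26]-[29] with
sources represented by the charge density ρ and the
current density J. Since div B = 0, there exists a
time-dependent vector field A(t, x, y, z) such that


\[ B = \operatorname{curl} A \]


If we substitute B = curl A into [29] and interchange
curl with the time derivative, then we obtain


\[ \operatorname{curl}\left( E + \frac{\partial A}{\partial t} \right) = 0 \]


It follows that there exists a scalar φ(t, x, y, z) such
that


\[ E = -\operatorname{grad} \phi - \frac{\partial A}{\partial t} \qquad [47] \]

Such a vector field A is called a “magnetic vector
potential”; a function φ such that eqn [47] holds is
called an “electric scalar potential.”

Conversely, given scalar and vector functions φ
and A of t, x, y, z, we can define B and E by

\[ B = \operatorname{curl} A, \qquad E = -\operatorname{grad} \phi - \frac{\partial A}{\partial t} \qquad [48] \]

Then two of Maxwell's equations hold automatically,
since


\[ \operatorname{div} B = 0, \qquad \operatorname{curl} E + \frac{\partial B}{\partial t} = 0 \]


The remaining pair translate into conditions on A
and φ. Equation [26] becomes


\[ \operatorname{div} E = -\nabla^2 \phi - \frac{\partial}{\partial t} (\operatorname{div} A) = \frac{\rho}{\epsilon_0} \]


and eqn [28] becomes


\[ \operatorname{curl} B - \frac{1}{c^2} \frac{\partial E}{\partial t} = -\nabla^2 A + \operatorname{grad} \operatorname{div} A + \frac{1}{c^2} \frac{\partial}{\partial t} \left( \operatorname{grad} \phi + \frac{\partial A}{\partial t} \right) = \mu_0 J \]

If we put

\[ \delta = \frac{1}{c^2} \frac{\partial \phi}{\partial t} + \operatorname{div} A \]


then we can rewrite the equations for A and φ more
simply as


\[ \Box \phi - \frac{\partial \delta}{\partial t} = \frac{\rho}{\epsilon_0}, \qquad \Box A + \operatorname{grad} \delta = \mu_0 J \]

Here we have four equations (one scalar, one vector)
in four unknowns (φ and the components of A). Any
set of solutions φ, A determines a solution of
Maxwell's equations via [48].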


Gauge Transformations 


Given solutions E and B of Maxwell's equations,
what freedom is there in the choice of A and φ?
First, A is determined by curl A = B up to the
replacement of A by


\[ A' = A + \operatorname{grad} u \]


for some function u of position and time. The scalar
potential φ' corresponding to A' must be chosen so
that


\[ -\operatorname{grad} \phi' = E + \frac{\partial A'}{\partial t} = E + \frac{\partial A}{\partial t} + \operatorname{grad} \frac{\partial u}{\partial t} = -\operatorname{grad}\left( \phi - \frac{\partial u}{\partial t} \right) \]


That is, φ' = φ − ∂u/∂t + f(t), where f is a function
of t alone. We can absorb f into u by subtracting


\[ \int f \, dt \]


(this does not alter A'). So the freedom in the choice
of A and φ is to make the transformation

\[ A \mapsto A' = A + \operatorname{grad} u, \qquad \phi \mapsto \phi' = \phi - \frac{\partial u}{\partial t} \qquad [49] \]

for any u = u(t, x, y, z). The transformation [49] is
called a “gauge transformation.”

Under [49],


\[ \delta \mapsto \delta' = \frac{1}{c^2} \frac{\partial \phi'}{\partial t} + \operatorname{div}(A') = \delta - \Box u \]

It is possible to show, under certain very mild
conditions on δ, that the inhomogeneous wave
equation


\[ \Box u = \delta \qquad [50] \]


has a solution u = u(t, x, y, z). If we choose u so that
[50] holds, then the transformed potentials A' and φ'
satisfy


\[ \operatorname{div} A' + \frac{1}{c^2} \frac{\partial \phi'}{\partial t} = 0 \]


This is the “Lorenz gauge condition,” named after
L Lorenz (not the H A Lorentz of the “Lorentz
contraction”).
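The effect of the gauge transformation [49] can be checked term by term. The lines below are a short worked verification added for illustration; they use only the definitions in [48] and [49], the identity curl grad = 0, and the quantity δ = (1/c²) ∂φ/∂t + div A, with □ = (1/c²) ∂²/∂t² − ∇².

```latex
\begin{aligned}
B' &= \operatorname{curl}(A + \operatorname{grad} u) = \operatorname{curl} A = B, \\
E' &= -\operatorname{grad}\Bigl(\phi - \frac{\partial u}{\partial t}\Bigr)
      - \frac{\partial}{\partial t}(A + \operatorname{grad} u)
    = -\operatorname{grad}\phi - \frac{\partial A}{\partial t} = E, \\
\delta' &= \frac{1}{c^2}\frac{\partial}{\partial t}\Bigl(\phi - \frac{\partial u}{\partial t}\Bigr)
      + \operatorname{div}(A + \operatorname{grad} u)
    = \delta - \Bigl(\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} - \nabla^2 u\Bigr)
    = \delta - \Box u .
\end{aligned}
```

The fields are therefore unchanged, while δ can be shifted by □u for any u, which is why solving □u = δ brings the potentials into the Lorenz gauge.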


If we impose the Lorenz condition, then the only
remaining freedom in the choice of A and φ is to
make gauge transformations [49] in which u is a
solution of the wave equation □u = 0. Under the
Lorenz condition, Maxwell's equations take the
form


\[ \Box \phi = \rho / \epsilon_0, \qquad \Box A = \mu_0 J \qquad [51] \]


Consistency with the Lorenz condition follows from
the continuity equation on ρ and J.

In the absence of sources, therefore, Maxwell's
equations for the potential in the Lorenz gauge
reduce to


\[ \Box \phi = 0, \qquad \Box A = 0 \]


together with the constraint

\[ \operatorname{div} A + \frac{1}{c^2} \frac{\partial \phi}{\partial t} = 0 \]

We can, for example, choose three arbitrary solutions
of the scalar wave equation for the components
of the vector potential, and then define φ by


\[ \phi = -c^2 \int \operatorname{div} A \, dt \]


Whatever choice we make, we shall get a solution of
Maxwell’s equations, and every solution of Maxwell’s
equations (without sources) will arise from
some such choice.


Historical Note 


At the end of the eighteenth century, four types of 
electromagnetic phenomena were known, but not 
the connections between them. 


• Magnetism; the word derives from the Greek for
“stone from Magnesia.”

• Static electricity, produced by rubbing amber with
fur; the word “electricity” derives from the Greek
for “amber.”

• Light.

• Galvanism or “animal electricity”: the electricity
produced by batteries, discovered by Luigi
Galvani.


The construction of a unified theory was a slow 
and painful business. It was hindered by attempts, 
which seem bizarre in retrospect, to understand 
electromagnetism in terms of underlying mechanical 
models involving such inventions as “electric fluids” 
and “magnetic vortices.” We can see the legacy of 
this period, which ended with Einstein’s work in 
1905, in the misleading and archaic terms that still 
survive in modern terminology: “magnetic flux,” 
“lines of force,” “electric displacement,” and so on. 
Maxwell's contribution was decisive, although
much of what we now call “Maxwell's theory” is
due to his successors (Lorentz, Hertz, Einstein, and
so on); and, as we shall see, a key element in
Maxwell’s own description of electromagnetism,
the “electromagnetic ether,” an all-pervasive
medium which was supposed to transmit
electromagnetic waves, was thrown out by Einstein.

A rough chronology is as follows. 


• 1800 Volta demonstrated the connection between
galvanism and static electricity.

• 1820 Oersted showed that the current from a
battery generates a force on a magnet.

• 1822 Ampère suggested that light was a wave
motion in a “luminiferous ether” made up of two
types of electric fluid. In the same year, Galileo's
“Dialogue concerning the two chief world systems”
was removed from the index of prohibited
books.

• 1831 Faraday showed that moving magnets can
induce currents.

• 1846 Faraday suggested that light is a vibration
in magnetic lines of force.

• 1863 Maxwell published the equations that
describe the dynamics of electric and magnetic
fields.

• 1905 Einstein's paper “On the electrodynamics
of moving bodies.”


G Gallavotti, Università di Roma "La Sapienza,"


Foundations: Atoms and Molecules


Classical statistical mechanics studies properties of
macroscopic aggregates of particles, atoms, and
molecules, based on the assumption that they are
point masses subject to the laws of classical
mechanics. The distinction between macroscopic and
microscopic systems is evanescent and in fact the
foundations of statistical mechanics have been laid
on properties, proved or assumed, of few-particle
systems.

Macroscopic systems are often considered in
stationary states, which means that their microscopic
configurations follow each other as time
evolves while looking the same macroscopically.
Observing time evolution is the same as sampling
(“not too closely” time-wise) independent copies of
the system prepared in the same way.

A basic distinction is necessary: a stationary state
may or may not be in equilibrium. The first case
arises when the particles are enclosed in a container
Ω and are subject only to their mutual conservative
interactions and, possibly, to external conservative
forces: a typical example is a gas in a container
subject to forces due to the walls of Ω and gravity,
besides the internal interactions. This is a very
restricted class of systems and states.

A more general case is when the system is in a 
stationary state but it is also subject to nonconservative 
forces: a typical example is a gas or fluid in which a 
wheel rotates, as in the Joule experiment, with some 
device acting to keep the temperature constant. The 
device is called a thermostat and in statistical 
mechanics it has to be modeled by forces, including 
nonconservative ones, which prevent an indefinite 
energy transfer from the external forcing to the system: 
such a transfer would impede the occurrence of 
stationary states. For instance, the thermostat could 
simply be a constant friction force (as in stirred 
incompressible liquids or as in electric wires in which 
current circulates because of an electromotive force). 

A more fundamental approach would be to 
imagine that the thermostat device is not a phenom- 
enologically introduced nonconservative force (e.g., 
a friction force) but is due to the interaction with an 
external infinite system which is in “equilibrium at 
infinity." 

In any event nonequilibrium stationary states are 
intrinsically more complex than equilibrium states. 
Here attention will be confined to equilibrium 
statistical mechanics of systems of N identical point
particles $Q = (q_1, \ldots, q_N)$ enclosed in a cubic box Ω,
with volume V and side L, normally assumed to
have perfectly reflecting walls.

Particles of mass m located at q, q' will be
supposed to interact via a pair potential φ(q − q').
The microscopic motion follows the equations


\[ m \ddot{q}_i = -\sum_{j=1,\, j \ne i}^{N} \partial_{q_i} \varphi(q_i - q_j) - \partial_{q_i} W_{\mathrm{wall}}(q_i) \overset{\mathrm{def}}{=} -\partial_{q_i} \Phi(Q) \qquad [1] \]


where the potential φ is assumed to be smooth
except, possibly, for $|q - q'| \le r_0$ where it could be
+∞, that is, the particles cannot come closer than
$r_0$, and at $r_0$ [1] is interpreted by imagining that they
undergo elastic collisions; the potential $W_{\mathrm{wall}}$ models
the container and it will be replaced, unless
explicitly stated, by an elastic collision rule.

The time evolution $(Q, \dot{Q}) \to S_t(Q, \dot{Q})$ will, therefore,
be described on the position-velocity space,
$\hat{F}(N)$, of the N particles or, more conveniently, on
the phase space, i.e., by a time evolution $S_t$ on the
momentum-position ($P, Q$, with $P = m\dot{Q}$) space,
$F(N)$. The motion being conservative, the energy


\[ U \overset{\mathrm{def}}{=} \sum_i \frac{p_i^2}{2m} + \sum_{i<j} \varphi(q_i - q_j) + \sum_i W_{\mathrm{wall}}(q_i) \overset{\mathrm{def}}{=} K(P) + \Phi(Q) \]


will be a constant of motion; the last term in Φ is
missing if walls are perfect. This makes it convenient to
regard the dynamics as associated with two dynamical
systems $(F(N), S_t)$ on the 6N-dimensional phase
space, and $(F_U(N), S_t)$ on the (6N − 1)-dimensional
surface of energy U. Since the dynamics [1] is
Hamiltonian on phase space, with Hamiltonian


\[ H(P, Q) \overset{\mathrm{def}}{=} \sum_i \frac{p_i^2}{2m} + \Phi(Q) \overset{\mathrm{def}}{=} K + \Phi \]


it follows that the volume $d^{3N}P \, d^{3N}Q$ is conserved
(i.e., a region E has the same volume as $S_t E$) and
also the area $\delta(H(P, Q) - U) \, d^{3N}P \, d^{3N}Q$ is conserved.

The above dynamical systems are well defined,
i.e., $S_t$ is a map on phase space globally defined for
all t ∈ (−∞, ∞), when the interaction potential is
bounded below: this is implied by the a priori
bounds due to energy conservation. For gravitational
or Coulomb interactions, much more has to
be said, assumed, and done in order to even define
the key quantities needed for a statistical theory of
motion.
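Conservation of phase-space volume and of energy can be illustrated on the smallest possible example. The sketch below is an illustration only: it uses one particle in one dimension, a quartic potential chosen arbitrarily (bounded below, as required above), and the leapfrog scheme as a discrete stand-in for the exact flow. The leapfrog map is a composition of shears, so its Jacobian determinant is exactly 1, and the finite-difference estimate should reproduce that up to rounding.

```python
M = 1.0  # particle mass in arbitrary units (illustrative value)

def force(q):
    """Force -dPhi/dq for the illustrative potential Phi(q) = q**4 / 4."""
    return -q ** 3

def energy(p, q):
    """H(p, q) = p^2 / (2m) + Phi(q)."""
    return p * p / (2.0 * M) + q ** 4 / 4.0

def leapfrog(p, q, dt, steps):
    """Discrete approximation of the Hamiltonian flow (kick, drift, kick)."""
    for _ in range(steps):
        p += 0.5 * dt * force(q)
        q += dt * p / M
        p += 0.5 * dt * force(q)
    return p, q

def jacobian_det(p, q, dt, steps, h=1e-6):
    """Central-difference estimate of the determinant of d(p', q') / d(p, q)."""
    p_hi, q_hi = leapfrog(p + h, q, dt, steps)
    p_lo, q_lo = leapfrog(p - h, q, dt, steps)
    dp_dp, dq_dp = (p_hi - p_lo) / (2 * h), (q_hi - q_lo) / (2 * h)
    p_hi, q_hi = leapfrog(p, q + h, dt, steps)
    p_lo, q_lo = leapfrog(p, q - h, dt, steps)
    dp_dq, dq_dq = (p_hi - p_lo) / (2 * h), (q_hi - q_lo) / (2 * h)
    return dp_dp * dq_dq - dp_dq * dq_dp
```

For example, starting from (p, q) = (0.5, 1.0) with dt = 0.01 and 1000 steps, `jacobian_det` stays close to 1 and `energy` changes only at the level of the discretization error.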

Although our world is three dimensional (or at
least was so believed to be until recent revolutionary
theories), it will be useful to consider also systems of
particles in dimension d ≠ 3: in this case the above
6N and 3N become, respectively, 2dN and dN.
Systems with dimension d = 1, 2 are in fact sometimes
very good models for thin filaments or thin
films. For the same reason, it is often useful to
imagine that space is discrete and particles can only
be located on a lattice, for example, on $Z^d$ (see the
section “Lattice models”).

The reader is referred to Gallavotti (1999) for 
more details. 


Pressure, Temperature, and Kinetic 
Energy 


The beginning was BERNOULLI’s derivation of
the perfect gas law via the identification of
the pressure at numerical density ρ with the
average momentum transferred per unit time to
a surface element of area dS on the walls: that is,
the average of the observable 2mv ρv dS, with v
the normal component of the velocity of
the particles that undergo collisions with dS.
If f(v) dv is the distribution of the normal component
of velocity and $f(\mathbf{v}) \, d^3\mathbf{v} \equiv \prod_i f(v_i) \, d^3\mathbf{v}$,
$\mathbf{v} = (v_1, v_2, v_3)$, is the total velocity distribution,
the average of the momentum transferred is p dS
given by


\[ dS \int_{v>0} 2mv^2 \rho f(v) \, dv = dS \int mv^2 \rho f(v) \, dv = \rho \, \frac{2}{3} \, dS \int \frac{m}{2} \mathbf{v}^2 f(\mathbf{v}) \, d^3\mathbf{v} = \rho \, \frac{2}{3} \left\langle \frac{K}{N} \right\rangle dS \qquad [2] \]
Furthermore, (2/3)⟨K/N⟩ was identified as proportional
to the absolute temperature, ⟨K/N⟩ ≝
const (3/2) T, which, with present-day notations, is
written as $(2/3)\langle K/N \rangle = k_B T$. The constant $k_B$ was
(later) called Boltzmann's constant and it is the
same for at least all perfect gases. Its independence
of the particular nature of the gas is a consequence
of Avogadro's law stating that equal
volumes of gases at the same conditions of
temperature and pressure contain equal numbers
of molecules.
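The chain of equalities in [2] can be checked numerically once a specific distribution f is chosen. The sketch below assumes a Gaussian (Maxwellian) distribution for the normal velocity component, with variance $k_B T/m$; that choice, the midpoint quadrature, and the function names are assumptions made for illustration, since the argument above does not fix f.

```python
import math

def maxwell_1d(v, m, kT):
    """Normalized Gaussian distribution of one velocity component (assumed form of f)."""
    return math.sqrt(m / (2 * math.pi * kT)) * math.exp(-m * v * v / (2 * kT))

def pressure_from_collisions(rho, m, kT, n=20000, vmax_sigmas=10.0):
    """Midpoint-rule value of rho * integral over v > 0 of 2 m v^2 f(v) dv."""
    sigma = math.sqrt(kT / m)          # thermal spread of one velocity component
    dv = vmax_sigmas * sigma / n       # integrate up to 10 sigma, where f is negligible
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * dv
        total += 2.0 * m * v * v * maxwell_1d(v, m, kT) * dv
    return rho * total
```

Under this assumption the result should agree with `rho * kT`, which is the perfect gas law $p = \rho k_B T$ obtained by combining [2] with $(2/3)\langle K/N \rangle = k_B T$.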

Proportionality between average kinetic energy
and temperature via the universal constant $k_B$
became in fact a fundamental assumption extending
to all aggregates of particles, gaseous or not, never
challenged in all later works (until quantum
mechanics, where this is no longer true; see the
section “Quantum statistics”).

For more details, we refer the reader to Gallavotti
(1999).


Heat and Entropy


After Clausius’ discovery of entropy, BOLTZMANN, in 
order to explain it mechanically, introduced the heat 
theorem, which he developed to full generality 
between 1866 and 1884. Together with the men- 
tioned identification of absolute temperature with 
average kinetic energy, the heat theorem can also be 
considered a founding element of statistical 
mechanics. 

The theorem makes precise the notion of time
average and then states in great generality that
given any mechanical system one can associate with
its dynamics four quantities U, V, p, T, defined as
time averages of suitable mechanical observables
(i.e., functions on phase space), so that when the
external conditions are infinitesimally varied and
the quantities U, V change by dU, dV, respectively,
the ratio (dU + p dV)/T is exact, i.e., there is a
function S(U, V) whose corresponding variation
equals the ratio. It will be better, for the purpose of
considering very large boxes (V → ∞), to write this
relation in terms of intensive quantities u = U/N and
v = V/N as


\[ \frac{du + p \, dv}{T} \quad \text{is exact} \qquad [3] \]

i.e., the ratio equals the variation ds of
s(U/N, V/N) = (1/N) S(U, V).


The proof originally dealt with monocyclic
systems, i.e., systems in which all motions are
periodic. The assumption is clearly much too
restrictive and justification for it developed from
the early “nonperiodic motions can be regarded
as periodic with infinite period” (1866), to the
later ergodic hypothesis and finally to the
realization that, after all, the heat theorem
does not really depend on the ergodic hypothesis
(1884).

Although for a one-dimensional system the proof 
of the heat theorem is a simple check, it was a real 
breakthrough because it led to an answer to the 
general question as to under which conditions one 
could define mechanical quantities whose variations 
were constrained to satisfy [3] and therefore could 
be interpreted as a mechanical model of Clausius’ 
macroscopic thermodynamics. It is reproduced in 
the following. 

Consider a one-dimensional system subject to
forces with a confining potential φ(x) such that
|φ'(x)| > 0 for |x| > 0, φ''(0) > 0 and φ(x) → +∞ as |x| → ∞.
All motions are periodic, so that the system is
monocyclic. Suppose that the potential φ(x) depends
on a parameter V and define a state to be a motion with
given energy U and given V; let


U = total energy of the system ≡ K + Φ

T = time average of the kinetic energy K ≡ ⟨K⟩

V = the parameter on which φ is supposed to depend

p = −time average of $\partial_V \varphi$, $-\langle \partial_V \varphi \rangle$    [4]


A state is thus parametrized by U, V. If such
parameters change by dU, dV, respectively, and
if dL ≝ −p dV, dQ ≝ dU + p dV, then [3] holds. In
fact, let $x_\pm(U, V)$ be the extremes of the oscillations of
the motion with given U, V and define S as


\[ S = 2 \log \int_{x_-(U,V)}^{x_+(U,V)} \sqrt{U - \varphi(x)} \, dx \;\Rightarrow\; dS = \frac{\int (dU - \partial_V \varphi(x) \, dV) \, (dx / \sqrt{K})}{\int (dx / \sqrt{K}) \, K} \qquad [5] \]


Noting that $dx/\sqrt{K} = \sqrt{2/m} \, dt$, [3] follows because
time averages are given by integrating with respect
to $dx/\sqrt{K}$ and dividing by the integral of $1/\sqrt{K}$.

For more details, the reader is referred to Boltzmann
(1968b) and Gallavotti (1999).
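A worked special case may help in reading the one-dimensional heat theorem. The harmonic potential below is an assumption chosen for illustration because every integral is elementary; it does not appear in the text. With $S = 2 \log \int_{x_-}^{x_+} \sqrt{U - \varphi(x)} \, dx$, $T = \langle K \rangle$ and $p = -\langle \partial_V \varphi \rangle$:

```latex
\begin{aligned}
\varphi_V(x) &= \tfrac{1}{2} V x^2, \qquad x_\pm = \pm\sqrt{2U/V}, \\
\int_{x_-}^{x_+} \sqrt{U - \tfrac{1}{2} V x^2}\, dx &= \frac{\pi U}{\sqrt{2V}}
  \quad\Longrightarrow\quad S = 2\log U - \log V + \text{const}, \\
T &= \langle K \rangle = \tfrac{1}{2} U, \qquad
p = -\langle \partial_V \varphi \rangle = -\tfrac{1}{2}\langle x^2 \rangle = -\frac{U}{2V}, \\
\frac{dU + p\, dV}{T} &= \frac{2\, dU}{U} - \frac{dV}{V} = dS .
\end{aligned}
```

Here $\langle K \rangle = \langle \varphi \rangle = U/2$ is the usual equipartition between kinetic and potential energy for a harmonic oscillator. The quantity p is negative in this example only because increasing V stiffens the potential; it plays the role of a generalized force conjugate to the parameter V.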


Heat Theorem and Ergodic Hypothesis 


Boltzmann tried to extend the result beyond the one- 
dimensional systems (e.g., to Keplerian motions, 
which are not monocyclic unless only motions with 
a fixed eccentricity are considered). However, the 
early statement that “aperiodic motions can be 
regarded as periodic with infinite period” is really 
the heart of the application of the heat theorem 
for monocyclic systems to the far more complex gas 
in a box. 

Imagine that the gas container Ω is closed by a
piston of section A located to the right of the
origin at distance L and acting as a lid, so that the
volume is V = AL. The microscopic model for the
piston will be a potential $\bar\varphi(L - \xi)$ if x = (ξ, η, ζ) are
the coordinates of a particle. The function $\bar\varphi(r)$
will vanish for $r > r_0$, for some $r_0 \ll L$, and
diverge to +∞ at r = 0. Thus, $r_0$ is the width of
the layer near the piston where the force of the
wall is felt by the particles that happen to be
roaming there.

The contribution to the total potential energy
Φ due to the walls is $W_{\mathrm{wall}} = \sum_j \bar\varphi(L - \xi_j)$ and
$\partial_V = A^{-1} \partial_L$; assuming monocyclicity, it is necessary
to evaluate the time average of $\partial_L \Phi(x) =
\partial_L W_{\mathrm{wall}} \equiv \sum_j \bar\varphi'(L - \xi_j)$. As time evolves, the
particles $x_j$ with $\xi_j$ in the layer within $r_0$ of the
wall will feel the force exercised by the wall and
bounce back. One particle in the layer will contribute
to the average of $\partial_L \Phi(x)$ the amount


\[ \frac{1}{\text{total time}} \, 2 \int_{t_0}^{t_1} -\bar\varphi'(L - \xi_j) \, dt \qquad [6] \]

if $t_0$ is the first instant when the point j enters the
layer and $t_1$ is the instant when the ξ-component of
the velocity vanishes “against the wall.” Since
$-\bar\varphi'(L - \xi_j)$ is the ξ-component of the force, the
integral is $2m|\dot\xi_j|$ (by Newton's law), provided, of
course, $\dot\xi_j > 0$.

Suppose that no collisions between particles occur
while the particles travel within the range of the
potential of the wall, i.e., the mean free path is much
greater than the range of the potential $\bar\varphi$ defining the
wall. The contribution of collisions to the average
momentum transfer to the wall per unit time is
therefore given by, see [2],


\[ \int_{v>0} 2mv \, f(v) \, \rho_{\mathrm{wall}} \, A \, v \, dv \]


if $\rho_{\mathrm{wall}}$, f(v) are the average density near the wall
and, respectively, the average fraction of particles
with a velocity component normal to the wall
between v and v + dv. Here ρ, f are supposed to be
independent of the point on the wall: this should be
true up to corrections of size o(A).

Thus, writing the average kinetic energy per particle
and per velocity component, $\int (m/2) v^2 f(v) \, dv$, as
$(1/2)\beta^{-1}$ (cf. [2]) it follows that


\[ p \overset{\mathrm{def}}{=} -\langle \partial_V \Phi \rangle = \rho_{\mathrm{wall}} \, \beta^{-1} \qquad [7] \]


has the physical interpretation of pressure. $(1/2)\beta^{-1}$
is the average kinetic energy per degree of freedom:
hence, it is proportional to the absolute temperature
T (see the section “Pressure, temperature, and
kinetic energy”).

On the other hand, if motion on the energy
surface takes place on a single periodic orbit, the
quantity p in [7] is the right quantity that would
make the heat theorem work; see [4]. Hence,
regarding the trajectory on each energy surface as
periodic (i.e., the system as monocyclic) leads to the
heat theorem with p, U, V, T having the right
physical interpretation corresponding to their appellations.
This shows that monocyclic systems provide
natural models of thermodynamic behavior.

Assuming that a chaotic system like a gas in a 
container of volume V will satisfy, for practical 
purposes, the above property, a quantity p can be 
defined such that dU + pdV admits the inverse of 
the average kinetic energy (K) as an integrating 
factor and, furthermore, p,U,V,(K) have the 
physical interpretations of pressure, energy, volume, 


and (up to a proportionality factor) absolute 
temperature, respectively. 

Boltzmann's conception of space (and time) as
discrete allowed him to conceive the property that
the energy surface is constituted by “points” all of
which belong to a single trajectory: a property that
would be impossible if the phase space was really a
continuum. Regarding phase space as consisting of a
finite number of “cells” of finite volume $h^{dN}$, for
some h > 0 (rather than of a continuum of points),
allowed him to think, without logical contradiction,
that the energy surface consisted of a single
trajectory and, hence, that motion was a cyclic
permutation of its points (actually cells).

Furthermore, it implied that the time average of
an observable F(P, Q) had to be identified with its
average on the energy surface computed via the
Liouville distribution


\[ C^{-1} \int F(P, Q) \, \delta(H(P, Q) - U) \, dP \, dQ \]

with

\[ C = \int \delta(H(P, Q) - U) \, dP \, dQ \]


(the appropriate normalization factor): a property
that was written symbolically


\[ \frac{dt}{T} = \frac{dP \, dQ}{\int dP \, dQ} \]

or

\[ \lim_{T \to \infty} \frac{1}{T} \int_0^T F(S_t(P, Q)) \, dt = \frac{\int F(P', Q') \, \delta(H(P', Q') - U) \, dP' \, dQ'}{\int \delta(H(P', Q') - U) \, dP' \, dQ'} \qquad [8] \]


The validity of [8] for all (piecewise smooth)
observables F and for all points of the energy
surface, with the exception of a set of zero area, is
called the ergodic hypothesis.

For more details, the reader is referred to
Boltzmann (1968) and Gallavotti (1999).
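The discrete picture, in which the dynamics is a cyclic permutation of finitely many cells, makes the identity between time averages and cell averages easy to verify. The sketch below is a toy model with hypothetical ingredients: twelve cells, a shift by five cells as the one-step dynamics (a single cycle, since 5 and 12 are coprime), and an arbitrary observable. Exact rational arithmetic is used so that the two averages can be compared without rounding.

```python
from fractions import Fraction

N_CELLS = 12  # number of cells on the "energy surface" (illustrative value)

def step(cell):
    """One time step: a cyclic permutation with a single cycle, since gcd(5, 12) = 1."""
    return (cell + 5) % N_CELLS

def observable(cell):
    """An arbitrary observable F defined on the cells."""
    return cell * cell

def time_average(F, start, n_steps):
    """Average of F along the orbit start, S(start), S^2(start), ..."""
    total = Fraction(0)
    cell = start
    for _ in range(n_steps):
        total += F(cell)
        cell = step(cell)
    return total / n_steps

def cell_average(F):
    """Average of F with equal weight on every cell (discrete analogue of the rhs of [8])."""
    return sum(Fraction(F(c)) for c in range(N_CELLS)) / N_CELLS
```

Over one full cycle the orbit visits every cell exactly once, so `time_average(observable, start, N_CELLS)` equals `cell_average(observable)` for every starting cell. A permutation with several cycles (for instance a shift by 4) would break this equality, which is the discrete version of a failure of ergodicity.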


Ensembles 


Eventually Boltzmann in 1884 realized that the
validity of the heat theorem for averages computed
via the right-hand side (rhs) of [8] held independently
of the ergodic hypothesis, that is, [8] was not
necessary because the heat theorem (i.e., [3]) could
also be derived under the only assumption that the
averages involved in its formulation were computed
as averages over phase space with respect to the
probability distribution on the rhs of [8].

Furthermore, if T was identified with the average
kinetic energy, U with the average energy, and p
with the average force per unit surface on the walls
of the container Ω with volume V, the relation [3]
held for a variety of families of probability distributions
on phase space, besides [8]. Among these are:


1. The “microcanonical ensemble,” which is the
collection of probability distributions on the rhs
of [8] parametrized by u = U/N, v = V/N (energy
and volume per particle),


\[ \mu^{mc}_{u,v}(dP \, dQ) = \frac{1}{Z_{mc}(U, N, V)} \, \delta(H(P, Q) - U) \, \frac{dP \, dQ}{N! \, h^{dN}} \qquad [9] \]

where h is a constant with the dimensions of an
action which, in the discrete representation of
phase space mentioned in the previous section, can
be taken such that $h^{dN}$ equals the volume of the
cells and, therefore, the integrals with respect to [9]
can be interpreted as an (approximate) sum over
the cells conceived as microscopic configurations
of N indistinguishable particles (whence the N!).

2. The “canonical ensemble,” which is the collection
of probability distributions parametrized by
β, v = V/N,


\[ \mu^{c}_{\beta,v}(dP \, dQ) = \frac{1}{Z_{c}(\beta, N, V)} \, e^{-\beta H(P, Q)} \, \frac{dP \, dQ}{N! \, h^{dN}} \qquad [10] \]


to which more ensembles can be added, such as
the grand canonical ensemble (Gibbs).

3. The “grand canonical ensemble,” which is the
collection of probability distributions parametrized
by β, λ and defined over the space
$F_{gc} = \bigcup_{N=0}^{\infty} F(N)$,

\[ \mu^{gc}_{\beta,\lambda}(dP \, dQ) = \frac{1}{Z_{gc}(\beta, \lambda, V)} \, e^{\beta \lambda N - \beta H(P, Q)} \, \frac{dP \, dQ}{N! \, h^{dN}} \qquad [11] \]

Hence, there are several different models of thermo- 
dynamics. The key tests for accepting them as real 
microscopic descriptions of macroscopic thermo- 
dynamics are as follows. 


1. A correspondence between the macroscopic 
states of thermodynamic equilibrium and the 
elements of a collection of probability distribu- 
tions on phase space can be established by 
identifying, on the one hand, macroscopic 
thermodynamic states with given values of the 
thermodynamic functions and, on the other, 


probability distributions attributing the same 
average values to the corresponding microscopic 
observables (i.e., whose averages have the inter- 
pretation of thermodynamic functions). 

2. Once the correct correspondence between the
elements of the different ensembles is established,
that is, once the pairs (u, v), (β, v), (β, λ) are so
related to produce the same values for the
averages U, V, $k_B T \overset{\mathrm{def}}{=} \beta^{-1}$, p|∂Ω| of


\[ H(P, Q), \quad V, \quad \frac{2K(P)}{3N}, \quad \int_{\partial\Omega} \delta_{\partial\Omega}(q_1) \, 2m (v_1 \cdot n)^2 \, dq_1 \qquad [12] \]


where $\delta_{\partial\Omega}(q_1)$ is a delta-function pinning $q_1$ to
the surface ∂Ω, then the averages of all physically
interesting observables should coincide at
least in the thermodynamic limit, Ω → ∞. In this
way, the elements μ of the considered collection
of probability distributions can be identified with
the states of macroscopic equilibrium of the
system. The μ's depend on parameters and therefore
they form an ensemble: each of them
corresponds to a macroscopic equilibrium state
whose thermodynamic functions are appropriate
averages of microscopic observables and therefore
are functions of the parameters identifying μ.


Remark The word “ensemble” is often used to
indicate the individual probability distributions of
what has been called here an ensemble. The meaning
used here seems closer to the original sense in the
1884 paper of Boltzmann (in other words, often by
“ensemble” one means that collection of the phase
space points on which a given probability distribution
is considered, and this does not seem to be the
original sense).


For instance, in the case of the microcanonical
distributions this means interpreting energy, volume,
temperature, and pressure of the equilibrium state
with specific energy u and specific volume v as
proportional, through appropriate universal proportionality
constants, to the integrals with respect to
$\mu^{mc}_{u,v}(dP \, dQ)$ of the mechanical quantities in [12].
The averages of other thermodynamic observables in
the state with specific energy u and specific volume
v should be given by their integrals with respect
to $\mu^{mc}_{u,v}$.

Likewise, one can interpret energy, volume,
temperature, and pressure of the equilibrium state
with specific energy u and specific volume v as the
averages of the mechanical quantities [12] with
respect to the canonical distribution $\mu^{c}_{\beta,v}(dP \, dQ)$
which has average specific energy precisely u. The
averages of other thermodynamic observables in the
state with specific energy and volume u and v are
given by their integrals with respect to $\mu^{c}_{\beta,v}$. A
similar definition can be given for the description of
thermodynamic equilibria via the grand canonical
distributions.

For more details, see Gibbs (1981) and Gallavotti
(1999).


Equivalence of Ensembles 


BOLTZMANN proved that, computing averages via the
microcanonical or canonical distributions, the essential
property [3] was satisfied when changes in their
parameters (i.e., u, v or β, v, respectively) induced
changes du and dv in energy and volume, respectively.
He also proved that the function s, whose
existence is implied by [3], was the same function
once expressed as a function of u, v (or of any pair
of thermodynamic parameters, e.g., of T, v or p, u).
A close examination of Boltzmann's proof shows
that [3] holds exactly in the canonical ensemble
and up to corrections tending to 0 as Ω → ∞ in the
microcanonical ensemble. Identity of thermodynamic
functions evaluated in the two ensembles
holds, as a consequence, up to corrections of this
order. In addition, Gibbs added that the same held
for the grand canonical ensemble.

Of course, not every collection of stationary
probability distributions on phase space would
provide a model for thermodynamics: Boltzmann
called “orthodic” the collections of stationary
distributions which generated models of thermodynamics
through the above-mentioned identification
of its elements with macroscopic equilibrium
states. The microcanonical, canonical, and the later
grand canonical ensembles are the chief examples
of orthodic ensembles. Boltzmann and Gibbs
proved these ensembles to be not only orthodic
but to generate the same thermodynamic functions,
that is, to generate the same thermodynamics.

This meant freedom from the analysis of the truth 
of the doubtful ergodic hypothesis (still unproved in 
any generality) or of the monocyclicity (manifestly 
false if understood literally rather than regarding the 
phase space as consisting of finitely many small, 
discrete cells), and allowed Gibbs to formulate the 
problem of statistical mechanics of equilibrium as 
follows. 


Problem Study the properties of the collection of 
probability distributions constituting (any) one of 
the above ensembles. 


However, the three ensembles just introduced by
no means exhaust the class of orthodic ensembles
producing the same models of thermodynamics in
the limit of infinitely large systems. The wealth of


ensembles with the orthodicity property, hence 
leading to equivalent mechanical models of thermo- 
dynamics, can be naturally interpreted in connection 
with the phenomenon of phase transition (see the 
section “Phase transitions and boundary conditions”). 

Clearly, the quoted results do not “prove” 
that thermodynamic equilibria “are” described by 
the microcanonical, canonical, or grand canonical 
ensembles. However, they certainly show that, 
for most systems, independently of the number of 
degrees of freedom, one can define quite unambigu- 
ously a mechanical model of thermodynamics estab- 
lishing parameter-free, system-independent, physically 
important relations between thermodynamic quantities
(e.g., $\partial_u (p(u,v)/T(u,v)) = \partial_v (1/T(u,v))$, from [3]).

The ergodic hypothesis which was at the root 
of the mechanical theorems on heat and entropy 
cannot be taken as a justification of their validity. 
Naively one would expect that the time scale 
necessary to see an equilibrium attained, called 
recurrence time scale, would have to be at least the 
time that a phase space point takes to visit all 
possible microscopic states of given energy: hence, 
an explanation of why the necessarily enormous size 
of the recurrence time is not a problem becomes 
necessary. 

In fact, the recurrence time can be estimated once 
the phase space is regarded as discrete: for the 
purpose of countering mounting criticism, Boltzmann
assumed that momentum was discretized in
units of $(2mk_BT)^{1/2}$ (i.e., the average momentum
size) and space was discretized in units of $\rho^{-1/3}$
(i.e., the average spacing), implying a volume of
cells $h^{3N}$ with $h \stackrel{def}{=} \rho^{-1/3}(2mk_BT)^{1/2}$: then he calculated
that, even with such a gross discretization, a
cell representing a microscopic state of 1 cm$^3$ of
hydrogen at normal condition would require a time
(called “recurrence time”) of the order of $\sim 10^{10^{19}}$
times the age of the Universe (!) to visit the entire
energy surface. In fact, the phase space volume is
$\Gamma = (\rho^{-1} N (2mk_BT)^{3/2})^N \equiv N^N h^{3N}$ and the number of
cells of volume $h^{3N}$ is $\Gamma/(N!\,h^{3N}) \simeq e^{N}$; and the
time to visit all will be $e^{N}\tau_0$, with $\tau_0$ a typical
atomic time unit, e.g., $10^{-12}$ s, but $N = 10^{19}$. In this
sense, the statement boldly made by young Boltzmann
that “aperiodic motions can be regarded as
periodic with infinite period” was even made
quantitative.
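
The order of magnitude quoted above can be reproduced with a few lines of arithmetic. The following is only a sketch under stated assumptions: the number of cells is taken to be $N^N/N! \simeq e^{N}$ (Stirling's formula), the atomic time unit is $10^{-12}$ s, and the age of the Universe is taken as roughly $4.3 \times 10^{17}$ s. The function names are illustrative and do not come from the article. Logarithms are used throughout because $e^{N}$ overflows any floating-point type for macroscopic N.

```python
import math

def log_number_of_cells(n):
    """Natural log of N^N / N!, the assumed count of phase-space cells."""
    return n * math.log(n) - math.lgamma(n + 1)

def log10_recurrence_over_universe_age(n, tau0_s=1e-12, universe_age_s=4.3e17):
    """log10 of (e^N * tau0) / (age of the Universe), using e^N for the cell count."""
    log10_cells = n / math.log(10.0)  # log10(e^N) = N / ln(10)
    return log10_cells + math.log10(tau0_s) - math.log10(universe_age_s)

# Stirling check at a moderate N: log(N^N / N!) / N should be close to 1.
print(log_number_of_cells(1e6) / 1e6)
# Exponent of 10 in the recurrence time measured in ages of the Universe, N = 10^19.
print(log10_recurrence_over_universe_age(1e19))
```

The second number is of order $10^{18}$ to $10^{19}$, so the recurrence time is a power of ten whose exponent is itself astronomically large, in line with the estimate attributed to Boltzmann.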

The recurrence time is clearly so long as to be
irrelevant for all purposes: nevertheless, the correctness
of the microscopic theory of thermodynamics
can still rely on the microscopic dynamics once it is
understood (as stressed by Boltzmann) that the
reason why we observe approach to equilibrium,
and equilibrium itself, over “human” timescales
(which are far shorter than the recurrence times) is
due to the property that on most of the energy surface
the (very few) observables whose averages yield
macroscopic thermodynamic functions (namely pressure,
temperature, energy, ...) assume the same value
even if N is only very moderately large (of the order of
$10^{3}$ rather than $10^{19}$). This implies that this value
coincides with the average and therefore satisfies the
heat theorem without any contradiction with the
length of the recurrence time. The latter rather
concerns the time needed for the generic observable to
thermalize, that is, to reach its time average: the
generic observable will indeed take a very long time to
“thermalize” but no one will ever notice, because the
generic observable (e.g., the position of a pre-identified
particle) is not relevant for thermodynamics.

The word “proof” is not used in the mathematical
sense so far in this article: the relevance of a 
mathematically rigorous analysis was widely rea- 
lized only around the 1960s at the same time when 
the first numerical studies of the thermodynamic 
functions became possible and rigorous results were 
needed to check the correctness of various numerical 
simulations. 

For more details, the reader is referred to Boltzmann 
(1968a, b) and Gallavotti (1999). 


Thermodynamic Limit 


Adopting Gibbs' axiomatic point of view, it is
interesting to see the path to be followed to achieve
an equivalence proof of the three ensembles introduced
in the section “Heat theorem and ergodic
hypothesis.”

A preliminary step is to consider, given a cubic
box $\Omega$ of volume $V = L^d$, the normalization factors
$Z^{gc}(\beta,\lambda,V)$, $Z^{c}(\beta,N,V)$, and $Z^{mc}(U,N,V)$ in [9],
[10], and [11], respectively, and to check that the
following thermodynamic limits exist:

$\beta p_{gc}(\beta,\lambda) = \lim_{V\to\infty} \frac{1}{V} \log Z^{gc}(\beta,\lambda,V)$

$-\beta f_c(\beta,\rho) = \lim_{V\to\infty,\ N/V=\rho} \frac{1}{N} \log Z^{c}(\beta,N,V)$     [13]

$k_B^{-1} s_{mc}(u,\rho) = \lim_{V\to\infty,\ N/V=\rho,\ U/N=u} \frac{1}{N} \log Z^{mc}(U,N,V)$

where the density $\rho = v^{-1} \equiv N/V$ is used, instead of
v, for later reference. The normalization factors play
an important role because they have simple thermo- 
dynamic interpretation (see the next section): they 
are called grand canonical, canonical, and micro- 
canonical partition functions, respectively. 
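
As an illustration of what the existence of such a limit means, the following sketch evaluates $N^{-1} \log Z^{c}$ for a free gas at increasing N and fixed density. It assumes the standard normalization $Z^{c} = V^{N} \Lambda^{-3N}/N!$ in d = 3, with $\Lambda$ the thermal wavelength. The normalization actually used in [10] is not reproduced in this section, so this form, like the function names, is an assumption made for the example.

```python
import math

def log_zc_per_particle(n, rho, log_inv_lambda3=0.0):
    """(1/N) log Z^c for an ideal gas, assuming Z^c = V^N / (N! * Lambda^(3N)).

    log_inv_lambda3 is log(Lambda^-3); it is 0 in units where Lambda = 1.
    """
    volume = n / rho
    return math.log(volume) - math.lgamma(n + 1) / n + log_inv_lambda3

def thermodynamic_limit(rho, log_inv_lambda3=0.0):
    """Limit of the expression above as N grows at fixed rho (Stirling's formula)."""
    return 1.0 - math.log(rho) + log_inv_lambda3

rho = 0.5
for n in (10, 1000, 100000):
    print(n, log_zc_per_particle(n, rho) - thermodynamic_limit(rho))
```

The printed differences shrink like $\log N / N$, which is the typical size of finite-volume corrections in the simplest case where no interaction is present.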


Not surprisingly, assumptions on the interparticle
potential $\varphi(q - q')$ are necessary to achieve an
existence proof of the limits in [13]. The assumptions
on $\varphi$ are not only quite general but also have a
clear physical meaning. They are

1. stability: that is, existence of a constant $B \ge 0$
such that $\sum_{i<j}^{N} \varphi(q_i - q_j) \ge -BN$ for all $N \ge 0$,
$q_1, \ldots, q_N \in R^d$, and

2. temperedness: that is, existence of constants $\varepsilon_0$,
$R > 0$ such that $|\varphi(q - q')| < B|q - q'|^{-d-\varepsilon_0}$ for
$|q - q'| > R$.


The assumptions are satisfied by essentially all 
microscopic interactions with the notable exceptions 
of the gravitational and Coulombic interactions, 
which require a separate treatment (and lead to 
somewhat different results on the thermodynamic 
behavior). 
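
The temperedness condition can be made concrete with a small numerical comparison. This is only a sketch, using the standard 12-6 Lennard-Jones form and a bare Coulomb potential in d = 3 as assumed test cases, with illustrative function names. It tabulates $|\varphi(r)|\, r^{d+\varepsilon_0}$, which must stay bounded at large r for the tempered bound to hold with that $\varepsilon_0$.

```python
def lennard_jones(r, eps=1.0, r0=1.0):
    """Standard 12-6 Lennard-Jones potential (assumed form)."""
    return 4.0 * eps * ((r0 / r) ** 12 - (r0 / r) ** 6)

def coulomb(r, strength=1.0):
    """Bare Coulomb potential in d = 3."""
    return strength / r

def tail_ratio(potential, r, d=3, eps0=3.0):
    """|phi(r)| * r^(d + eps0); bounded in r when the tempered bound holds."""
    return abs(potential(r)) * r ** (d + eps0)

for r in (2.0, 10.0, 100.0, 1000.0):
    print(r, tail_ratio(lennard_jones, r), tail_ratio(coulomb, r, eps0=0.5))
```

With $\varepsilon_0 = 3$ the Lennard-Jones ratio approaches the constant 4, while the Coulomb ratio grows without bound for any $\varepsilon_0 > 0$, which is why the Coulomb case needs the separate treatment mentioned above.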

For instance, assumptions (1), (2) are satisfied
if $\varphi(q)$ is $+\infty$ for $|q| < r_0$ and smooth for $|q| > r_0$,
for some $r_0 \ge 0$, and furthermore $\varphi(q) > B_0|q|^{-(d+\varepsilon_0)}$
if $r_0 < |q| \le R$, while for $|q| > R$ it is $|\varphi(q)| <
B_1|q|^{-(d+\varepsilon_0)}$, for some $B_0, B_1, \varepsilon_0 > 0$, $R > r_0$. Briefly,
$\varphi$ is fast diverging at contact and fast approaching 0
at large distance. This is called a (generalized)
Lennard-Jones potential. If $r_0 > 0$, $\varphi$ is called a
hard-core potential. If $B_1 = 0$, the potential is said
to have finite range. (See Appendix 1 for physical
implications of violations of the above stability and
temperedness properties.) However, in the following,
it will be necessary, both for simplicity and to contain
the length of the exposition, to restrict consideration
to the case $B_1 = 0$, i.e., to

$\varphi(q) > B_0|q|^{-(d+\varepsilon_0)}$,   $r_0 < |q| \le R$,

$|\varphi(q)| \equiv 0$,   $|q| > R$     [14]

unless explicitly stated.

Assuming stability and temperedness, the existence
of the limits in [13] can be mathematically
proved: in Appendix 2, the proof of the first is
analyzed to provide the simplest example of the
technique. A remarkable property of the functions
$\beta p_{gc}(\beta,\lambda)$, $-\beta\rho f_c(\beta,\rho)$, and $\rho s_{mc}(u,\rho)$ is that they are
convex functions: hence, they are continuous in the
interior of their domains of definition and, at one
variable fixed, are differentiable with respect to the
other with at most countably many exceptions.

In the case of a potential without hard core
($\rho_{max} = \infty$), $-\rho f_c(\beta,\rho)$ can be checked to tend to 0
slower than $\rho$ as $\rho \to 0$, and to $-\infty$ faster than $-\rho$ as
$\rho \to \infty$ (essentially proportionally to $-\rho \log \rho$ in both
cases). Likewise, in the same case, $s_{mc}(u,\rho)$ can be
shown to tend to 0 slower than $u - u_{min}$ as $u \to u_{min}$,
and to $-\infty$ faster than $-u$ as $u \to \infty$. The latter
asymptotic properties can be exploited to derive, from
the relations between the partition functions in [13],


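$Z^{gc}(\beta,\lambda,V) = \sum_{N=0}^{\infty} e^{\beta\lambda N} Z^{c}(\beta,N,V)$

$Z^{c}(\beta,N,V) = \int_{-B}^{\infty} e^{-\beta U} Z^{mc}(U,N,V)\, dU$     [15]

and, from the above-mentioned convexity, the
consequences

$\beta p_{gc}(\beta,\lambda) = \max_{v} \left(\beta\lambda v^{-1} - \beta v^{-1} f_c(\beta, v^{-1})\right)$

$-\beta f_c(\beta, v^{-1}) = \max_{u} \left(-\beta u + k_B^{-1} s_{mc}(u, v^{-1})\right)$     [16]

and that the maxima are attained in points, or
intervals, internal to the intervals of definition. Let
$v_{gc}$, $u_c$ be points where the maxima are, respectively,
attained in [16].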

Note that the quantity $e^{\beta\lambda N} Z^{c}(\beta,N,V)/Z^{gc}(\beta,\lambda,V)$
has the interpretation of probability of a density
$v^{-1} = N/V$ evaluated in the grand canonical distribution.
It follows that, if the maximum in the first of
[16] is strict, that is, it is reached at a single point, the
values of $v^{-1}$ in closed intervals not containing the
maximum point $v_{gc}^{-1}$ have a probability behaving as
$< e^{-cV}$, $c > 0$, as $V \to \infty$, compared to the probability
of $v^{-1}$'s in any interval containing $v_{gc}^{-1}$. Hence, $v_{gc}$ has
the interpretation of average value of v in the grand
canonical distribution, in the limit $V \to \infty$.

Likewise, the interpretation of

$e^{-\beta u N} Z^{mc}(uN, N, V)/Z^{c}(\beta, N, V)$

as probability in the canonical distribution of an
energy density u shows that, if the maximum in the
second of [16] is strict, the values of u in closed
intervals not containing the maximum point $u_c$ have
a probability behaving as $< e^{-cV}$, $c > 0$, as $V \to \infty$,
compared to the probability of u's in any interval
containing $u_c$. Hence, in the limit $\Omega \to \infty$, the
average value of u in the canonical distribution is $u_c$.

If the maxima are strict, [16] also establishes a
relation between the grand canonical density, the
canonical free energy and the grand canonical parameter
$\lambda$, or between the canonical energy, the microcanonical
entropy, and the canonical parameter $\beta$:

$\lambda = \partial_{v^{-1}}\left(v_{gc}^{-1} f_c(\beta, v_{gc}^{-1})\right)$,   $k_B\beta = \partial_u s_{mc}(u_c, v^{-1})$     [17]

where convexity and strictness of the maxima imply
the existence of the derivatives.


Remark Therefore, in the equivalence between
canonical and microcanonical ensembles, the canonical
distribution with parameters $(\beta, v)$ should
correspond with the microcanonical with parameters
$(u_c, v)$. The grand canonical distribution
with parameters $(\beta, \lambda)$ should correspond with the
canonical with parameters $(\beta, v_{gc})$.
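
The variational relation in the second of [16] and the derivative condition in [17] can be checked numerically on a toy example. The sketch below assumes an ideal-gas-like entropy $k_B^{-1} s(u) = \frac{3}{2}\log u$ at fixed v (additive constants dropped), which is not taken from the article, and locates the maximizer $u_c$ by a golden-section search. For this entropy [17] predicts $u_c = 3/(2\beta)$.

```python
import math

def entropy_over_kb(u):
    """Assumed toy entropy per particle divided by k_B (ideal-gas-like)."""
    return 1.5 * math.log(u)

def canonical_maximizer(beta, lo=1e-6, hi=1e3, iters=200):
    """Golden-section search for the u maximizing -beta*u + s(u)/k_B."""
    def g(u):
        return -beta * u + entropy_over_kb(u)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if g(c) > g(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return 0.5 * (a + b)

beta = 2.0
u_c = canonical_maximizer(beta)
minus_beta_f = -beta * u_c + entropy_over_kb(u_c)
print(u_c, 1.5 / u_c, minus_beta_f)
```

The second printed number is $\partial_u s / k_B$ evaluated at $u_c$ and should reproduce $\beta$, which is the content of the second relation in [17] for this toy entropy.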


For more details, the reader is referred to Ruelle 
(1969) and Gallavotti (1999). 


Physical Interpretation of 
Thermodynamic Functions 


The existence of the limits [13] implies several
properties of interest. The first is the possibility of
finding the physical meaning of the functions
$p_{gc}$, $f_c$, $s_{mc}$ and of the parameters $\lambda$, $\beta$.

Note first that, for all V the grand canonical average
$\langle K \rangle_{\beta,\lambda}$ is $(d/2)\beta^{-1}\langle N \rangle_{\beta,\lambda}$, so that $\beta^{-1}$ is proportional to
the temperature $T_{gc} = T(\beta,\lambda)$ in the grand canonical
distribution: $\beta^{-1} = k_B T(\beta,\lambda)$. Proceeding heuristically,
the physical meaning of $p(\beta,\lambda)$ and $\lambda$ can be found
through the following remarks.

Consider the microcanonical distribution $\mu^{mc}_{u,v}$ and
denote by $\int^*$ the integral over (P, Q) extended to the
domain of the (P, Q) such that H(P, Q) = U and, at
the same time, $q_1 \in dV$, where dV is an infinitesimal
volume surrounding the region $\Omega$. Then, by the
microscopic definition of the pressure p (see the
introductory section), it is

$p\,dV = \frac{N}{Z(U,N,V)} \int^* \delta\, \frac{2}{3}\frac{p_1^2}{2m} \frac{dP\,dQ}{N!\,h^{dN}} \equiv \frac{2}{3 Z(U,N,V)} \int^* \delta\, K(P) \frac{dP\,dQ}{N!\,h^{dN}}$     [18]

where $\delta = \delta(H(P,Q) - U)$. The RHS of [18] can be
compared with

$\frac{\partial_V Z(U,N,V)\,dV}{Z(U,N,V)} = \frac{N}{Z(U,N,V)} \int^* \delta\, \frac{dP\,dQ}{N!\,h^{dN}}$

to give

$\frac{\partial_V Z\,dV}{Z} = N \frac{p\,dV}{\frac{2}{3}\langle K \rangle^*} = \beta p\,dV$

because $\langle K \rangle^*$, which denotes the average $\int^* K / \int^* 1$,
should be essentially the same as the microcanonical
average $\langle K \rangle_{mc}$ (i.e., insensitive to the fact that one
particle is constrained to the volume dV) if N is
large. In the limit $V \to \infty$, $V/N = v$, the latter
remark together with the second of [17] yields

$k_B^{-1} \partial_v s_{mc}(u, v^{-1}) = \beta p(u,v)$,   $k_B^{-1} \partial_u s_{mc}(u, v^{-1}) = \beta$     [19]

respectively. Note that $p \ge 0$ and it is not increasing
in v because $s_{mc}(\rho)$ is concave as a function of
$v = \rho^{-1}$ (in fact, by the remark following [14]
$\rho s_{mc}(u,\rho)$ is convex in $\rho$ and, in general, if $\rho g(\rho)$ is
convex in $\rho$ then $g(v^{-1})$ is always concave in $v = \rho^{-1}$).

Hence, $ds_{mc}(u,v) = (du + p\,dv)/T$, so that taking
into account the physical meaning of p, T (as
pressure and temperature, see the section “Pressure,
temperature, and kinetic energy”), $s_{mc}$ is, in thermodynamics,
the entropy. Therefore (see the second
of [16]), $-\beta f_c(\beta,\rho) = -\beta u_c + k_B^{-1} s_{mc}(u_c,\rho)$ becomes

$f_c(\beta,\rho) = u_c - T_c s_{mc}(u_c,\rho)$,   $df_c = -p\,dv - s_{mc}\,dT$     [20]

and since $u_c$ has the interpretation (as mentioned in
the last section) of average energy in the canonical
distribution $\mu^{c}_{\beta,v}$, it follows that $f_c$ has the thermodynamic
interpretation of free energy (once compared
with the definition of free energy, F = U - TS,
in thermodynamics).

By [17] and [20],

$\lambda = \partial_{v^{-1}}\left(v_{gc}^{-1} f_c(\beta, v_{gc}^{-1})\right) \equiv u_c - T_c s_{mc} + p v_{gc}$

and $v_{gc}$ has the meaning of specific volume v. Hence,
after comparison with the definition of chemical
potential, $\lambda N = U - TS + pV$, in thermodynamics, it
follows that the thermodynamic interpretation of $\lambda$
is the chemical potential and (see [16], [17]), the
grand canonical relation

$\beta p_{gc}(\beta,\lambda) = \beta\lambda v_{gc}^{-1} + v_{gc}^{-1}\left(-\beta u_c + k_B^{-1} s_{mc}(u_c, v_{gc}^{-1})\right)$

shows that $p_{gc}(\beta,\lambda) \equiv p$, implying that $p_{gc}(\beta,\lambda)$ is
the pressure expressed, however, as a function of
temperature and chemical potential.
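
These identifications can be checked on the simplest example. The following worked computation is a sketch that assumes the free-gas entropy $s_{mc}(u,v) = k_B(\log v + \frac{3}{2}\log u + c)$ in d = 3, with c a constant. This explicit form is not taken from the article and serves only to illustrate [19], [20] and the formula for $\lambda$.

```latex
\begin{aligned}
k_B^{-1}\,\partial_u s_{mc} &= \frac{3}{2u} = \beta
  &&\Longrightarrow\quad u = \tfrac{3}{2}\,k_B T, \\
k_B^{-1}\,\partial_v s_{mc} &= \frac{1}{v} = \beta p
  &&\Longrightarrow\quad p\,v = k_B T, \\
f_c &= u - T s_{mc}
  = \tfrac{3}{2}k_B T - k_B T\left(\log v + \tfrac{3}{2}\log\!\big(\tfrac{3}{2}k_B T\big) + c\right), \\
\partial_v f_c &= -\frac{k_B T}{v} = -p, \qquad
\partial_T f_c = -s_{mc}, \\
\lambda &= f_c + p\,v = u - T s_{mc} + p\,v .
\end{aligned}
```

The first two lines reproduce equipartition and the ideal-gas law from [19], the next two confirm $df_c = -p\,dv - s_{mc}\,dT$ as in [20], and the last line is the per-particle form of the chemical potential relation.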

To go beyond the heuristic derivations above, it
should be remarked that convexity and the property
that the maxima in [16], [17] are reached in the
interior of the intervals of variability of v or u are
sufficient to turn the above arguments into rigorous
mathematical deductions: this means that given [19]
as definitions of $p(u,v)$, $\beta(u,v)$, the second of [20]
follows as well as $p_{gc}(\beta,\lambda) = p(u_c, v_{gc})$. But the
values $v_{gc}$ and $u_c$ in [16] are not necessarily unique:
convex functions can contain horizontal segments
and therefore the general conclusion is that the
maxima may possibly be attained in intervals.
Hence, instead of a single $v_{gc}$, there might be a
whole interval $[v_-, v_+]$, where the rhs of [16] reaches
the maximum and, instead of a single $u_c$, there
might be a whole interval $[u_-, u_+]$ where the rhs of
[17] reaches the maximum.

Convexity implies that the values of $\lambda$ or $\beta$
for which the maxima in [16] or [17] are attained
in intervals rather than in single points are rare
(i.e., at most denumerably many): the interpretation
is, in such cases, that the thermodynamic functions
show discontinuities, and the corresponding
phenomena are called phase transitions (see the
next section).


For more details the reader is referred to Ruelle 
(1969) and Gallavotti (1999). 


Phase Transitions and Boundary 
Conditions 


The analysis in the last two sections of the relations 
between elements of ensembles of distributions 
describing macroscopic equilibrium states not only 
allows us to obtain mechanical models of thermo- 
dynamics but also shows that the models, for a given 
system, coincide at least as $\Omega \to \infty$. Furthermore, the
equivalence between the thermodynamic functions 
computed via corresponding distributions in differ- 
ent ensembles can be extended to a full equivalence 
of the distributions. 

If the maxima in [16] are attained at single points
$v_{gc}$ or $u_c$ the equivalence should take place in the
sense that a correspondence between $\mu^{gc}_{\beta,\lambda}$, $\mu^{c}_{\beta,v}$, $\mu^{mc}_{u,v}$
can be established so that any local observable
F(P, Q), defined as an observable depending
on (P, Q) only through the $p_i, q_i$ with $q_i \in \Lambda$, where
$\Lambda \subset \Omega$ is a finite region, has the same average with
respect to corresponding distributions in the limit
$\Omega \to \infty$.

The correspondence is established by considering
$(\lambda, \beta) \leftrightarrow (\beta, v_{gc}) \leftrightarrow (u_{mc}, v)$, where $v_{gc}$ is where the
maximum in [16] is attained, $u_{mc} = u_c$ is where the
maximum in [17] is attained and $v_{gc} = v$ (cf. also
[19], [20]). This means that the limits

$\lim_{\Omega\to\infty} \int F(P,Q)\, \mu^{a}(dP\,dQ) \stackrel{def}{=} \langle F \rangle_a$
(a-independent), a = gc, c, mc     [21]

coincide if the averages are evaluated by the
distributions $\mu^{gc}_{\beta,\lambda}$, $\mu^{c}_{\beta,v}$, $\mu^{mc}_{u,v}$.

Exceptions to [21] are possible, and are certainly
likely to occur at values of u, v where the maxima in
[16] or [17] are attained in intervals rather than in
isolated points; but this does not exhaust, in general,
the cases in which [21] may not hold.

However, no case in which [21] fails has to be 
regarded as an exception. It rather signals that an 
interesting and important phenomenon occurs. To 
understand it properly, it is necessary to realize that 
the grand canonical, canonical, and microcanonical 
families of probability distributions are by far not 
the only ensembles of probability distributions 
whose elements can be considered to generate 
models of thermodynamics, that is, which are 
orthodic in the sense of the discussion in the section
“Equivalence of ensembles.” More general families
of orthodic statistical ensembles of probability
distributions can be very easily conceived. In
particular:


Definition Consider the grand canonical, canonical,
and microcanonical distributions associated
with an energy function in which the potential
energy contains, besides the interaction $\Phi$ between
particles located inside the container, also the
interaction energy $\Phi_{in,out}$ between particles inside
the container and external particles, identical to the
ones in the container but not allowed to move and
fixed in positions such that in every unit cube $\Delta$
external to $\Omega$ there is a finite number of them
bounded independently of $\Delta$. Such configurations of
external particles will be called “boundary conditions
of fixed external particles.”


The thermodynamic limit with such boundary
conditions is obtained by considering the grand
canonical, canonical, and microcanonical distributions
constructed with potential energy function
$\Phi + \Phi_{in,out}$ in containers $\Omega$ of increasing size taking
care that, while the size increases, the fixed particles
that would become internal to $\Omega$ are eliminated. The
argument used in the section “Thermodynamic limit”
to show that the three models of thermodynamics,
considered there, did define the same thermodynamic
functions can be repeated to reach the conclusion that
also the (infinitely many) “new” models of thermodynamics
in fact give rise to the same thermodynamic
functions and averages of local observables. Furthermore,
the values of the limits corresponding to [13]
can be computed using the new partition functions
and coincide with the ones in [13] (i.e., they are
independent of the boundary conditions).

However, it may happen, and in general it is 
the case, for many models and for particular values 
of the state parameters, that the limits in [21] do 
not coincide with the analogous limits computed 
in the new ensembles, that is, the averages of 
some local observables are unstable with respect 
to changes of boundary conditions with fixed 
particles. 

There is a very natural interpretation of such 
apparent ambiguity of the various models of 
thermodynamics: namely, at the values of the 
parameters that are selected to describe the macro- 
scopic states under consideration, there may corre- 
spond different equilibrium states with the same 
parameters. When the maximum in [16] is reached 
on an interval of densities, one should not think of 
any failure of the microscopic models for thermo- 
dynamics: rather one has to think that there are 
several states possible with the same $\beta, \lambda$ and that
they can be identified with the probability distribu- 
tions obtained by forming the grand canonical, 


canonical, or microcanonical distributions with 
different kinds of boundary conditions. 

For instance, a boundary condition with high
density may produce an equilibrium state with
parameters $\beta, \lambda$ which also has high density, i.e., the
density $v_-^{-1}$ at the right extreme of the interval in
which the maximum in [16] is attained, while using a
low-density boundary condition the limit in [21] may
describe the averages taken in a state with density $v_+^{-1}$
at the left extreme of the interval or, perhaps, with a
density intermediate between the two extremes.
Therefore, the following definition emerges.


Definition If the grand canonical distributions
with parameters $(\beta, \lambda)$ and different choices of
fixed external particles boundary conditions generate
for some local observable F average values
which are different by more than a quantity $\delta > 0$
for all large enough volumes $\Omega$ then one says that
the system has a phase transition at $(\beta, \lambda)$. This
implies that the limits in [21], when existing, will
depend on the boundary condition and their values
will represent averages of the observables in
“different phases.” A corresponding definition is
given in the case of the canonical and microcanonical
distributions when, given $(\beta, v)$ or $(u, v)$, the
limit in [21] depends on the boundary conditions
for some F.


Remarks 


1. The idea is that by fixing one of the thermodynamic 
ensembles and by varying the boundary conditions 
one can realize all possible states of equilibrium of 
the system that can exist with the given values of 
the parameters determining the state in the chosen 
ensemble (i.e., $(\beta, \lambda)$, $(\beta, v)$, or $(u, v)$ in the grand
canonical, canonical, or microcanonical cases, 
respectively). 

2. The impression that in order to define a phase 
transition the thermodynamic limit is necessary 
is incorrect: the definition does not require 
considering the limit $\Omega \to \infty$. The phenomenon
that occurs is that by changing boundary condi- 
tions the average of a local observable can 
change at least by amounts independent of the 
system size. Hence, occurrence of a phase 
transition is perfectly observable in finite volume: 
it suffices to check that by changing boundary 
conditions the average of some observable 
changes by an amount whose minimal size is 
volume independent. It is a manifestation of an 
instability of the averages with respect to changes 
in boundary conditions: an instability which does 
not fade away when the boundary recedes to 
infinity, i.e., boundary perturbations produce
bulk effects and at a phase transition the averages
of the local observable, if existing at all, will
exhibit a nontrivial dependence on the boundary
conditions. This is also called “long range order.”

3. It is possible to show that when this happens then 
some thermodynamic function whose value is 
independent of the boundary condition (e.g., the 
free energy in the canonical distributions) has 
discontinuous derivatives in terms of the para- 
meters of the ensemble. This is in fact one of the 
frequently-used alternative definitions of phase 
transitions: the latter two natural definitions of 
first-order phase transition are equivalent. How- 
ever, it is very difficult to prove that a given system 
shows a phase transition. For instance, existence of 
a liquid-gas phase transition is still an open 
problem in systems of the type considered until 
the section “Lattice models” below. 

4. A remarkable unification of the theory of the 
equilibrium ensembles emerges: all distributions of 
any ensemble describe equilibrium states. If a 
boundary condition is fixed once and for all, then 
some equilibrium states might fail to be described 
by an element of an ensemble. However, if all 
boundary conditions are allowed then all equili- 
brium states should be realizable in a given 
ensemble by varying the boundary conditions. 

5. The analysis leads us to consider as completely 
equivalent without exceptions grand canonical, 
canonical, or microcanonical ensembles enlarged 
by adding to them the distributions with poten- 
tial energy augmented by the interaction with 
fixed external particles. 

6. The above picture is really proved only for 
special classes of models (typically in models 
in which particles are constrained to occupy 
points of a lattice and in systems with hard core 
interactions, $r_0 > 0$ in [14]) but it is believed to
be correct in general. At least it is consistent 
with all that is known so far in classical 
statistical mechanics. The difficulty is that, 
conceivably, one might even need boundary 
conditions more complicated than the fixed 
particles boundary conditions (e.g., putting 
different particles outside, interacting with 
the system with an arbitrary potential, rather 
than via $\varphi$).
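
The finite-volume criterion of remark (2) can be tried out on a toy model. The sketch below is an illustration only and uses a model not treated in this section: a one-dimensional nearest-neighbour Ising chain with both end spins fixed, with $K$ standing for $\beta$ times an assumed coupling constant. It computes the average of the central spin exactly by iterating the nearest-neighbour Boltzmann weights from the boundary to the center.

```python
import math

def center_spin_average(coupling_k, half_length, boundary=+1):
    """Average central spin of a chain with 2*half_length + 1 sites.

    Both end spins are fixed to `boundary`; each bond carries weight exp(K*s*t).
    """
    # w[s]: normalized weight of one half-chain whose innermost spin is s
    w = {+1: 1.0 if boundary == +1 else 0.0,
         -1: 1.0 if boundary == -1 else 0.0}
    for _ in range(half_length):
        new = {s: sum(w[t] * math.exp(coupling_k * s * t) for t in (+1, -1))
               for s in (+1, -1)}
        norm = new[+1] + new[-1]
        w = {s: new[s] / norm for s in (+1, -1)}
    # The two half-chains are mirror images, so the central weight is w[s]^2.
    p_plus, p_minus = w[+1] ** 2, w[-1] ** 2
    return (p_plus - p_minus) / (p_plus + p_minus)

for half in (1, 5, 10, 20, 40):
    plus = center_spin_average(0.5, half, +1)
    minus = center_spin_average(0.5, half, -1)
    print(half, plus - minus)
```

In this toy chain the difference between the two boundary conditions decays like $\tanh(K)^{L}$ with the half-length L, so no volume-independent lower bound exists and, by the definition above, there is no phase transition at any finite K. A system with a phase transition would instead show a difference that stays above a fixed threshold as the boundary recedes.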


The discussion of the equivalence of the ensembles 
and the question of the importance of boundary 
conditions has already imposed the consideration 
of several limits as $\Omega \to \infty$. Occasionally, it will
again come up. For conciseness, it is useful to set up 
a formal definition of equilibrium states of an 
infinite-volume system: although infinite volume is 


an idealization void of physical reality, it is never- 
theless useful to define such states because certain 
notions (e.g., that of pure state) can be sharply 
defined, with few words and avoiding wide circum- 
volutions, in terms of them. Therefore, let: 


Definition An infinite-volume state with parameters
$(\beta, v)$, $(u, v)$ or $(\beta, \lambda)$ is a collection of average values
$F \to \langle F \rangle$ obtained, respectively, as limits of finite-volume
averages $\langle F \rangle_{\Omega_n}$ defined from canonical, microcanonical,
or grand canonical distributions in $\Omega_n$, with
fixed parameters $(\beta, v)$, $(u, v)$ or $(\beta, \lambda)$ and with general
boundary condition of fixed external particles, on
sequences $\Omega_n \to \infty$ for which such limits exist simultaneously
for all local observables F.


Having set the definition of infinite-volume
state, consider a local observable G(X) and let
$\tau_\xi G(X) = G(X + \xi)$, $\xi \in R^d$, with $X + \xi$ denoting the
configuration X in which all particles are translated
by $\xi$: then an infinite-volume state is called
a pure state if for any pair of local observables
F, G it is

$\langle F \tau_\xi G \rangle - \langle F \rangle \langle \tau_\xi G \rangle \to 0$ as $\xi \to \infty$     [22]

which is called a cluster property of the pair F, G.

The result alluded to in remark (6) is that at least in
the case of hard-core systems (or of the simple lattice
systems discussed in the section “Lattice models”) the
infinite-volume equilibrium states in the above sense
exhaust at least the totality of the infinite-volume
pure states. Furthermore, the other states that can be
obtained in the same way are convex combinations of
the pure states, i.e., they are “statistical mixtures” of
pure phases. Note that $\langle \tau_\xi G \rangle$ cannot be replaced, in
general, by $\langle G \rangle$ because not all infinite-volume states
are necessarily translation invariant and in simple
cases (e.g., crystals) it is even possible that no
translation-invariant state is a pure state.


Remarks 


1. This means that, in the latter models, general- 
izing the boundary conditions, for example 
considering external particles to be not identical 
to the ones inside the system, using periodic or 
partially periodic boundary conditions, or the 
widely used alternative of introducing a small 
auxiliary potential and first taking the infinite- 
volume states in presence of it and then letting 
the potential vanish, does not enlarge further the 
set of states (but may sometimes be useful: an 
example of a study of a phase transition by using 
the latter method of small fields will be given in 
the section “Continuous symmetries: ‘no d = 2
crystal’ theorem”).

2. If $\chi$ is the indicator function of a local event, it
will make sense to consider the probability of
occurrence of the event in an infinite-volume state
defining it as $\langle \chi \rangle$. In particular, the probability
density for finding p particles at $x_1, x_2, \ldots, x_p$,
called the p-point correlation function, will thus be
defined in an infinite-volume state. For instance,
if the state is obtained as a limit of canonical
states $\langle \cdot \rangle_{\Omega_n}$ with parameters $\beta, \rho$, $\rho = N_n/V_n$, in a
sequence of containers $\Omega_n$, then

$\rho(x_1, \ldots, x_p) = \lim_{n} \left\langle \sum_{j_1,\ldots,j_p} \prod_{i=1}^{p} \delta(x_i - q_{j_i}) \right\rangle_{\Omega_n}$

where the sum is over the ordered p-ples
$(j_1, \ldots, j_p)$. Thus, the pair correlation $\rho(q,q')$
and its possible cluster property are

$\rho(q,q') \stackrel{def}{=} \lim_{n} \frac{\int_{\Omega_n} \exp(-\beta U(q,q',q_1,\ldots,q_{N_n-2}))\, dq_1 \cdots dq_{N_n-2}}{(N_n-2)!\, Z^{c}_0(\beta,\rho,V_n)}$

$\rho(q, q'+\xi) - \rho(q)\rho(q'+\xi) \to 0$ as $\xi \to \infty$     [23]

where

$Z^{c}_0 \stackrel{def}{=} \int e^{-\beta U(Q)}\, dQ$

is the “configurational” partition function.


The reader is referred to Ruelle (1969), Dobrushin 
(1968), Lanford and Ruelle (1969), and Gallavotti 
(1999). 


Virial Theorem and Atomic Dimensions 


For a long time it has been doubted that “just
changing boundary conditions” could produce such
dramatic changes as macroscopically different states
(i.e., phase transitions in the sense of the definition in
the last section). The first evidence that by taking the
thermodynamic limit very regular analytic functions
like $N^{-1} \log Z^{c}(\beta,N,V)$ (as a function of $\beta$, $v = V/N$)
could develop, in the limit $\Omega \to \infty$, singularities like
discontinuous derivatives (corresponding to the maximum
in [16] being reached on a plateau and to a
consequent existence of several pure phases) arose in
van der Waals' theory of liquid-gas transition.

Consider a real gas with N identical particles with
mass m in a container $\Omega$ with volume V. Let the
force acting on the ith particle be $f_i$; multiplying
both sides of the equations of motion, $m\ddot{q}_i = f_i$, by
$-(1/2) q_i$ and summing over i, it follows that

$-\frac{1}{2}\sum_{i=1}^{N} m\, q_i \cdot \ddot{q}_i = -\frac{1}{2}\sum_{i=1}^{N} q_i \cdot f_i \stackrel{def}{=} \frac{1}{2} C(q)$

and the quantity C(q) defines the virial of the forces
in the configuration q. Note that C(q) is not
translation invariant because of the presence of the
forces due to the walls.

Writing the force f; as a sum of the internal and 
the external forces (due to the walls) the virial C can 
be expressed naturally as sum of the virial $C_{int}$ of the
internal forces (translation invariant) and of the
virial $C_{ext}$ of the external forces.

By dividing both sides of the definition of the
virial by $\tau$ and integrating over the time interval
$[0, \tau]$, one finds in the limit $\tau \to +\infty$, that is, up to
quantities relatively infinitesimal as $\tau \to \infty$, that

$\langle K \rangle = \frac{1}{2}\langle C \rangle$ and $\langle C_{ext} \rangle = 3pV$

where p is the pressure and V the volume. Hence

$\langle K \rangle = \frac{3}{2} pV + \frac{1}{2}\langle C_{int} \rangle$

or

$\frac{1}{\beta} = pv + \frac{\langle C_{int} \rangle}{3N}$     [24]

Equation [24] is Clausius' virial theorem: in the case
of no internal forces, it yields $\beta pv = 1$, the ideal-gas
equation.


The internal virial $C_{int}$ can be written, if $f_{j \to i} =
-\partial_{q_i}\varphi(q_i - q_j)$, as

$C_{int} = -\sum_{i=1}^{N} \sum_{j \ne i} f_{j \to i} \cdot q_i \equiv \sum_{i<j} \partial_{q_i}\varphi(q_i - q_j) \cdot (q_i - q_j)$

which shows that the contribution to the virial by
the internal repulsive forces is negative while that of
the attractive forces is positive. The average of $C_{int}$
can be computed by the canonical distribution,
which is convenient for the purpose. van der Waals
first used the virial theorem to perform an actual
computation of the corrections to the perfect-gas
laws. Simply neglect the third-order term in the
density and use the approximation $\rho(q_1,q_2) =
\rho^2 e^{-\beta\varphi(q_1 - q_2)}$ for the pair correlation function, [23],
then

$\frac{1}{2}\langle C_{int} \rangle = V \frac{3}{2\beta} \rho^2 I(\beta) + V\,O(\rho^3)$

where

$I(\beta) = \frac{1}{2}\int \left(e^{-\beta\varphi(q)} - 1\right) d^3q$

and the equation of state [24] becomes

$pv + \frac{I(\beta)}{\beta v} + O(v^{-2}) = \frac{1}{\beta}$     [25]
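
The integral $I(\beta)$ can also be evaluated numerically. The following sketch assumes, for concreteness, the standard 12-6 Lennard-Jones potential in reduced units ($\varepsilon = r_0 = 1$) and a simple midpoint rule. The cutoffs `r_min` and `r_max` and all function names are illustrative choices, not part of the article.

```python
import math

def lennard_jones(r, eps=1.0, r0=1.0):
    """Standard 12-6 Lennard-Jones potential (assumed form, reduced units)."""
    return 4.0 * eps * ((r0 / r) ** 12 - (r0 / r) ** 6)

def virial_integral(beta, r_min=0.2, r_max=50.0, steps=200000):
    """I(beta) = (1/2) * integral of (exp(-beta*phi(q)) - 1) over R^3.

    For r < r_min the Boltzmann factor is taken to be 0, which contributes
    minus the volume of a ball of radius r_min; the rest uses the midpoint rule.
    """
    core = -(4.0 * math.pi / 3.0) * r_min ** 3
    h = (r_max - r_min) / steps
    total = 0.0
    for k in range(steps):
        r = r_min + (k + 0.5) * h
        total += (math.exp(-beta * lennard_jones(r)) - 1.0) * r * r
    return 0.5 * (core + 4.0 * math.pi * h * total)

def compressibility_factor(beta, v):
    """beta*p*v to first order in 1/v, i.e. 1 - I(beta)/v from the equation above."""
    return 1.0 - virial_integral(beta) / v

for beta in (0.1, 0.3, 1.0):
    print(beta, virial_integral(beta))
```

At small $\beta$ the repulsive core dominates and $I$ is negative, so the pressure exceeds the ideal-gas value. At larger $\beta$ the attractive tail takes over and $I$ changes sign.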


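For the purpose of illustration, the calculation of I
can be performed approximately at “high temperature”
($\beta$ small) in the case

$\varphi(r) = 4\varepsilon\left(\left(\frac{r_0}{r}\right)^{12} - \left(\frac{r_0}{r}\right)^{6}\right)$

(the classical Lennard-Jones potential), $\varepsilon, r_0 > 0$.
The result is $I \cong -(b - \beta a)$, with $b = 2\pi r_0^3/3$ and
$a = 8\varepsilon b/3$ (consistently with the relations between
a, b, $\varepsilon$, $r_0$ quoted below), so that

$\left(p + \frac{a}{v^2}\right) v = \left(1 + \frac{b}{v}\right)\frac{1}{\beta} = \frac{1}{1 - b/v}\,\frac{1}{\beta} + O\left(\frac{1}{\beta v^2}\right)$

or

$\left(p + \frac{a}{v^2}\right)(v - b)\,\beta = 1 + O(v^{-2})$     [26]

which gives the equation of state for $\beta\varepsilon \ll 1$. Equation
[26] can be compared with the well-known empirical
van der Waals equation of state: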


$$\beta\left(p + \frac{a}{v^2}\right)(v - b) = 1 \quad\text{or}\quad \left(p + \frac{An^2}{V^2}\right)(V - nB) = nRT \qquad [27]$$

where, if $N_A$ is Avogadro's number, $A = aN_A^2$,
$B = bN_A$, $R = k_B N_A$, $n = N/N_A$. It shows the possibility
of accessing the microscopic parameters $\varepsilon$ and
$r_0$ of the potential $\varphi$ via measurements detecting
deviations from the Boyle-Mariotte law, $\beta pv = 1$,
of the rarefied gases: $\varepsilon = 3a/8b = 3A/8BN_A$ and
$r_0 = (3b/2\pi)^{1/3} = (3B/2\pi N_A)^{1/3}$.

As a final comment, it is worth stressing that the 
virial theorem gives in principle the exact correc- 
tions to the equation of state, in a rather direct and 
simple form, as time averages of the virial of the 
internal forces. Since the virial of the internal forces 
is easy to calculate from the positions of the 
particles as a function of time, the theorem provides 
a method for computing the equation of state in
numerical simulations. In fact, this idea has been
exploited in many numerical experiments, in which 
[24] plays a key role. 
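
As a rough illustration of how [24] can be used in a simulation, the following Python sketch evaluates the internal virial $C_{int} = \sum_{i<j}\partial\varphi(q_i - q_j)\cdot(q_i - q_j)$ of a single configuration and the corresponding instantaneous pressure estimate $p = (N/\beta - C_{int}/3)/V$. The Lennard-Jones form of $\varphi$, the function names, and the absence of periodic images are assumptions made for brevity; an actual computation would average the estimate over many configurations along a trajectory.

```python
import math
from itertools import combinations

def lj_r_dphi(r, eps=1.0, r0=1.0):
    """Return r * phi'(r) for phi(r) = 4*eps*((r0/r)**12 - (r0/r)**6)."""
    s6 = (r0 / r) ** 6
    return -24.0 * eps * (2.0 * s6 * s6 - s6)

def internal_virial(positions, r_dphi=lj_r_dphi):
    """C_int = sum over pairs i<j of grad phi(q_i - q_j) . (q_i - q_j).

    For a radial potential each pair contributes r * phi'(r).
    """
    total = 0.0
    for qi, qj in combinations(positions, 2):
        total += r_dphi(math.dist(qi, qj))
    return total

def virial_pressure(positions, volume, beta, r_dphi=lj_r_dphi):
    """Instantaneous pressure from [24]: p*V = N/beta - C_int/3."""
    n = len(positions)
    return (n / beta - internal_virial(positions, r_dphi) / 3.0) / volume
```

A pair closer than the potential minimum contributes a negative virial (repulsion raises the pressure above the ideal-gas value), while a pair beyond the minimum contributes a positive one, in agreement with the sign rule stated for $C_{int}$.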

For more details, the reader is referred to Gallavotti 
(1999). 


van der Waals Theory 


Equation [27] is empirically used beyond its validity
region (small density and small $\beta$) by regarding $A$, $B$ as
phenomenological parameters to be experimentally
determined by measuring them near generic values of
$p$, $V$, $T$. The measured values of $A$, $B$ do not "usually
vary too much" as functions of $v$, $T$ and, apart from
this small variability, the predictions of [27] have
reasonably agreed with experience until, as experimental
precision increased over the years, serious
inadequacies eventually emerged.

Certain consequences of [27] are appealing: for
example, Figure 1 shows that it does not give a $p$
monotonic nonincreasing in $v$ if the temperature is
small enough. A critical temperature can be defined
as the largest value, $T_c$, of the temperature below
which the graph of $p$ as a function of $v$ is not
monotonic decreasing; the critical volume $v_c$ is the
value of $v$ at the horizontal inflection point
occurring for $T = T_c$.

For $T < T_c$ the van der Waals interpretation of the
equation of state is that the function $p(v)$ may
describe metastable states while the actual equilibrium
states would follow an equation with a monotonic
dependence on $v$ and $p(v)$ becoming horizontal in the
coexistence region of specific volumes. The precise
value of $p$ at which to draw the plateau (see Figure 1)
would then be fixed by experiment or theoretically
predicted via the simple rule that the plateau
associated with the represented isotherm is drawn at
a height such that the areas of the two cycles in the
resulting loop are equal.

This is Maxwell's rule: obtained by assuming
that the isotherm curve joining the extreme points of
the plateau and the plateau itself define a cycle
(see Figure 1) representing a sequence of possible
macroscopic equilibrium states (the ones corresponding
to the plateau) or states with extremely long time
of stability ("metastable") represented by the curved
part. This would be an isothermal Carnot cycle which,
therefore, could not produce work: since the work
produced in the cycle (i.e., $\oint p\,dv$) is the signed area
enclosed by the cycle, the rule just means that the area is
zero. The argument is doubtful at least because it is not
clear that the intermediate states with $p$ increasing
with $v$ could be realized experimentally or could even
be theoretically possible.

Figure 1 The van der Waals equation of state at a temperature
$T < T_c$ where the pressure is not monotonic. The horizontal line
illustrates the "Maxwell rule." (Horizontal axis: $v$, with the
plateau endpoints $v_l$ and $v_g$ marked.)
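
The equal-area rule can be made concrete numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it works in reduced variables $p/p_c$, $v/v_c$, $t = T/T_c$, in which [27] takes the form $p = 8t/(3v - 1) - 3/v^2$, and it locates the plateau by bisection on the signed area $\int_{v_l}^{v_g}(p(v) - p^*)\,dv$. The function names are hypothetical and the routine is only meant for $t$ moderately below 1 (plateau pressure above $10^{-4}$ in reduced units).

```python
import math

def p_red(v, t):
    """Reduced van der Waals pressure p/p_c at reduced volume v and temperature t."""
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / v**2

def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def maxwell_plateau(t):
    """Return (p, v_l, v_g) of the equal-area plateau for reduced temperature t < 1."""
    # Points with dp/dv = 0 bound the region where p increases with v.
    dpdv = lambda v: -24.0 * t / (3.0 * v - 1.0) ** 2 + 6.0 / v**3
    v_min = bisect(dpdv, 1.0 / 3.0 + 1e-9, 1.0)   # local minimum of p
    v_max = bisect(dpdv, 1.0, 50.0)               # local maximum of p

    def branches(p):
        v_l = bisect(lambda v: p_red(v, t) - p, 1.0 / 3.0 + 1e-9, v_min)
        v_g = bisect(lambda v: p_red(v, t) - p, v_max, 1e6)
        return v_l, v_g

    def area_mismatch(p):
        # Signed area between the isotherm and the horizontal line at height p.
        v_l, v_g = branches(p)
        integral = ((8.0 * t / 3.0) * math.log((3 * v_g - 1) / (3 * v_l - 1))
                    + 3.0 / v_g - 3.0 / v_l)
        return integral - p * (v_g - v_l)

    p_lo = max(p_red(v_min, t), 1e-4) + 1e-9
    p_hi = p_red(v_max, t) - 1e-9
    p = bisect(area_mismatch, p_lo, p_hi)
    return (p, *branches(p))
```

Calling `maxwell_plateau(0.9)` returns a plateau pressure lying between the local minimum and the local maximum of the isotherm, together with the two specific volumes at which the plateau meets the curve.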

A striking prediction of [27], taken literally, is
that the gas undergoes a gas-liquid phase transition
with a critical point at a temperature $T_c$, volume $v_c$,
and pressure $p_c$ that can be computed via [27] and
are given by $RT_c = 8A/27B$, $V_c = 3B$ ($n = 1$).
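
The critical parameters can be checked directly against [27]. The following Python sketch (function names and the numerical values of $A$, $B$ used in the usage note are hypothetical) evaluates the van der Waals pressure and the critical point for one mole; the expression $p_c = A/27B^2$ is the value obtained by inserting $T_c$ and $V_c$ into [27].

```python
R_GAS = 8.314462618  # molar gas constant, J / (mol K)

def vdw_pressure(V, T, A, B, n=1.0, R=R_GAS):
    """Pressure from [27]: (p + A n^2 / V^2)(V - n B) = n R T."""
    return n * R * T / (V - n * B) - A * n**2 / V**2

def vdw_critical_point(A, B, R=R_GAS):
    """Return (T_c, V_c, p_c) for n = 1: R T_c = 8A/27B, V_c = 3B, p_c = A/27B^2."""
    return 8.0 * A / (27.0 * B * R), 3.0 * B, A / (27.0 * B**2)
```

For instance, with the illustrative values `A = 0.1` and `B = 3.0e-5` (SI units), `vdw_pressure(V_c, T_c, A, B)` reproduces `p_c`, and a centered finite difference of the isotherm at `V_c` gives a slope that vanishes to numerical accuracy, as expected at a horizontal inflection point.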

At the same time, the above prediction is interesting
as it shows that there are simple relations between the
critical parameters and the microscopic interaction
constants, i.e., $\varepsilon \propto k_B T_c$ and $r_0 \propto (V_c/N_A)^{1/3}$,
or more precisely $\varepsilon = 81k_BT_c/64$, $r_0 = (V_c/2\pi N_A)^{1/3}$
if a classical Lennard-Jones potential (i.e.,
$\varphi = 4\varepsilon((r_0/|q|)^{12} - (r_0/|q|)^{6})$; see the last section) is used
for the interaction potential $\varphi$.

However, [27] cannot be accepted uncritically, not
only because of the approximations (essentially the
neglecting of $O(v^{-1})$ in the equation of state), but
mainly because, as remarked above, for $T < T_c$ the
function $p$ is no longer monotonic in $v$ as it must be;
see comment following [19].

The van der Waals equation, refined and complemented
by Maxwell's rule, predicts the following
behavior:

$$(p - p_c) \propto (v - v_c)^{\delta}, \quad \delta = 3, \quad T = T_c$$
$$(v_g - v_l) \propto (T_c - T)^{\beta}, \quad \beta = 1/2, \quad \text{for } T \to T_c^{-} \qquad [28]$$

which are in sharp contrast with the experimental
data gathered in the twentieth century. For the
simplest substances, one finds instead $\delta \cong 5$, $\beta \cong 1/3$.

Finally, blind faith in the equation of state [27] is
untenable, last but not least, also because nothing in
the analysis would change if the space dimension was
$d = 2$ or $d = 1$: but for $d = 1$, it is easily proved that the
system, if the interaction decays rapidly at infinity,
does not undergo phase transitions (see next section).

In fact, it is now understood that van der Waals'
equation represents rigorously only a limiting situation,
in which particles have a hard-core interaction
(or a strongly repulsive one at close distance) and a
further smooth interaction $\varphi$ with very long range.
More precisely, suppose that the part of the potential
outside a hard-core radius $r_0 > 0$ is attractive
(i.e., nonpositive) and has the form $\gamma^d\varphi_1(\gamma|q|) \le 0$,
and call $P_0(v)$ the ($\beta$-independent) product of $\beta$ times
the pressure of the hard-core system without any
attractive tail ($P_0(v)$ is not explicitly known except
if $d = 1$, in which case it is $P_0(v)(v - b) = 1$, $b = r_0$),
and let

$$\bar\varphi_1 = \int_{|q| > r_0}|\varphi_1(q)|\,dq$$

If $p(\beta, v; \gamma)$ is the pressure when $\gamma > 0$ then it can be
proved that

$$\beta p(\beta, v) \overset{\mathrm{def}}{=} \lim_{\gamma\to 0}\beta p(\beta, v; \gamma) = \left[-\frac{\beta\bar\varphi_1}{2v^2} + P_0(v)\right]_{\text{Maxwell's rule}} \qquad [29]$$

where the subscript means that the graph of $p(\beta, v)$
as a function of $v$ is obtained from the function in
square brackets by applying to it Maxwell's rule,
described above in the case of the van der Waals
equation. Equation [29] reduces exactly to the
van der Waals equation for $d = 1$, and for $d > 1$
it leads to an equation with identical critical
behavior (even though $P_0(v)$ cannot be explicitly
computed).

The reader is referred to Lebowitz and Penrose 
(1979) and Gallavotti (1999) for more details. 


Absence of Phase Transitions: d = 1


One of the most quoted no-go theorems in statistical
mechanics is that one-dimensional systems of particles
interacting via short-range forces do not exhibit
phase transitions (cf. the next section) unless the
somewhat unphysical situation of having zero
absolute temperature is considered. This is particularly
easy to check in the case of "nearest-neighbor
hard-core interactions." Let the hard-core size be $r_0$,
so that the interaction potential $\varphi(r) = +\infty$ if $r \le r_0$,
and suppose also that $\varphi(r) \equiv 0$ if $r \ge 2r_0$. In this
case, the thermodynamic functions can be exactly
computed and checked to be analytic: hence the
equation of state cannot have any phase transition
plateau. This is a special case of van Hove's theorem
establishing smoothness of the equation of state for
interactions extending beyond the nearest neighbor
and rapidly decreasing at infinity.

If the definition of phase transition based on the
sensitivity of the thermodynamic limit to variations
of boundary conditions is adopted then a more
general, conceptually simple, argument can be given
to show that in one-dimensional systems there
cannot be any phase transition if the potential
energy of mutual interaction between a
configuration $Q$ of particles to the left of a reference
particle (located at the origin $O$, say) and a
configuration $Q'$ to the right of the particle (with
$Q \cup O \cup Q'$ compatible with the hard cores) is
uniformly bounded below. Then a mathematical
proof can be devised showing that the influence of
boundary conditions disappears as the boundaries
recede to infinity. One also says that no long-range
order can be established in a one-dimensional case,
in the sense that one loses any trace of the boundary
conditions imposed.

The analysis fails if the space dimension is $\ge 2$: in
this case, even if the interaction is short-ranged, the
energy of interaction between two regions of space
separated by a boundary is of the order of the
boundary area. Hence, one cannot bound above and
below the probability of any two configurations in
two half-spaces by the product of the probabilities
of the two configurations, each computed as if the
other was not there. This is because such a bound
would be proportional to the exponential of the
surface of separation, which tends to $\infty$ when the
surface grows large. This means that we cannot
consider, at least not in general, the configurations
in the two half-spaces as independently distributed.

Analytically, a condition on the potential sufficient
to imply that the energy between a configuration
to the left and one to the right of the origin is
bounded below, if $d = 1$, is simply expressed by

$$\int_{r'}^{\infty} r\,|\varphi(r)|\,dr < +\infty \quad\text{for } r' > r_0$$

Therefore, in order to have phase transitions in
$d = 1$, a potential is needed that is "so long range"
that it has a divergent first moment. It can be
shown by counterexamples that if the latter condition
fails there can be phase transitions even in
$d = 1$ systems.

The results just quoted also apply to discrete 
models like lattice gases or lattice spin models that 
will be considered later in the article. 
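
For lattice spin models the absence of a transition in $d = 1$ can be seen explicitly in the simplest case. The sketch below is an illustration chosen here, not taken from the text: it assumes a nearest-neighbor Ising chain with energy $-J\sum_x\sigma_x\sigma_{x+1} - h\sum_x\sigma_x$ and uses the standard transfer-matrix result that $\lim V^{-1}\log Z$ equals the logarithm of the largest eigenvalue of a $2\times 2$ matrix. The function names are hypothetical.

```python
import math

def log_lambda_max(beta, J, h):
    """Logarithm of the largest transfer-matrix eigenvalue of the Ising chain.

    It equals lim (1/V) log Z and is analytic in (beta, h) for all finite beta.
    """
    c, s = math.cosh(beta * h), math.sinh(beta * h)
    root = math.sqrt(math.exp(2 * beta * J) * s * s + math.exp(-2 * beta * J))
    return math.log(math.exp(beta * J) * c + root)

def magnetization(beta, J, h):
    """Derivative of log_lambda_max with respect to beta*h (mean spin per site)."""
    s = math.sinh(beta * h)
    return s / math.sqrt(s * s + math.exp(-4 * beta * J))
```

Since `magnetization(beta, J, 0.0)` vanishes for every finite `beta`, there is no spontaneous magnetization at positive temperature; the square root only loses its regularity in the limit $\beta \to \infty$, in line with the remark about zero absolute temperature.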

For more details, we refer the reader to Landau 
and Lifschitz (1967), Dyson (1969), Gallavotti 
(1999), and Gallavotti et al. (2004). 


Continuous Symmetries: "No d = 2 Crystal" Theorem


A second case in which it is possible to rule out 
existence of phase transitions or at least of certain 
kinds of transitions arises when the system under 
analysis enjoys large symmetry. By symmetry is 
meant a group of transformations acting on the 
configurations and transforming each of them into a 


configuration which, at least for one boundary 
condition (e.g., periodic or open), has the same 
energy. 

A symmetry is said to be "continuous" if the
group of transformations is a continuous group. For
instance, continuous systems have translational
symmetry if considered in a container $\Omega$ with
periodic boundary conditions. Systems with "too
much symmetry" sometimes cannot show phase
transitions. For instance, the continuous translation
symmetry of a gas in a container $\Omega$ with periodic
boundary conditions is sufficient to exclude the
possibility of crystallization in dimension $d = 2$.

To discuss this, which is a prototype of a proof
which can be used to infer absence of many
transitions in systems with continuous symmetries,
consider the translational symmetry and a potential
satisfying, besides the usual [14] and with the
symbols used in [14], the further property that
$|q|^2\,|\partial^2_{ij}\varphi(q)| < \bar B\,|q|^{-(d+\varepsilon_0)}$, with $\varepsilon_0 > 0$, for some $\bar B$,
holds for $r_0 < |q| \le R$. This is a very mild extra
requirement (and it allows for a hard-core
interaction).

Consider an "ideal crystal" on a square lattice
(for simplicity) of spacing $a$, exactly fitting in its
container $\Omega$ of side $L$ assumed with periodic
boundary conditions: so that $N = (L/a)^d$ is the
number of particles and $a^{-d}$ is the density, which is
supposed to be smaller than the close packing
density if the interaction $\varphi$ has a hard core. The
probability distribution of the particles is rather
trivial:

$$\bar\mu = \frac{1}{N!}\sum_{p}\prod_{m}\delta(q_{p(m)} - a\,m)\,dq$$

the sum running over the permutations $m \to p(m)$ of
the sites $m \in \Omega$, $m \in Z^d$, $0 < m_i \le La^{-1}$. The density
at $q$ is

$$\hat\rho(q) = \sum_{n}\delta(q - a\,n) \equiv \Big\langle\sum_{j=1}^{N}\delta(q - q_j)\Big\rangle$$

and its Fourier transform is proportional to

$$\rho(k) \overset{\mathrm{def}}{=} \frac{1}{N}\Big\langle\sum_{j} e^{-ik\cdot q_j}\Big\rangle, \qquad k = \frac{2\pi}{L}\,n, \quad n \in Z^d$$

$\rho(k)$ has value 1 for all $k$ of the form $K = (2\pi/a)n$
and $(1/N)\,O(\max_{c=1,2}|e^{ik_c a} - 1|^{-2})$ otherwise. In
presence of interaction, it has to be expected that,
in a crystal state, $\rho(k)$ has peaks near the values $K$:
but the value of $\rho(k)$ can depend on the boundary
conditions.

Since the system is translation invariant a crystal
state defined as a state with a distribution "close" to $\bar\mu$,
i.e., with $\hat\rho(q)$ with peaks at the ideal lattice points
$q = na$, cannot be realized under periodic boundary
conditions, even when the system state is crystalline.
To realize such a state, a symmetry-breaking term is
needed in the interaction.

This can be done in several ways, for example, by
changing the boundary condition. Such a choice
implies a discussion of how much the boundary
conditions influence the positions of the peaks of
$\rho(k)$: for instance, it is not obvious that a boundary
condition will not generate a state with a period
different from the one that a priori has been selected
for disproval (a possibility which would imply a
reciprocal lattice of $K$'s different from the one
considered to begin with). Therefore, here the choice
will be to imagine that an external weak force with
potential $\varepsilon W(q)$ acts forcing a symmetry breaking
that favors the occupation of regions around the
points of the ideal lattice (which would mark the
average positions of the particles in the crystal state
that is being sought). The proof (Mermin's theorem)
that there is no equilibrium state with particle distribution
"close" to $\bar\mu$, i.e., with peaks in place of the delta
functions (see below), is essentially reproduced
below.

Take $W(q) = \sum_{na\in\Omega}\chi(q - na)$, where $\chi(q) \le 0$ is
smooth and zero everywhere except in a small
vicinity of the lattice points around which it
decreases to some negative minimum keeping a
rotation symmetry around them. The potential $W$ is
invariant under translations by the lattice steps. By
the choice of the boundary condition and $\varepsilon W$, the
density $\hat\rho_\varepsilon(q)$ will be periodic with period $a$ so that
$\rho_\varepsilon(k)$ will, possibly, not have a vanishing limit as
$N \to \infty$ only if $k$ is a reciprocal vector $K = (2\pi/a)n$.
If the potential is $\varphi + \varepsilon W$ and if there exists a crystal
state in which particles have higher probability of
being near the lattice points $na$, it should be
expected that for small $\varepsilon > 0$ the system will be
found in a state with Fourier transform of the
density, $\rho_\varepsilon(k)$, satisfying, for some vector $K \neq 0$ in
the reciprocal lattice,

$$\lim_{\varepsilon\to 0}\lim_{N\to\infty}|\rho_\varepsilon(K)| = r > 0 \qquad [30]$$

that is, the requirement is that uniformly in $\varepsilon \to 0$
the Fourier transform of the density has a peak at
some $K \neq 0$. Note that if $k$ is not in the reciprocal
lattice $\rho_\varepsilon(k) \to 0$ as $N \to \infty$, being bounded above by

$$\frac{1}{N}\,O\Big(\max_{c=1,2}|e^{ik_c a} - 1|^{-2}\Big)$$

because $(1/N)\hat\rho_\varepsilon$ is periodic and its integral over $q$ is
equal to 1. Hence, excluding the existence of a
crystal will be identified with the impossibility of
[30]. Other criteria can be imagined, for example,
considering crystals with a lattice different from
simple cubic, which lead to the same result by
following the same technique. Nevertheless, it is not
mathematically excluded (but unlikely) that, with
some weaker existence definition, a crystal state
could be possible even in two dimensions.

The following inequalities hold under the present
assumptions on the potential and in the canonical
distribution with periodic boundary conditions
and parameters $(\beta, \rho)$, $\rho = a^{-d}$, in a box $\Omega$ with side
multiple of $a$ (so that $N = (La^{-1})^d$) and potential of
interaction $\varphi + \varepsilon W$. The further assumption that the
lattice $na$ is not a close-packed lattice is (of course)
necessary when the interaction potential has a hard
core. Then, for suitable $B_0, B, B_1, B_2 > 0$, independent
of $N$ and $\varepsilon$, and for $|\kappa| < \pi/a$ and for all $\Omega$
(if $K \neq 0$),

$$\frac{1}{N}\Big\langle\Big|\sum_{j=1}^{N} e^{-i(K+\kappa)\cdot q_j}\Big|^2\Big\rangle \ge B\,\frac{\left(\rho_\varepsilon(K) + \rho_\varepsilon(K + 2\kappa)\right)^2}{B_1\kappa^2 + \varepsilon B_2}$$

$$\frac{1}{N}\sum_{\kappa}\gamma(\kappa)\,\frac{1}{N}\Big\langle\Big|\sum_{j=1}^{N} e^{-i(K+\kappa)\cdot q_j}\Big|^2\Big\rangle \le B_0 < \infty \qquad [31]$$

where the averages are in the canonical distribution
$(\beta, \rho)$ with periodic boundary conditions and a
symmetry-breaking potential $\varepsilon W(q)$; $\gamma(\kappa) \ge 0$ is an
(arbitrary) smooth function vanishing for $2|\kappa| \ge \delta$
with $\delta < 2\pi/a$ and $B_0$ depends on $\gamma$. See Appendix
3 for a derivation of [31].

Multiplying both sides of the first equation in [31]
by $N^{-1}\gamma(\kappa)$ and summing over $\kappa$, the crystallinity
condition in the form [30] implies

$$B_0 \ge B\,r^2 a^d\int_{|\kappa|<\delta}\frac{\gamma(\kappa)\,d\kappa}{\kappa^2 B_1 + \varepsilon B_2}$$

For $d = 1, 2$ the integral diverges, as $\varepsilon^{-1/2}$ or $\log\varepsilon^{-1}$,
respectively, implying $|\rho_\varepsilon(K)| \to r = 0$ as $\varepsilon \to 0$: the criterion
of crystallinity, [30], cannot be satisfied if $d = 1, 2$.
The above inequality is an example of a general
class of inequalities called infrared inequalities stemming
from another inequality called Bogoliubov's
inequality (see Appendix 3), which lead to the proof
that certain kinds of ordered phases cannot exist if
the dimension of the ambient space is $d = 2$ when a
finite volume, under suitable boundary conditions
(e.g., periodic), shows a continuous symmetry. The
excluded phenomenon is, more precisely, the
existence of equilibrium states exhibiting, in the
thermodynamic limit, a symmetry lower than
the continuous symmetry holding in a finite volume.
In general, existence of thermodynamic equilibrium
states with symmetry lower than the
symmetry enjoyed by the system in finite volume
and under suitable boundary conditions is called a
"spontaneous symmetry breaking." It is yet another
manifestation of instability with respect to changes
in boundary conditions, hence its occurrence reveals
a phase transition. There is a large class of systems
for which an infrared inequality implies absence of
spontaneous symmetry breaking: in most of the one-
or two-dimensional systems a continuous symmetry
cannot be spontaneously broken.

The limitation to dimension $d \le 2$ is a strong
limitation to the generality of the applicability of
infrared theorems to exclude phase transitions.
More precisely, systems can be divided into classes
each of which has a "critical dimension" below
which too much symmetry implies absence of
phase transitions (or of certain kinds of phase
transitions).

It should be stressed that, at the critical dimension,
the symmetry breaking is usually so weakly
forbidden that one might need astronomically large
containers to destroy small effects (due to boundary
conditions or to very small fields) which break the
symmetry. For example, in the crystallization just
discussed, the Fourier transform peaks are only
bounded by $O(1/\sqrt{\log\varepsilon^{-1}})$. Hence, from a practical
point of view, it might still be possible to have some
kind of order even in large containers.
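
The dimension dependence of the infrared integral can be made explicit in a simplified setting. In the sketch below the smooth cutoff $\gamma(\kappa)$ is replaced by the indicator of the ball $|\kappa| < \delta$ (an assumption made so that the integral has a closed form), and the function name and default constants are hypothetical.

```python
import math

def infrared_integral(d, eps, delta=1.0, B1=1.0, B2=1.0):
    """Integral of dk / (B1 k^2 + eps B2) over the ball |k| < delta, d = 1, 2, 3."""
    c = math.sqrt(eps * B2 / B1)  # scale below which the denominator is ~ eps*B2
    if d == 1:
        return 2.0 * math.atan(delta / c) / (B1 * c)
    if d == 2:
        return math.pi * math.log(1.0 + (delta / c) ** 2) / B1
    if d == 3:
        return 4.0 * math.pi * (delta - c * math.atan(delta / c)) / B1
    raise ValueError("only d = 1, 2, 3 are implemented")
```

As $\varepsilon \to 0$ the value grows like $\varepsilon^{-1/2}$ for $d = 1$ and like $\log\varepsilon^{-1}$ for $d = 2$, while for $d = 3$ it stays below $4\pi\delta/B_1$. The logarithmic growth in $d = 2$ is what makes the bound on the peaks so weak in practice.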

The reader is referred to Mermin (1968), Hohenberg
(1969), and Ruelle (1969).


High Temperature and Small Density 


There is another class of systems in which no phase
transitions take place. These are the systems with
stable and tempered interactions $\varphi$ (e.g., those
satisfying [14]) in the high-temperature and low-density
region. The property is obtained by showing
that the equation of state is analytic in the variables
$(\beta, \rho)$ near the origin $(0, 0)$.

A simple algorithm (Mayer's series) yields the
coefficients of the virial series

$$\beta p(\beta, \rho) = \rho + \sum_{k=2}^{\infty} c_k(\beta)\,\rho^k$$

It has the drawback that the $k$th order coefficient $c_k(\beta)$
is expressed as a sum of many terms (a number
growing more than exponentially fast in the order $k$)
and it is not so easy (but possible) to show
combinatorially that their sum is bounded exponentially
in $k$ if $\beta$ is small enough. A more efficient
approach leads quickly to the desired solution.
Denoting $\Phi(q_1,\dots,q_n) \overset{\mathrm{def}}{=} \sum_{i<j}\varphi(q_i - q_j)$, consider
the ("spatial or configurational") correlation functions
defined, in the grand canonical distribution with
parameters $\beta$, $\lambda$ (and empty boundary conditions), by

$$\rho_\Omega(q_1,\dots,q_n) \overset{\mathrm{def}}{=} \frac{1}{Z^{gc}(\beta,\lambda,V)}\sum_{m=0}^{\infty} z^{n+m}\int_\Omega e^{-\beta\Phi(q_1,\dots,q_n,y_1,\dots,y_m)}\,\frac{dy_1\cdots dy_m}{m!} \qquad [32]$$

This is the probability density for finding particles
with any momentum in the volume element $dq_1\cdots dq_n$
(irrespective of where other particles are), and
$z = e^{\beta\lambda}\left(\sqrt{2\pi m\beta^{-1}h^{-2}}\right)^{3}$ accounts for the integration
over the momenta variables and is called the activity:
it has the dimension of a density (cf. [23]).

Assuming that the potential has a hard core (for
simplicity) of radius $R$, the interaction energy
$\Phi_{q_1}(q_2,\dots,q_n)$ of a particle at $q_1$ with any number
of other particles at $q_2,\dots,q_n$, with $|q_i - q_j| > R$, is
bounded below by $-B$ for some $B \ge 0$ (related but
not equal to the $B$ in [14]). The functions $\rho_\Omega$ will be
regarded as a sequence of functions "of one, two, ...
particle positions": $\rho_\Omega = \{\rho_\Omega(q_1,\dots,q_n)\}_{n=1}^{\infty}$ vanishing
for $q_j \notin \Omega$. Then, one checks that

$$\rho_\Omega(q_1,\dots,q_n) = z\,\delta_{n,1}\,\chi_\Omega(q_1) + z\,K\rho_\Omega(q_1,\dots,q_n) \qquad [33a]$$

with

$$K\rho_\Omega(q_1,\dots,q_n) \overset{\mathrm{def}}{=} e^{-\beta\Phi_{q_1}(q_2,\dots,q_n)}\Big(\rho_\Omega(q_2,\dots,q_n)\,\delta_{n>1} + \sum_{s=1}^{\infty}\int_\Omega\frac{dy_1\cdots dy_s}{s!}\prod_{k=1}^{s}\left(e^{-\beta\varphi(q_1 - y_k)} - 1\right)\rho_\Omega(q_2,\dots,q_n,y_1,\dots,y_s)\Big) \qquad [33b]$$

where $\delta_{n,1}$, $\delta_{n>1}$ are Kronecker deltas and $\chi_\Omega(q)$ is the
indicator function of $\Omega$. Equation [33] is called the
Kirkwood-Salsburg equation for the family of correlation
functions in $\Omega$. The kernel $K$ of the equations is
independent of $\Omega$, but the domain of integration is $\Omega$.

Calling $\alpha_\Omega$ the sequence of functions
$\alpha_\Omega(q_1,\dots,q_n) \equiv 0$ if $n \neq 1$ and $\alpha_\Omega(q) = \chi_\Omega(q)$, a
recursive expansion arises, namely

$$\rho_\Omega = z\,\alpha_\Omega + z^2 K\alpha_\Omega + z^3 K^2\alpha_\Omega + z^4 K^3\alpha_\Omega + \cdots \qquad [34]$$

It gives the correlation functions, provided the series
converges. The inequality

$$|K^p\alpha_\Omega(q_1,\dots,q_n)| \le e^{(2\beta B + 1)p}\Big(\int|e^{-\beta\varphi(q)} - 1|\,dq\Big)^{p} \overset{\mathrm{def}}{=} e^{(2\beta B + 1)p}\,r(\beta)^{3p} \qquad [35]$$

shows that the series [34], called Mayer's series,
converges if $|z| < e^{-(2\beta B + 1)}\,r(\beta)^{-3}$. Convergence is
uniform (as $\Omega \to \infty$) and $(K^p\alpha_\Omega)(q_1,\dots,q_n)$ tends to
a limit as $V \to \infty$ at fixed $q_1,\dots,q_n$, and the limit is
simply $(K^p\alpha)(q_1,\dots,q_n)$, if $\alpha(q_1,\dots,q_n) \equiv 0$ for $n \neq 1$,
and $\alpha(q_1) \equiv 1$. This is because the kernel $K$ contains
the factors $(e^{-\beta\varphi(q_1 - y)} - 1)$ which decay rapidly or, if
$\varphi$ has finite range, will eventually even vanish. It
is also clear that $(K^p\alpha)(q_1,\dots,q_n)$ is translation
invariant.

Hence, if $|z|\,e^{2\beta B + 1}\,r(\beta)^3 < 1$, the limits, as $\Omega \to \infty$,
of the correlation functions exist and can be
computed by a convergent power series in $z$; the
correlation functions will be translation invariant (in
the thermodynamic limit).

In particular, the one-point correlation function
$\rho = \rho(q)$ is $\rho = z(1 + O(z\,r(\beta)^3))$, which, to lowest order
in $z$, just shows that activity and density essentially
coincide when they are small enough. Furthermore,
$\beta p_\Omega = (1/V)\log Z^{gc}(\beta, \lambda, V)$ is such that

$$z\,\partial_z\,\beta p_\Omega = \frac{1}{V}\int_\Omega\rho_\Omega(q)\,dq$$

(from the definition of $\rho_\Omega$ in [32]). Therefore,

$$\beta p(\beta, z) = \lim_{V\to\infty}\frac{1}{V}\log Z^{gc}(\beta, \lambda, V) = \int_0^{z}\frac{dz'}{z'}\,\rho(\beta, z') \qquad [36]$$

and, since the density $\rho$ is analytic in $z$ as well and
$\rho \sim z$ for $z$ small, the grand canonical pressure is
analytic in the density and $\beta p = \rho(1 + O(\rho))$, at small
density. In other words, the equation of state is, to
lowest order, essentially the equation of a perfect gas.
All quantities that are conceivably of some interest 
turn out to be analytic functions of temperature and 
density. The system is essentially a free gas and it has 
no phase transitions in the sense of a discontinuity or 
of a singularity in the dependence of a thermodynamic 
function in terms of others. Furthermore, the system 
cannot show phase transitions in the sense of sensitive 
dependence on boundary conditions of fixed external 
particles. This also follows, with some extra work,
from the Kirkwood-Salsburg equations.

The reader is referred to Ruelle (1969) and 
Gallavotti (1969) for more details. 


Lattice Models 


The problem of proving the existence of phase 
transitions in models of homogeneous gases with 
pair interactions is still open. Therefore, it makes 
sense to study the problem of phase transitions 
in simpler models, tractable to some extent but 
nontrivial, and which are of practical interest in 
their own right. 

The simplest models are the so-called lattice 
models in which particles are constrained to points 
of a lattice: they cannot move in the ordinary sense 
of the word (but, of course, they could jump) and 


therefore their configurations do not contain 
momentum variables. 

The interaction energy is just the potential
energy, and ensembles are defined as collections of
probability distributions on the position coordinates
of the particle configurations. Usually, the potential
is a pair potential decaying fast at $\infty$ and, often,
with a hard core forbidding double or higher
occupancy of the same lattice site. For instance,
the lattice gas with potential $\varphi$, in a cubic box $\Omega$
with $|\Omega| = V = L^d$ sites of a square lattice with mesh
$a > 0$, is defined by the potential energy attributed
to the configuration $X$ of occupied distinct sites,
i.e., subsets $X \subset \Omega$:

$$H(X) = -\sum_{(x,y)\in X}\varphi(x - y) \qquad [37]$$


where the sum is over pairs of distinct points in $X$.
The canonical ensemble and the grand canonical
ensemble are the collections of distributions, parametrized
by $(\beta, \rho)$, $(\rho = N/V)$, or, respectively, by
$(\beta, \lambda)$, attributing to $X$ the probability

$$p_{\beta,\rho}(X) = \frac{e^{-\beta H(X)}}{Z_p^{c}(\beta, N, \Omega)}\,\delta_{|X|,N} \qquad [38a]$$

or

$$p_{\beta,\lambda}(X) = \frac{e^{\beta\lambda|X|}\,e^{-\beta H(X)}}{Z_p^{gc}(\beta, \lambda, \Omega)} \qquad [38b]$$

where the denominators are normalization factors
that can, respectively, be called, in analogy with the
theory of continuous systems, canonical and grand
canonical partition functions; the subscript $p$ stands
for particles.

A lattice gas in which in each site there can be at
most one particle can be regarded as a model for the
distribution of a family of spins on a lattice. Such
models are quite common and useful (e.g., they arise
in studying systems with magnetic properties).
Simply identify an "occupied" site with a "spin
up" or $+$ and an "empty" site with a "spin down"
or $-$ (say). If $\sigma = \{\sigma_x\}_{x\in\Omega}$ is a spin configuration, the
energy of the configuration "for potential $\varphi$ and
magnetic field $h$" will be

$$H(\sigma) = -\sum_{(x,y)\in\Omega}\varphi(x - y)\,\sigma_x\sigma_y - h\sum_{x}\sigma_x \qquad [39]$$

with the sum running over pairs $(x, y) \in \Omega$ of distinct
sites. If $\varphi(x - y) \equiv J_{xy} \ge 0$, the model is called a
ferromagnetic Ising model. As in the case of
continuous systems, it will be assumed to have a
finite range for $\varphi$: that is, $\varphi(x) = 0$ for $|x| > R$, for
some $R$, unless explicitly stated otherwise.
The canonical and grand canonical ensembles in the
box $\Omega$ with respective parameters $(\beta, m)$ or $(\beta, h)$ will
be defined as the probability distributions on the spin
configurations $\sigma = \{\sigma_x\}_{x\in\Omega}$ with $\sum_{x\in\Omega}\sigma_x = M = mV$
or without constraint on $M$, respectively; hence, with
signs following from $e^{-\beta H(\sigma)}$ and [39],

$$p_{\beta,m}(\sigma) = \frac{\exp\left(\beta\sum_{(x,y)}\varphi(x - y)\,\sigma_x\sigma_y\right)}{Z_s^{c}(\beta, M, \Omega)}$$

$$p_{\beta,h}(\sigma) = \frac{\exp\left(\beta h\sum_x\sigma_x + \beta\sum_{(x,y)}\varphi(x - y)\,\sigma_x\sigma_y\right)}{Z_s^{gc}(\beta, h, \Omega)} \qquad [40]$$

where the denominators are normalization factors
again called, respectively, the canonical and grand
canonical partition functions. As in the study of the
previous continuous systems, canonical and grand
canonical ensembles with "external fixed particle
configurations" can be defined together with the
corresponding ensembles with "external fixed spin
configurations"; the subscript $s$ stands for spins.

For each configuration $X \subset \Omega$ of a lattice gas, let
$\{n_x\}$ be $n_x = 1$ if $x \in X$ and $n_x = 0$ if $x \notin X$. Then the
transformation $\sigma_x = 2n_x - 1$ establishes a correspondence
between lattice gas and spin distributions. In
the correspondence, the potential $\varphi(x - y)$ of the
lattice gas generates a potential $(1/4)\varphi(x - y)$ for the
corresponding spin system and the chemical potential
$\lambda$ for the lattice gas is associated with a magnetic field
$h$ for the spin system with $h = (1/2)(\lambda + \sum_{x\neq 0}\varphi(x))$.
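
The correspondence can be verified by brute force on a small system. The sketch below is an illustration under explicit assumptions: a periodic ring of $L$ sites, and the pair sums in [37] and [39] read as sums over ordered pairs $x \neq y$ (with unordered pairs the sum $\sum_{x\neq 0}\varphi(x)$ in $h$ would acquire a factor $1/2$). All function names are hypothetical.

```python
from itertools import product

def gas_weight_exponent(n, phi, lam):
    """lam*|X| - H(X), with H(X) = -sum over ordered pairs x != y of phi(x-y) n_x n_y."""
    L = len(n)
    pair = sum(phi((x - y) % L) * n[x] * n[y]
               for x in range(L) for y in range(L) if x != y)
    return lam * sum(n) + pair

def spin_weight_exponent(sigma, phi, h):
    """-H(sigma) with couplings phi/4 and field h, same ordered-pair convention."""
    L = len(sigma)
    pair = sum(0.25 * phi((x - y) % L) * sigma[x] * sigma[y]
               for x in range(L) for y in range(L) if x != y)
    return pair + h * sum(sigma)

def field_from_chemical_potential(lam, phi, L):
    """h = (1/2) * (lam + sum over x != 0 of phi(x)) on a ring of L sites."""
    return 0.5 * (lam + sum(phi(x) for x in range(1, L)))
```

For every occupation pattern `n` and `sigma = 2n - 1`, the two exponents differ by the same configuration-independent constant, so the lattice gas and spin distributions coincide after normalization.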

The correspondence between boundary conditions 
is natural: for instance, a boundary condition for the 
lattice gas in which all external sites are occupied 
becomes a boundary condition in which external 
sites contain a spin +. The close relation between 
lattice gas and spin systems permits switching from 
one to the other with little discussion. 

In the case of spin systems, empty boundary
conditions are often considered (no spins outside $\Omega$).
In lattice gases and spin systems (as well as in
continuum systems), often periodic and semiperiodic
boundary conditions are considered (i.e., periodic in
one or more directions and with empty or fixed
external particles or spins in the others).

Thermodynamic limits for the partition functions

$$-\beta f(\beta, v) = \lim_{\Omega\to\infty,\ V/N = v}\frac{1}{N}\log Z_p^{c}(\beta, N, \Omega)$$
$$\beta p(\beta, \lambda) = \lim_{\Omega\to\infty}\frac{1}{V}\log Z_p^{gc}(\beta, \lambda, \Omega)$$
$$-\beta g(\beta, m) = \lim_{\Omega\to\infty,\ M/V\to m}\frac{1}{V}\log Z_s^{c}(\beta, M, \Omega)$$
$$\beta f(\beta, h) = \lim_{\Omega\to\infty}\frac{1}{V}\log Z_s^{gc}(\beta, h, \Omega) \qquad [41]$$

can be shown to exist by a method similar to the
one discussed in Appendix 2. They have convexity
and continuity properties as in the cases of the
continuum systems. In the case of a lattice gas, the
$f$, $p$ functions are still interpreted as free energy
and pressure, respectively. In the case of spin, $f(\beta, h)$
has the interpretation of magnetic free energy,
while $g(\beta, m)$ does not have a special name in the
thermodynamics of magnetic systems. As in the
continuum systems, it is occasionally useful to define
infinite-volume equilibrium states:


Definition An infinite-volume state with parameters
$(\beta, h)$ or $(\beta, m)$ is a collection of average
values $F \to \langle F\rangle$ obtained, respectively, as limits of
finite-volume averages $\langle F\rangle_{\Omega_n}$ defined from canonical
or grand canonical distributions in $\Omega_n$ with fixed
parameters $(\beta, h)$ or $(\beta, m)$, or $(u, v)$ and with general
boundary condition of fixed external spins or empty
sites, on sequences $\Omega_n \to \infty$ for which such limits
exist simultaneously for all local observables $F$.


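This is taken verbatim from the definition in the
section "Phase transitions and boundary conditions."
In this way, it makes sense to define the
spin correlation functions for $X = (\xi_1,\dots,\xi_n)$ as
$\langle\sigma_X\rangle$ if $\sigma_X = \prod_j\sigma_{\xi_j}$. For instance, we shall call
$\rho(\xi_1, \xi_2) \overset{\mathrm{def}}{=} \langle\sigma_{\xi_1}\sigma_{\xi_2}\rangle$ and a pure phase can be defined
as an infinite-volume state such that

$$\langle\sigma_X\sigma_{Y+\xi}\rangle - \langle\sigma_X\rangle\langle\sigma_{Y+\xi}\rangle \xrightarrow[\xi\to\infty]{} 0 \qquad [42]$$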
Again, for more details, we refer the reader to Ruelle 
(1969) and Gallavotti (1969). 


Thermodynamic Limits and Inequalities 


An interesting property of lattice systems is that it is
possible to study delicate questions like the existence
of infinite-volume states in some (moderate) generality.
A typical tool is the use of inequalities. As the simplest
example of a vast class of inequalities, consider the
ferromagnetic Ising model with some finite (but
arbitrary) range interaction $J_{xy} \ge 0$ in a field $h_x \ge 0$:
$J$, $h$ may even be not translationally invariant. Then
the average of $\sigma_X \overset{\mathrm{def}}{=} \sigma_{x_1}\sigma_{x_2}\cdots\sigma_{x_n}$, $X = (x_1,\dots,x_n)$,
in a state with "empty boundary conditions" (i.e., no
external spins) satisfies the inequalities

$$\langle\sigma_X\rangle,\ \partial_{h_x}\langle\sigma_X\rangle,\ \partial_{J_{xy}}\langle\sigma_X\rangle \ge 0, \qquad X = (x_1,\dots,x_n)$$

More generally, let $H(\sigma)$ in [39] be replaced by
$H(\sigma) = -\sum_X J_X\sigma_X$ with $J_X \ge 0$ and $X$ can be any
finite set; then, if $Y = (y_1,\dots,y_n)$, $X = (x_1,\dots,x_n)$,
the following Griffiths inequalities hold:

$$\langle\sigma_X\rangle \ge 0, \qquad \partial_{J_Y}\langle\sigma_X\rangle \equiv \langle\sigma_X\sigma_Y\rangle - \langle\sigma_X\rangle\langle\sigma_Y\rangle \ge 0 \qquad [43]$$
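
The inequalities [43] are easy to test by exact enumeration on a few spins. The following Python sketch (the function name, the coupling values in the usage note, and the choice $\beta = 1$ are illustrative assumptions) computes averages for $H(\sigma) = -\sum_X J_X\sigma_X$ with empty boundary conditions by summing over all configurations.

```python
import math
from itertools import product

def averages(couplings, n_sites, beta=1.0):
    """Return avg(X, Y, ...) = <sigma_X sigma_Y ...> for H = -sum_X J_X sigma_X.

    couplings maps tuples of site indices X to J_X; all 2**n_sites
    configurations are enumerated, so n_sites must be small.
    """
    configs = list(product((-1, 1), repeat=n_sites))

    def sigma(X, s):
        return math.prod(s[x] for x in X)

    weights = [math.exp(beta * sum(J * sigma(X, s) for X, J in couplings.items()))
               for s in configs]
    Z = sum(weights)

    def avg(*sets):
        return sum(w * math.prod(sigma(X, s) for X in sets)
                   for w, s in zip(weights, configs)) / Z

    return avg
```

With non-negative couplings such as `{(0,): 0.2, (0, 1): 0.7, (1, 2): 0.3, (2, 3): 1.1, (0, 1, 2, 3): 0.5}` on four sites, `avg(X)` and `avg(X, Y) - avg(X) * avg(Y)` come out non-negative for every pair of subsets tried, as [43] requires; making one coupling negative is a quick way to see that the ferromagnetic assumption matters.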
The inequalities can be used to check, in ferromagnetic
Ising models, [39], existence of infinite-volume
states (cf. the sections "Phase transitions and boundary
conditions" and "Lattice models") obtained by fixing
the boundary condition $B$ to be either "all external
spins $+$" or "all external sites empty." If $\langle F\rangle_{B,\Omega}$
denotes the grand canonical average with boundary
condition $B$ and any fixed $\beta, h > 0$, this means that
for all local observables $F(\sigma_\Lambda)$ (i.e., for all $F$ depending
on the spin configuration in any fixed region $\Lambda$) all the
following limits exist:

$$\lim_{\Omega\to\infty}\langle F\rangle_{B,\Omega} = \langle F\rangle_B \qquad [44]$$

The reason is that the inequalities [43] imply that all
averages $\langle\sigma_X\rangle_{B,\Omega}$ are monotonic in $\Omega$ for all fixed
$X \subset \Omega$: so the limit [44] exists for $F(\sigma) = \sigma_X$. Hence,
it exists for all $F$'s depending only on finitely many
spins, because any local function $F$ "measurable in $\Lambda$"
can be expressed (uniquely) as a linear combination
of functions $\sigma_X$ with $X \subset \Lambda$.

Monotonicity with empty boundary conditions is 
seen by considering the sites outside Q and in a 
region Q with side one unit larger than that of Q 
and imagining that the couplings Jx with X C € but 
X ¢ Q vanish. Then, (ox)o > (ox), because (ox) 
is an average computed with a distribution corre- 
sponding to an energy with the couplings /x with 
X ¢ €, but X c €, changed from 0 to Jx > 0. 

Likewise, if the boundary condition is +, then 
enlarging the box from Q to Q’ corresponds to 
decreasing an external field 5 acting on the external 
spins from 十 co (which would force all external spins to 
be +) to a finite value h > 0: so, increasing the box Q 
causes (0x), o to decrease. Therefore, as €? increases, 
Ising ferromagnets spin correlations increase if the 
boundary condition is empty and decrease if it is +. 
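
The two monotonicity statements can be checked by brute force on a very small system. The following Python sketch is only an illustration and is not taken from the references: it assumes a one-dimensional chain of 2n+1 sites with nearest-neighbor coupling J and uniform field h ≥ 0, and it models the + boundary condition by coupling each end spin, with strength J, to a fixed external spin +1. The function name `chain_averages` and all parameter values are hypothetical choices.

```python
import itertools
import math

def chain_averages(n, beta=1.0, J=1.0, h=0.2, plus_boundary=False):
    """Exact averages on the chain of 2n+1 sites centred at the origin.

    Returns (<s_0>, <s_1>, <s_0 s_1>) by summing over all configurations.
    With plus_boundary=True each end spin is also coupled (strength J) to a
    fixed external spin +1; otherwise the boundary condition is empty.
    """
    sites = 2 * n + 1
    c = n  # list index of the central site
    Z = m0 = m1 = m01 = 0.0
    for spins in itertools.product((-1, 1), repeat=sites):
        energy = -J * sum(spins[i] * spins[i + 1] for i in range(sites - 1))
        energy -= h * sum(spins)
        if plus_boundary:
            energy -= J * (spins[0] + spins[-1])
        w = math.exp(-beta * energy)
        Z += w
        m0 += w * spins[c]
        m1 += w * spins[c + 1]
        m01 += w * spins[c] * spins[c + 1]
    return m0 / Z, m1 / Z, m01 / Z

# <s_0> for boxes of growing size (3, 5, 7, 9 sites)
empty = [chain_averages(n)[0] for n in range(1, 5)]
plus = [chain_averages(n, plus_boundary=True)[0] for n in range(1, 5)]
print("empty boundary:", empty)   # should not decrease with n
print("+ boundary:    ", plus)    # should not increase with n
```

The same function also gives a direct check of [43] for X = {0}, Y = {1}: the truncated correlation `m01 - m0 * m1` should come out non-negative.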

The inequalities can be used in similar ways to prove 
that the infinite-volume states obtained from + or 
empty boundary conditions are translation invariant; 
and that in zero external field, $h = 0$, the + and − 
boundary conditions generate pure states if the interaction 
potential is only a pair ferromagnetic interaction. 

There are many other important inequalities 
which can be used to prove several existence 
theorems along very simple paths. Unfortunately, 
their use is mostly restricted to lattice systems and 
requires very special assumptions on the energy 
(e.g., ferromagnetic interactions in the above exam- 
ple). The quoted examples were among the first 
discovered and provide a way to exhibit nontrivial 
thermodynamic limits and pure states. 

For more details, see Ruelle (1969), Lebowitz 
(1974), Gallavotti (1999), Lieb and Thirring (2001), 
and Lieb (2002). 


Symmetry-Breaking Phase Transitions 


The simplest phase transitions (see the section 
“Phase transitions and boundary conditions”) are 
symmetry-breaking transitions in lattice systems: 
they take place when the energy of the system in a 
container $\Omega$ and with some special boundary 
condition (e.g., periodic, antiperiodic, or empty) is 
invariant with respect to the action of a group $\mathcal{G}$ on 
phase space. This means that a group of transformations 
$\mathcal{G}$ acts on the points x of phase space, so 
that with each $\gamma \in \mathcal{G}$ is associated a map $x \to x\gamma$ 
which transforms x into $x\gamma$ respecting the composition 
law in $\mathcal{G}$, that is, $(x\gamma)\gamma' \equiv x(\gamma\gamma')$. If F is an 
observable, the action of the group on phase space 
induces an action on the observable F changing F(x) 
into $F_\gamma(x) \stackrel{\rm def}{=} F(x\gamma^{-1})$. 

A symmetry-breaking transition occurs when, by 
fixing suitable boundary conditions and taking the 
thermodynamic limit, a state $F \to \langle F\rangle$ is obtained in 
which some local observable shows a nonsymmetric 
average $\langle F\rangle \ne \langle F_\gamma\rangle$ for some $\gamma$. 

An example is provided by the “nearest-neighbor 
ferromagnetic Ising model” on a d-dimensional lattice 
with energy function given by [39] with $h = 0$ and 
$J(x-y) \equiv 0$ unless $|x-y| = 1$, i.e., unless x, y are 
nearest neighbors, in which case $J(x-y) = J > 0$. 
With periodic or empty boundary conditions, it 
exhibits a discrete “up-down” symmetry $\sigma \to -\sigma$. 

Instability with respect to boundary conditions 
can be revealed by considering the two boundary 
conditions, denoted + or −, in which the lattice 
sites outside the container $\Omega$ are either occupied by 
spins + or by spins −. Consider also, for later 
reference, (1) the boundary conditions in which 
the boundary spins in the upper half of the 
boundary are + and the ones in the lower 
part are −: call this the ±-boundary condition 
(see Figure 2); or (2) the boundary conditions in 
which some of the opposite sides of $\Omega$ are 
identified while + or − conditions are assigned on 
the remaining sides: call these “cylindrical or 
semiperiodic boundary conditions.” 

Figure 2 The dashed line is the boundary of $\Omega$; the outer spins 
correspond to the ± boundary condition. The points A, B are 
points where an open “line” $\lambda$ ends. 

A new description of the spin configurations is 
useful: given $\sigma$, draw a unit segment perpendicular 
to the center of each bond b having opposite spins at 
its extremes. An example of this construction is 
provided by Figure 2 for the boundary condition ±. 

The set of segments can be grouped into lines 
separating regions where the spins are positive from 
regions where they are negative. If the boundary 
condition is + or −, the lines form “closed polygons”, 
whereas, if the condition is ±, there is also a single 
polygon $\lambda_1$ which is not closed (as in Figure 2). If the 
boundary condition is periodic or cylindrical, all 
polygons are closed but some may “go around” $\Omega$. 
The polygons are also called “contours” and the length 
of a polygon $\gamma$ will be denoted $|\gamma|$. 

The correspondence $(\gamma_1,\gamma_2,\ldots,\gamma_n,\lambda_1) \leftrightarrow \sigma$, for 
the boundary condition ± or, for the boundary 
condition + (or −), $\sigma \leftrightarrow (\gamma_1,\ldots,\gamma_n)$ is one-to-one 
and, if $h = 0$, the energy $H_\Omega(\sigma)$ of a configuration is 
higher than $-J \times$ (number of bonds in $\Omega$) by an 
amount $2J(|\lambda_1| + \sum_i|\gamma_i|)$ or, respectively, $2J\sum_i|\gamma_i|$. 
The grand canonical probability of each spin 
configuration is therefore proportional, if $h = 0$, 
respectively, to 

$$e^{-2\beta J(|\lambda_1| + \sum_i|\gamma_i|)} \quad\text{or}\quad e^{-2\beta J\sum_i|\gamma_i|} \qquad [45]$$

and the “up-down” symmetry is clearly reflected 
by [45]. 
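
The energy bookkeeping behind [45] can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: a square L x L box with + boundary condition, h = 0, and the bonds between a box site and an external site counted among the "bonds in Ω". The helper name `contour_length_and_energy` is hypothetical. The total contour length is simply the number of bonds whose two spins are opposite.

```python
def contour_length_and_energy(spins, J=1.0):
    """Return (total contour length, number of bonds, energy) for a square
    configuration `spins` (list of rows with entries +1/-1), '+' boundary.
    """
    L = len(spins)

    def inside(i, j):
        return 0 <= i < L and 0 <= j < L

    def s(i, j):
        # spins outside the box are fixed to +1 (the '+' boundary condition)
        return spins[i][j] if inside(i, j) else 1

    n_bonds = 0
    unlike = 0      # bonds with opposite spins = total length of the contours
    energy = 0.0
    for i in range(-1, L):
        for j in range(-1, L):
            for di, dj in ((1, 0), (0, 1)):
                a, b = (i, j), (i + di, j + dj)
                if not (inside(*a) or inside(*b)):
                    continue  # bond between two external sites: not counted
                n_bonds += 1
                prod = s(*a) * s(*b)
                energy -= J * prod
                if prod < 0:
                    unlike += 1
    return unlike, n_bonds, energy

# one negative spin in a sea of positive spins: a single contour of length 4
config = [[1, 1, 1],
          [1, -1, 1],
          [1, 1, 1]]
length, bonds, energy = contour_length_and_energy(config)
print(length, bonds, energy, -bonds + 2 * length)
```

For any configuration the printed energy should coincide with −J × (number of bonds) + 2J × (total contour length), which is the statement used to derive [45].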

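The average $\langle\sigma_x\rangle_{\Omega,+}$ of $\sigma_x$ with + boundary 
conditions is given by $\langle\sigma_x\rangle_{\Omega,+} = 1 - 2P_{\Omega,+}(-)$, where 
$P_{\Omega,+}(-)$ is the probability that the spin $\sigma_x$ is −1. If the 
site x is occupied by a negative spin then the point x is 
inside some contour $\gamma$ associated with the spin 
configuration $\sigma$ under consideration. Hence, if $\rho(\gamma)$ 
is the probability that a given contour belongs to 
the set of contours describing a configuration $\sigma$, it 
is $P_{\Omega,+}(-) \le \sum_{\gamma\circ x}\rho(\gamma)$ where $\gamma\circ x$ means that $\gamma$ 
“surrounds” x. 

If $\Gamma = (\gamma_1,\ldots,\gamma_n)$ is a spin configuration and if 
the symbol $\Gamma\ \mathrm{comp}\ \gamma$ means that the contour $\gamma$ is 
“disjoint” from $\gamma_1,\ldots,\gamma_n$ (i.e., $\{\gamma\cup\Gamma\}$ is a new spin 
configuration), then 

$$\rho(\gamma) = \frac{\sum_{\Gamma\ni\gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}}{\sum_{\Gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}} \equiv e^{-2\beta J|\gamma|}\,\frac{\sum_{\Gamma\,\mathrm{comp}\,\gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}}{\sum_{\Gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}} \le e^{-2\beta J|\gamma|} \qquad [46]$$

because the last ratio in [46] does not exceed 1. 
Note that there are $\le 3^p$ different shapes of $\gamma$ with 
perimeter p and at most $p^2$ congruent $\gamma$'s containing 
x; therefore, the probability that the spin at x is − 
when the boundary condition is + satisfies the 
inequality 

$$P_{\Omega,+}(-) \le \sum_{p=4}^{\infty} p^2\,3^p\,e^{-2\beta J p} \xrightarrow[\beta\to\infty]{} 0$$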


This probability can be made arbitrarily small so 
that $\langle\sigma_x\rangle_{\Omega,+}$ is estimated by a quantity which is as 
close to 1 as desired provided $\beta$ is large enough and 
the closeness of $\langle\sigma_x\rangle_{\Omega,+}$ to 1 is estimated by a 
quantity which is both x and $\Omega$ independent. 
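
To get a feeling for the size of the Peierls estimate, the series above can be summed in closed form and compared with an exact enumeration on a tiny box. The Python sketch below is an illustration only: the function names are hypothetical, the box is 3 x 3 with + boundary condition and h = 0, and the closed form uses the elementary sums of $r^k$, $k r^k$ and $k^2 r^k$ with $r = 3e^{-2\beta J}$, which requires r < 1.

```python
import itertools
import math

def peierls_bound(beta_J):
    """Sum over p >= 4 of p^2 3^p exp(-2 beta J p), or inf if it diverges."""
    r = 3.0 * math.exp(-2.0 * beta_J)
    if r >= 1.0:
        return math.inf  # series diverges: the estimate gives no information
    # sum_{p>=4} p^2 r^p = r^4 * sum_{k>=0} (k + 4)^2 r^k
    return r ** 4 * (r * (1 + r) / (1 - r) ** 3
                     + 8 * r / (1 - r) ** 2
                     + 16 / (1 - r))

def exact_minus_probability(beta_J, L=3):
    """P(central spin = -1) in an L x L box with '+' boundary, h = 0."""
    c = L // 2
    Z = Z_minus = 0.0
    for flat in itertools.product((-1, 1), repeat=L * L):
        def s(i, j):
            # external spins are fixed to +1
            return flat[i * L + j] if 0 <= i < L and 0 <= j < L else 1
        bond_sum = 0
        for i in range(L):
            for j in range(L):
                # bonds to the lower and right neighbours (possibly external)
                bond_sum += s(i, j) * (s(i + 1, j) + s(i, j + 1))
                if i == 0:
                    bond_sum += s(i, j) * s(-1, j)   # bond to the top wall
                if j == 0:
                    bond_sum += s(i, j) * s(i, -1)   # bond to the left wall
        w = math.exp(beta_J * bond_sum)
        Z += w
        if s(c, c) < 0:
            Z_minus += w
    return Z_minus / Z

for bJ in (1.0, 1.5, 2.0):
    print(bJ, exact_minus_probability(bJ), peierls_bound(bJ))
```

The bound is far from sharp (it only becomes smaller than 1/2 at fairly large βJ), but it is uniform in the box size, which is what the argument needs.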

A similar argument for the (−)-boundary condition, 
or the remark that for $h = 0$ it is $\langle\sigma_x\rangle_{\Omega,-} = -\langle\sigma_x\rangle_{\Omega,+}$, 
leads to conclude that, at large $\beta$, $\langle\sigma_x\rangle_{\Omega,-} \ne \langle\sigma_x\rangle_{\Omega,+}$ 
and the difference between the two quantities 
is positive uniformly in $\Omega$. This is the proof 
(Peierls' theorem) of the fact that there is, if $\beta$ is 
large, a strong instability of the magnetization with 
respect to the boundary conditions, i.e., the nearest-neighbor 
Ising model in dimension 2 (or greater, by an 
identical argument) has a phase transition. If the 
dimension is 1, the argument clearly fails and no phase 
transition occurs (see the section “Absence of phase 
transitions: d = 1”). 

For more details, see Gallavotti (1999). 


Finite-Volume Effects 


The description in the last section of the phase 
transition in the nearest-neighbor Ising model can be 
made more precise both from physical and mathe- 
matical points of view giving insights into the nature 
of the phase transitions. Assume that the boundary 
condition is the (+)-boundary condition and 
describe a spin configuration $\sigma$ by means of the 
associated closed disjoint polygons $(\gamma_1,\ldots,\gamma_n)$. 
Attribute to $\sigma = (\gamma_1,\ldots,\gamma_n)$ a probability proportional 
to [45]. Then the following Minlos-Sinai's 
theorem holds: 


Theorem If $\beta$ is large enough there exist $C > 0$, 
$\rho(\gamma) \ge 0$ with $\rho(\gamma) \le e^{-2\beta J|\gamma|}$ and such that a spin 
configuration $\sigma$ randomly chosen out of the grand 
canonical distribution with + boundary conditions 
and $h = 0$ will contain, with probability approaching 
1 as $\Omega \to \infty$, a number $K_{(\gamma)}(\sigma)$ of contours congruent 
to $\gamma$ such that 

$$\big|K_{(\gamma)}(\sigma) - \rho(\gamma)\,|\Omega|\big| \le C\sqrt{|\Omega|}\;e^{-\beta J|\gamma|} \qquad [47]$$

and this relation holds simultaneously for all $\gamma$'s. 

Thus, there are very few contours (and the larger 
they are the smaller is, in absolute and relative 
value, their number): a typical spin configuration in 
the grand canonical ensemble with (+)-boundary 
conditions is such that the large majority of the spins 
is “positive” and, in the “sea” of positive spins, there 
are a few negative spins distributed in small and 
rare regions (their number, however, is still of order 
of $|\Omega|$). 

Another consequence of the analysis in the last 
section concerns the approximate equation of 
state near the phase transition region at low 
temperatures and finite $\Omega$. If $\Omega$ is finite, the graph 
of h versus $m_\Omega(\beta,h)$ will have a rather different 
behavior depending on the possible boundary conditions. 
For example, if the boundary condition is 
(+) or (−), one gets, respectively, the results 
depicted in Figure 3a and 3b, where $m^*(\beta)$ denotes 
the spontaneous magnetization (i.e., 
$m^*(\beta) \stackrel{\rm def}{=} \lim_{h\to0^+}\lim_{\Omega\to\infty} m_\Omega(\beta,h)$). 

With periodic or empty boundary conditions, the 
diagram changes as in Figure 4. The thermodynamic 
limit $m(\beta,h) = \lim_{\Omega\to\infty} m_\Omega(\beta,h)$ exists for all 
$h \ne 0$ and the resulting graph is in Figure 4b, 
which shows that at $h = 0$ the limit is discontinuous. 
It can be proved, if $\beta$ is large enough, that 
$\infty > \lim_{h\to0^+}\partial_h m(\beta,h) = \chi(\beta) > 0$ (i.e., the angle 
between the vertical part of the graph and the rest 
is sharp). 

Furthermore, it can be proved that $m(\beta,h)$ is 
analytic in h for $h \ne 0$. If $\beta$ is small enough, 
analyticity holds at all h. For $\beta$ large, the function 
$f(\beta,h)$ has an essential singularity at $h = 0$: a result 
that can be interpreted as excluding a naive theory 
of metastability as a description of states governed 
by an equation of state obtained from an analytic 
continuation to negative values of h of $f(\beta,h)$. 

[Figures 3 and 4, panels (a): graphs of $m_\Omega(\beta,h)$ versus h, with the level $m^*(\beta)$ marked; graphics not reproduced.] 

The above considerations and results further 
clarify the meaning of a phase transition for a 
finite system. For more details, we refer the 
reader to Gallavotti (1999) and Friedli and Pfister 
(2004). 


Beyond Low Temperatures 
(Ferromagnetic Ising Model) 


A limitation of the results discussed above is the 
condition of low temperature (“$\beta$ large enough”). 
A natural problem is to go beyond the low-temperature 
region and to describe fully the phenomena 
in the region where boundary condition 
instability takes place and first develops. A number 
of interesting partial results are known, which 
considerably improve the picture emerging from 
the previous analysis. A striking list, but far from 
exhaustive, of such results follows and focuses on 
the properties of ferromagnetic Ising spin systems. 
The reason for restricting to such cases is that they 
are simple enough to allow a rather fine analysis, 
which sheds considerable light on the structure of 
statistical mechanics suggesting precise formulation 
of the problems that it would be desirable to 
understand in more general systems. 

[Figures 3 and 4, panels (b): graphs of $m_\Omega(\beta,h)$ and $m(\beta,h)$ versus h, with the level $-m^*(\beta)$ marked; graphics not reproduced.] 

Figure 4 (a) The h vs $m_\Omega(\beta,h)$ graph for periodic or empty boundary conditions. (b) The discontinuity (at h = 0) of the thermodynamic limit. 


1. Let $z \stackrel{\rm def}{=} e^{\beta h}$ and consider the product of $z^V$ 
(V is the number of sites $|\Omega|$ of $\Omega$) times the 
partition function with periodic or perfect-wall 
boundary conditions and with finite-range 
ferromagnetic interaction, not necessarily nearest-neighbor; 
a polynomial in z (of degree 2V) 
is thus obtained. Its zeros lie on the unit 
circle $|z| = 1$: this is Lee-Yang's theorem. It 
implies that the only singularities of $f(\beta,h)$ in 
the region $0 < \beta < \infty$, $-\infty < h < +\infty$ can be 
found at $h = 0$. 

A singularity can appear only if the point $z = 1$ 
is an accumulation point of the limiting distribution 
(as $\Omega \to \infty$) of the zeros on the unit circle: if 
the zeros are $z_1,\ldots,z_{2V}$ then 

$$\frac{1}{V}\log z^V Z(\beta,h,\Omega,\text{periodic}) = 2\beta J + \beta h + \frac{1}{V}\sum_{i=1}^{2V}\log(z - z_i)$$

and if 

$$V^{-1}\times(\text{number of zeros of the form } z_j = e^{i\theta_j},\ \theta \le \theta_j \le \theta + d\theta) \xrightarrow[\Omega\to\infty]{} \frac{d\rho_\beta(\theta)}{2\pi}$$

it is 

$$\beta f(\beta,h) = 2\beta J + \frac{1}{2\pi}\int_{-\pi}^{\pi}\log(z - e^{i\theta})\,d\rho_\beta(\theta) \qquad [48]$$

The existence of the measure $d\rho_\beta(\theta)$ follows from 
the existence of the thermodynamic limit: but 
$d\rho_\beta(\theta)$ is not necessarily $d\theta$-continuous, i.e., not 
necessarily proportional to $d\theta$. 

2. It can be shown that, with not necessarily a 
nearest-neighbor interaction, the zeros of the 
partition function do not move too much under 
small perturbations of the potential even if one 
perturbs the energy (at perfect-wall or periodic 
boundary conditions) into 

$$H'_\Omega(\sigma) = H_\Omega(\sigma) + (\delta H_\Omega)(\sigma), \qquad (\delta H_\Omega)(\sigma) = \sum_{X\subset\Omega} J'(X)\,\sigma_X \qquad [49]$$

where $J'(X)$ is very general and defined on 
subsets $X = (x_1,\ldots,x_k) \subset \Omega$ such that the quantity 
$\|J'\| = \sup_{y\in Z^d}\sum_{X\ni y}|J'(X)|$ is small enough. 
More precisely, with a ferromagnetic pair 
potential J fixed, suppose that one knows that, 
when $J' = 0$, the partition function zeros in the 
variable $z = e^{\beta h}$ lie in a certain closed set N (of 
the unit circle) in the z-plane. Then, if $J' \ne 0$, 
they lie in a closed set $N^1$, $\Omega$-independent and 
contained in a neighborhood of N of width 
shrinking to 0 when $\|J'\| \to 0$. This allows one to 
establish various relations between analyticity 
properties and boundary condition instability 
as described in (3) below. 


3. In the ferromagnetic Ising model, with not necessarily 
a nearest-neighbor interaction, one says that 
there is a gap around 0 if $d\rho_\beta(\theta) = 0$ near $\theta = 0$. It 
can be shown that if $\beta$ is small enough there is a gap 
for all h of width uniform in h. 

4. Another question is whether the boundary 
condition instability is always revealed by the 
one-spin correlation function (i.e., by the magnetization) 
or whether it might be shown only 
by some correlation functions of higher order. It 
can be proved that no boundary condition 
instability occurs for $h \ne 0$; at $h = 0$ it is possible 
only if 

$$\lim_{h\to0^-} m(\beta,h) \ne \lim_{h\to0^+} m(\beta,h) \qquad [50]$$

5. A consequence of the Griffiths' inequalities 
(cf. the section “Thermodynamic limits and 
inequalities”) is that if [50] is true for a given 
$\beta_0$ then it is true for all $\beta > \beta_0$. Therefore, item 
(4) leads to a natural definition of the critical 
temperature $T_c$ as the least upper bound of the 
T's such that [50] holds ($k_B T = \beta^{-1}$). 


6. If d = 2 the free energy of the nearest-neighbor 
ferromagnetic Ising model has a singularity 
at $\beta_c$ and the value of $\beta_c$ is known exactly 
from the exact solutions of the model: 
$m(\beta,0^+) \equiv m^*(\beta) = (1 - \sinh^{-4} 2\beta J)^{1/8}$. The location 
and nature of the singularities of $f(\beta,0)$ as a 
function of $\beta$ remains an open question for d = 3. 
In particular, the question whether there is a 
singularity of $f(\beta,0)$ at $\beta = \beta_c$ is open. 


7. For $\beta > \beta_c$ there is instability with respect to 
boundary conditions (see (6) above) and a 
natural question is: how many “pure” phases 
can exist in the ferromagnetic Ising model? 
(cf. the section “Phase transitions and boundary 
conditions,” eqn [22]). Intuition suggests 
that there should be only two phases: the 
positively magnetized and the negatively 
magnetized ones. 

One has to distinguish between translation-invariant 
pure phases and non-translation-invariant 
ones. It can be proved that, in the case of the 
two-dimensional nearest-neighbor ferromagnetic 
Ising models, all infinite-volume states (cf. the 
section “Lattice models”) are translationally invariant. 
Furthermore, they can be obtained by 
considering just the two boundary conditions + 
and −: the latter states are also pure states for 
models with non-nearest-neighbor ferromagnetic 
interaction. The solution of this problem has led to 
the introduction of many new ideas and techniques 
in statistical mechanics and probability theory. 

8. In any dimension $d \ge 2$, for $\beta$ large enough, it can 
be proved that the nearest-neighbor Ising model 
has only two translation-invariant phases. If the 
dimension is $\ge 3$ and $\beta$ is large, the + and − 
phases exhaust the set of translation-invariant 
pure phases but there exist non-translation-invariant 
phases. For $\beta$ close to $\beta_c$, however, the 
question is much more difficult. 
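
Lee-Yang's theorem of item 1 can be observed numerically on a very small system. The sketch below is illustrative only and makes several assumptions that are not in the text: a periodic nearest-neighbor ring of 6 spins, βJ = 0.5, and the use of NumPy's `numpy.roots` to find the zeros. Since $z^V Z$ contains only even powers of z, the polynomial is written in the variable $w = z^2 = e^{2\beta h}$; its zeros lie on $|w| = 1$ exactly when those in z lie on $|z| = 1$. The function name `lee_yang_zeros` is hypothetical.

```python
import itertools
import math

import numpy as np

def lee_yang_zeros(n_sites=6, beta_J=0.5):
    """Zeros, in w = exp(2 beta h), of z^V Z for a periodic Ising ring."""
    # coeffs[k] multiplies w**k, where k is the number of + spins
    coeffs = [0.0] * (n_sites + 1)
    for spins in itertools.product((-1, 1), repeat=n_sites):
        bonds = sum(spins[i] * spins[(i + 1) % n_sites] for i in range(n_sites))
        coeffs[spins.count(1)] += math.exp(beta_J * bonds)
    # numpy.roots expects the highest-degree coefficient first
    return np.roots(coeffs[::-1])

zeros = lee_yang_zeros()
for w in zeros:
    print(f"w = {w:.6f}   |w| = {abs(w):.12f}")
```

All printed moduli should be equal to 1 up to rounding; for a ferromagnetic coupling this is what the theorem asserts, while nothing of the sort is expected in general if some couplings are negative.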


For more details, see Onsager (1944), Lee and 
Yang (1952), Ruelle (1971), Sinai (1991), Gallavotti 
(1999), Aizenman (1980), Higuchi (1981), and 
Friedli and Pfister (2004). 
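
The exact d = 2 results quoted in item 6 are easy to evaluate. The following Python sketch assumes the condition $\sinh 2\beta_c J = 1$ for the critical point of the square-lattice nearest-neighbor model (quoted in the section on critical points) and takes the spontaneous magnetization to vanish for $\beta \le \beta_c$; the function names are illustrative.

```python
import math

def critical_beta_J():
    """Solve sinh(2 beta_c J) = 1 for the product beta_c J."""
    return 0.5 * math.asinh(1.0)

def spontaneous_magnetization(beta_J):
    """m*(beta) = (1 - sinh(2 beta J)^-4)^(1/8) above beta_c, else 0."""
    s = math.sinh(2.0 * beta_J)
    if s <= 1.0:
        return 0.0
    return (1.0 - s ** -4) ** 0.125

print("beta_c J =", critical_beta_J())
for bJ in (0.3, 0.45, 0.6, 1.0):
    print(bJ, spontaneous_magnetization(bJ))
```

The exponent 1/8 makes $m^*(\beta)$ rise very steeply just above $\beta_c$, so that already at moderately low temperature the magnetization is close to 1.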


Geometry of Phase Coexistence 


Intuition about the phenomena connected with the 
classical phase transitions is usually based on the 
properties of the liquid-gas phase transition; this 
transition is usually experimentally investigated in 
situations in which the total number of particles is 
fixed (canonical ensemble) and in the presence of an 
external field (gravity). 

The importance of such experimental conditions 
is obvious; the external field produces a nontransla- 
tionally invariant situation and the corresponding 
separation of the two phases. The fact that the 
number of particles is fixed determines, on the other 
hand, the fraction of volume occupied by each of the 
two phases. 

Once more, consider the nearest-neighbor ferro- 
magnetic Ising model: the results available for it can 
be used to obtain a clear picture of the solution to 
problems that one would like to solve but which in 
most other models are intractable with present-day 
techniques. 

It will be convenient to discuss phase coexistence in 
the canonical ensemble distributions on configurations 
of fixed total magnetization $M = mV$ (see the section 
“Lattice models”; [40]). Let $\beta$ be large enough to be in 
the two-phase region and, for a fixed $\alpha \in (0,1)$, let 

$$m = \alpha\,m^*(\beta) + (1-\alpha)\,(-m^*(\beta)) = (2\alpha - 1)\,m^*(\beta) \qquad [51]$$

that is, m is in the vertical part of the diagram 
$m = m(\beta,h)$ at $\beta$ fixed (see Figure 4). 

Fixing m as in [51] does not yet determine the 
separation of the phases in two different regions; for 
this effect, it will be necessary to introduce some 
external cause favoring the occupation of a part of 
the volume by a single phase. Such an asymmetry 
can be obtained in at least two ways: through a 
weak uniform external field (in complete analogy with 
the gravitational field in the liquid-vapor transition) or 
through an asymmetric field acting only on boundary 
spins. The latter should have the same qualitative 
effect as the former, because in a phase transition 
region a boundary perturbation produces volume 
effects (see sections “Phase transitions and inequalities” 
and “Symmetry-breaking phase transitions”). 
From a mathematical point of view, it is simpler to 
use a boundary asymmetry to produce phase separations 
and the simplest geometry is obtained by 
considering ±-cylindrical or ++-cylindrical boundary 
conditions: this means ++ or ± boundary conditions 
periodic in one direction (e.g., in Figure 2 imagine the 
right and left boundary identified after removing the 
boundary spins on them). 

Spins adjacent to the bases of $\Omega$ act as symmetry-breaking 
external fields. The ++-cylindrical boundary 
condition should favor the formation inside $\Omega$ 
of the positively magnetized phase; therefore, it 
will be natural to consider, in the canonical 
distribution, this boundary condition only when 
the total magnetization is fixed to be the spontaneous 
magnetization $m^*(\beta)$. 

On the other hand, the ±-boundary condition 
favors the separation of phases (positively magnetized 
phase near the top of $\Omega$ and negatively magnetized 
phase near the bottom). Therefore, it will be natural 
to consider the latter boundary condition in the 
case of a canonical distribution with magnetization 
$m = (2\alpha-1)\,m^*(\beta)$ with $0 < \alpha < 1$ ([51]). In the latter 
case, the positive phase can be expected to adhere to 
the top of $\Omega$ and to extend, in some sense to be 
discussed, up to a distance O(L) from it; and then to 
change into the negatively magnetized pure phase. 

To make the phenomenological description 
precise, consider the spin configurations $\sigma$ through 
the associated sets of disjoint polygons (cf. the 
section “Symmetry-breaking phase transitions”). Fix 
the boundary conditions to be ++ or ±-cylindrical 
boundary conditions and note that polygons associated 
with a spin configuration $\sigma$ are all closed and 
of two types: the ones of the first type, denoted 
$\gamma_1,\ldots,\gamma_n$, are polygons which do not encircle $\Omega$; the 
second type of polygons, denoted by the symbols $\lambda_\alpha$, 
are the ones which wind up, at least once, around $\Omega$. 

So, a spin configuration $\sigma$ will be described by a set 
of polygons; the statistical weight of a configuration 
$\sigma = (\gamma_1,\ldots,\gamma_n,\lambda_1,\ldots,\lambda_h)$ is (cf. [45]): 

$$e^{-2\beta J\left(\sum_i|\gamma_i| + \sum_j|\lambda_j|\right)} \qquad [52]$$

The reason why the contours $\lambda$ that go around 
the cylinder $\Omega$ are denoted by $\lambda$ (rather than by $\gamma$) is 
that they “look like” open contours (see the section 
“Symmetry-breaking phase transitions”) if one forgets 
that the opposite sides of $\Omega$ have to be identified. In the 
case of the ±-boundary conditions the number of 
polygons of $\lambda$-type must be odd (hence $\ne 0$), while for 
the ++-boundary condition the number of $\lambda$-type 
polygons must be even (hence it could be 0). 

For more details, the reader is referred to Sinai 
(1991) and Gallavotti (1999). 


Separation and Coexistence of Phases 


In the context of the geometric description of 
the spin configuration in the last section, consider 
the canonical distributions with ++-cylindrical or the 
±-cylindrical boundary conditions and zero field: they 
will be denoted briefly as $\mu_{\beta,++}$, $\mu_{\beta,\pm}$, respectively. 
The following theorem (Minlos-Sinai's theorem) 
provided the foundations of the microscopic theory 
of coexistence: it is formulated in dimension d = 2 
but, modulo obvious changes, it holds for d > 2. 


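Theorem For $0 < \alpha < 1$ fixed, let $m = (2\alpha-1)\,m^*(\beta)$; 
then for $\beta$ large enough a spin configuration 
$\sigma = (\gamma_1,\ldots,\gamma_n,\lambda_1,\ldots,\lambda_{2h+1})$ randomly chosen with 
the distribution $\mu_{\beta,\pm}$ enjoys the properties (i)-(iv) below 
with a $\mu_{\beta,\pm}$-probability approaching 1 as $\Omega \to \infty$: 

(i) $\sigma$ contains only one contour of $\lambda$-type and 

$$\big|\,|\lambda| - (1+\varepsilon(\beta))L\,\big| < o(L) \qquad [53]$$

where $\varepsilon(\beta) > 0$ is a suitable ($\alpha$-independent) 
function of $\beta$ tending to zero exponentially fast 
as $\beta \to \infty$. 

(ii) If $\Omega_+$, $\Omega_-$ denote, respectively, the regions above 
and below $\lambda$, and $|\Omega| \equiv V$, $|\Omega_+|$, $|\Omega_-|$ are, 
respectively, the volumes of $\Omega$, $\Omega_+$, $\Omega_-$ then 

$$\big|\,|\Omega_+| - \alpha V\,\big| < \kappa(\beta)\,V^{3/4}, \qquad \big|\,|\Omega_-| - (1-\alpha)V\,\big| < \kappa(\beta)\,V^{3/4} \qquad [54]$$

where $\kappa(\beta) \to 0$ exponentially fast as $\beta \to \infty$; the exponent 
3/4, here and below, is not optimal. 

(iii) If $M_+ = \sum_{x\in\Omega_+}\sigma_x$ and $M_- = \sum_{x\in\Omega_-}\sigma_x$, then 

$$|M_+ - \alpha\,m^*(\beta)\,V| < \kappa(\beta)\,V^{3/4}, \qquad |M_- + (1-\alpha)\,m^*(\beta)\,V| < \kappa(\beta)\,V^{3/4} \qquad [55]$$

(iv) If $K_{(\gamma)}(\sigma)$ denotes the number of contours congruent 
to a given $\gamma$ and lying in $\Omega_+$ then, 
simultaneously for all the shapes of $\gamma$: 

$$|K_{(\gamma)}(\sigma) - \rho(\gamma)\,\alpha V| \le C\,e^{-\beta J|\gamma|}\,V^{1/2}, \qquad C > 0 \qquad [56]$$

where $\rho(\gamma) \le e^{-2\beta J|\gamma|}$ is the same quantity as 
already mentioned in the text of the theorem of 
“Finite-volume effects”. A similar result holds for 
the contours below $\lambda$ (cf. the comments on [47]). 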


The above theorem not only provides a detailed and 
rather satisfactory description of the phase separation 
phenomenon, but it also furnishes a precise microscopic 
definition of the line of separation between the 
two phases, which should be naturally identified with 
the (random) line $\lambda$. 

A similar result holds in the canonical distribution 
$\mu_{\beta,++,m^*(\beta)}$ where (i) is replaced by: no $\lambda$-type 
polygon is present, while (ii), (iii) become superfluous, 
and (iv) is modified in the obvious way. In 
other words, a typical configuration for the distribution 
$\mu_{\beta,++,m^*(\beta)}$ has the same appearance as a 
typical configuration of the corresponding grand 
canonical ensemble with (+)-boundary condition 
(whose properties are described by the theorem 
given in the section “Finite-volume effects”). 

For more details, see Sinai (1991) and Gallavotti 
(1999). 


Phase Separation Line and Surface 
Tension 


Continuing to refer to the nearest-neighbor Ising 
ferromagnet, the theorem of the last section means 
that, if $\beta$ is large enough, then the microscopic line $\lambda$, 
separating the two phases, is almost straight (since 
$\varepsilon(\beta)$ is small). The deviations of $\lambda$ from a straight line 
are more conveniently studied in the grand canonical 
distributions $\mu_\pm$ with boundary condition set to +1 in 
the upper half of $\partial\Omega$, vertical sites included, and 
to −1 in the lower half: this is illustrated in Figure 2 
(see the section “Symmetry-breaking phase transitions”). 
The results can be converted into very 
similar results for grand canonical distributions with 
±-cylindrical boundary conditions of the last section. 

Define $\lambda$ to be rigid if the probability that $\lambda$ passes 
through the center of the box $\Omega$ (i.e., 0) does not 
tend to 0 as $\Omega \to \infty$; otherwise, it is not rigid. 

The notion of rigidity distinguishes between the 
possibilities for the line $\lambda$ to be “straight.” The 
“excess” length $\varepsilon(\beta)L$ (see [53]) can be obtained in 
two ways: either the line $\lambda$ is essentially straight (in 
the geometric sense) with a few “bumps” distributed 
with a density of order $\varepsilon(\beta)$ or, otherwise, it is only 
locally straight and with an important part of the 
excess length being gained through a small bending 
on a large length scale. In three dimensions a similar 
phenomenon is possible. Rigidity of $\lambda$, or its failure, 
can in principle be investigated by optical means; 
there can be interference of coherent light scattered 
by macroscopically separated surface elements of $\lambda$ 
only if $\lambda$ is rigid in the above sense. 

It has been rigorously proved that the line $\lambda$ is not 
rigid in dimension 2. And, at least at low temperature, 
the fluctuation of the middle point is of the 
order $O(\sqrt{L})$. In dimension 3, however, it has been 
shown that the surface $\lambda$ is rigid at low enough 
temperature. 

A deeper analysis is needed to study the shape of 
the separation surface under other conditions, for 
example, with + boundary conditions in a canonical 
distribution with magnetization intermediate 
between $\pm m^*(\beta)$. It involves, as a prerequisite, the 
definition and many properties of the surface 
tension between the two phases. Here only 
the definition of surface tension in the case of 
±-boundary conditions in the two-dimensional case 
will be mentioned. If $Z^{++}(\Omega, m^*(\beta))$ and $Z^{+-}(\Omega, m)$ 
are, respectively, the canonical partition functions 
for the ++- and ±-cylindrical boundary conditions 
the tension $\tau(\beta)$ is defined as 

$$\beta\tau(\beta) = -\lim_{\Omega\to\infty}\frac{1}{L}\,\log\frac{Z^{+-}(\Omega,m)}{Z^{++}(\Omega,m^*(\beta))}$$

The limit can be shown to be $\alpha$-independent for $\beta$ 
large enough: the definition and its justification are 
based on the microscopic geometric description in 
the section “Geometry of phase coexistence.” The 
definition can be naturally extended to higher 
dimension (and to more general non-nearest-neighbor 
models). If d = 2, the tension $\tau$ can be exactly 
computed at all temperatures below criticality and 
is $\beta\tau(\beta) = 2\beta J + \log\tanh\beta J$. 
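
As a quick consistency check of the exact d = 2 expression, the short Python sketch below evaluates $\beta\tau(\beta) = 2\beta J + \log\tanh\beta J$ and verifies numerically that it vanishes at the critical point, assuming the condition $\sinh 2\beta_c J = 1$ quoted in the section on critical points. The function name `beta_tau` is an illustrative choice.

```python
import math

def beta_tau(beta_J):
    """beta * tau(beta) = 2 beta J + log tanh(beta J) for the d = 2 model."""
    return 2.0 * beta_J + math.log(math.tanh(beta_J))

beta_c_J = 0.5 * math.asinh(1.0)   # from sinh(2 beta_c J) = 1
print("beta_c J       =", beta_c_J)
print("beta tau at Tc =", beta_tau(beta_c_J))   # zero up to rounding
for bJ in (0.5, 1.0, 2.0):
    print(bJ, beta_tau(bJ))
```

At large βJ the logarithm tends to 0, so βτ approaches 2βJ, the cost per unit length of a perfectly straight separation line in [52].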

More remarkably, the definition can be extended to 
define the surface tension $\tau(\beta, n)$ in the “direction n,” 
that is, when the boundary conditions are such 
that the line of separation is in the average 
orthogonal to the unit vector n. In this way, if 
d = 2 and $\alpha \in (0,1)$ is fixed, it can be proved that 
at low enough temperature the canonical distribution 
with + boundary conditions and intermediate 
magnetization $m = (1-2\alpha)\,m^*(\beta)$ has typical 
configurations containing a spin − region of area 
$\simeq \alpha V$; furthermore, if the container is rescaled to 
size L = 1, the region will have a limiting shape 
filling an area $\alpha$ bounded by a smooth curve 
whose form is determined by the classical macroscopic 
Wulff's theory of the shape of crystals in 
terms of the surface tension $\tau(n)$. 

An interesting question remains open in the three-dimensional 
case: it is conceivable that the surface, 
although rigid at low temperature, might become 
“loose” at a temperature $\tilde T_c$ smaller than the critical 
temperature $T_c$ (the latter being defined as the 
highest temperature below which there are at least 
two pure phases). The temperature $\tilde T_c$, whose 
existence is rather well established in numerical 
experiments, would be called the “roughening 
transition” temperature. The rigidity of $\lambda$ is connected 
with the existence of translationally noninvariant 
equilibrium states. The latter exist in 
dimension d = 3, but not in dimension d = 2, where 
the discussed nonrigidity of $\lambda$, established all the 
way to $T_c$, provides the intuitive reason for the 
absence of non-translation-invariant states. It has 
been shown that in d = 3 the roughening temperature 
$\tilde T_c$ cannot be smaller than the 
critical temperature of the two-dimensional Ising 
model with the same coupling. 

Note that existence of translationally noninvar- 
iant equilibrium states is not necessary for the 
description of coexistence phenomena. The theory 
of the nearest-neighbor two-dimensional Ising model 
is a clear proof of this statement. 

The reader is referred to Onsager (1944), van 
Beijeren (1975), Sinai (1991), Miracle-Solé (1995), 
Pfister and Velenik (1999), and Gallavotti (1999) for 
more details. 


Critical Points 


Correlation functions for a system with short-range 
interactions and in an equilibrium state (which is 
a pure phase) have cluster properties (see [22]): 
their physical meaning is that in a pure phase there 
is independence between fluctuations occurring in 
widely separated regions. The simplest cluster 
property concerns the “pair correlation function,” 
that is, the probability density $\rho(q_1,q_2)$ of finding 
particles at points $q_1, q_2$ independently of where 
the other particles may happen to be (see [23]). 
In the case of spin systems, the pair correlation 
$\rho(q_1,q_2) = \langle\sigma_{q_1}\sigma_{q_2}\rangle$ will be considered. The pair 
correlation of a translation-invariant equilibrium 
state has a cluster property ([22], [42]), if 

$$|\rho(q_1,q_2) - \rho^2| \xrightarrow[|q_1-q_2|\to\infty]{} 0 \qquad [57]$$

where $\rho$ is the probability density for finding a 
particle at q (i.e., the physical density of the state) or 
$\rho = \langle\sigma_q\rangle$ is the average of the value of the spin at q 
(i.e., the magnetization of the state). 

A general definition of critical point is a point c in 
the space of the parameters characterizing equilibrium 
states, for example, $\beta, \lambda$ in grand canonical 
distributions, $\beta, v$ in canonical distributions, or $\beta, h$ 
in the case of lattice spin systems in a grand canonical 
distribution. In systems with short-range interaction 
(i.e., with $\varphi(r)$ vanishing for |r| large enough) the 
point c is a critical point if the difference in [57] tends 
to 0 slower than exponentially (e.g., as a 
power of the distance $|r| = |q_1 - q_2|$). 

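A typical example is the two-dimensional Ising 
model on a square lattice and with nearest-neighbor 
ferromagnetic interaction of size J. It has a single 
critical point at $\beta = \beta_c$, $h = 0$ with $\sinh 2\beta_c J = 1$. The 
cluster property is that $\langle\sigma_x\sigma_y\rangle - \langle\sigma_x\rangle\langle\sigma_y\rangle \to 0$ 
as $|x-y| \to \infty$ as 

$$A_+(\beta)\,\frac{e^{-\kappa(\beta)|x-y|}}{\sqrt{|x-y|}}, \qquad A_-(\beta)\,\frac{e^{-\kappa(\beta)|x-y|}}{|x-y|^2}, \qquad A_c\,\frac{1}{|x-y|^{1/4}} \qquad [58]$$

for $\beta < \beta_c$, $\beta > \beta_c$, or $\beta = \beta_c$, respectively, where 
$A_\pm(\beta), A_c, \kappa(\beta) > 0$. The properties [58] stem from 
the exact solution of the model. 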

At the critical point, several interesting phenomena 
occur: the lack of exponential decay indicates 
lack of a length scale over which really distinct 
phenomena can take place, and properties of the 
system observed at different length scales are likely 
to be simply related by suitable scaling transformations. 
Many efforts have been dedicated to finding 
ways of understanding quantitatively the scaling 
properties pertaining to different observables. The 
result has been the development of the renormalization 
group approach to critical phenomena (cf. the 
section “Renormalization group”). The picture that 
emerges is that the closer the critical point is, the 
larger becomes the maximal scale of length below 
which scaling properties are observed. For instance, 
in a lattice spin system in zero field the magnetization 
$M|\Lambda|^{-a}$ in a box $\Lambda \subset \Omega$ should have essentially 
the same distribution for all $\Lambda$'s with side $< l_0(\beta)$ and 
$l_0(\beta) \to \infty$ as $\beta \to \beta_c$, provided a is suitably chosen. 
The number a is called a critical exponent. 

There are several other “critical exponents” that
can be defined near a critical point. They can
be associated with singularities of the thermodynamic
functions or with the behavior of
the correlation functions involving joint densities at
two or more than two points. As an example,
consider a lattice spin system: then the “2n-spins
correlation” $\langle\sigma_0\sigma_{\xi_1}\cdots\sigma_{\xi_{2n-1}}\rangle$ could behave proportionally
to $\chi_{2n}(0,\xi_1,\ldots,\xi_{2n-1})$, $n=1,2,3,\ldots$, for a
suitable family of homogeneous functions $\chi_{2n}$, of
some degree $\omega_{2n}$, of the coordinates $(\xi_1,\ldots,\xi_{2n-1})$,
at least when the reciprocal distances are large but
$<\ell_0(\beta)$ and

$$\ell_0(\beta)=\mathrm{const.}\,(\beta-\beta_c)^{-\nu}\xrightarrow[\beta\to\beta_c]{}\infty$$

This means that if $\xi_i$ are regarded as points in $R^d$
there are functions $\chi_{2n}$ such that

$$\chi_{2n}\Big(0,\frac{\xi_1}{\lambda},\ldots,\frac{\xi_{2n-1}}{\lambda}\Big)=\lambda^{\omega_{2n}}\,\chi_{2n}(0,\xi_1,\ldots,\xi_{2n-1}),\qquad 0<\lambda\in R$$ [59]

and $\langle\sigma_0\sigma_{\xi_1}\cdots\sigma_{\xi_{2n-1}}\rangle\propto\chi_{2n}(0,\xi_1,\ldots,\xi_{2n-1})$ if
$1\ll|\xi_i-\xi_j|\ll\ell_0(\beta)$. The numbers $\omega_{2n}$ define a sequence
of critical exponents.

Other critical exponents can be associated with
approaching the critical point along other directions
(e.g., along $h\to0$ at $\beta=\beta_c$). In this case, the length up
to which there are scaling phenomena is $\ell_0(h)=\ell_0h^{-\bar\nu}$.
Further, the magnetization m(h) tends to 0 as $h\to0$ at
fixed $\beta=\beta_c$ as $m(h)=m_0h^{1/\delta}$ for $\delta>0$.

None of the features of critical exponents is known
rigorously, including their existence. An exception is the
case of the two-dimensional nearest-neighbor Ising
ferromagnet where some exponents are known exactly
(e.g., $\omega_2=1/4$, $\omega_{2n}=n\omega_2$, or $\nu=1$, while $\delta,\bar\nu$ are not
rigorously known). Nevertheless, for Ising ferromagnets
(not even nearest-neighbor but, as always here,
finite-range) in all dimensions, all of the exponents
mentioned are conjectured to be the same as those
of the nearest-neighbor Ising ferromagnet. A further
exception is the derivation of rigorous relations
between critical exponents and, in some cases, even
their values under the assumption that they exist.


Remark Naively it could be expected that in a pure
state in zero field with $\langle\sigma_x\rangle=0$ the quantity
$s=|\Lambda|^{-1/2}\sum_{x\in\Lambda}\sigma_x$, if $\Lambda$ is a cubic box of side $\ell$,
should have a probability distribution which is
Gaussian, with dispersion $\lim_{\Lambda\to\infty}\langle s^2\rangle$. This is
“usually true,” but not always. Properties [58]
show that in the d=2 ferromagnetic nearest-neighbor
Ising model at the critical point, $\langle s^2\rangle$ diverges
proportionally to $\ell^{2-1/4}$ so that the variable s cannot
have the above Gaussian distribution. The variable
$S=|\Lambda|^{-7/8}\sum_{x\in\Lambda}\sigma_x$ will have a finite dispersion: however,
there is no reason that it should be Gaussian. This
makes clear the great interest of a fluctuation theory
and its relevance for the critical point studies (see
the next two sections).
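The divergence of $\langle s^2\rangle$ stated in the Remark can be checked by direct summation. The sketch below is only an illustration under a stated assumption: the pair correlation is replaced by the pure power law $|x-y|^{-1/4}$ of [58] with amplitude 1 at all distances, and the constant diagonal terms $x=y$ are dropped. The function names are hypothetical.

```python
import math

def second_moment(ell, eta=0.25):
    """(1/|Lambda|) * sum over x != y in an ell x ell box of |x - y|**(-eta)."""
    total = 0.0
    for dx in range(-(ell - 1), ell):
        for dy in range(-(ell - 1), ell):
            if dx == 0 and dy == 0:
                continue
            # number of ordered pairs (x, y) in the box with y - x = (dx, dy)
            pairs = (ell - abs(dx)) * (ell - abs(dy))
            total += pairs * math.hypot(dx, dy) ** (-eta)
    return total / ell ** 2

def growth_exponent(ell, eta=0.25):
    """Local slope of log(second_moment) between box sides ell and 2 * ell."""
    ratio = second_moment(2 * ell, eta) / second_moment(ell, eta)
    return math.log(ratio) / math.log(2.0)

print(growth_exponent(16))  # close to 2 - 1/4 = 1.75
```

The measured slope approaches $2-1/4$ as the side grows, which is the growth rate of $\langle s^2\rangle$ quoted in the Remark.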


For more details, the reader is referred to Onsager 
(1944), Domb and Green (1972), McCoy and Wu 
(1973), and Aizenman (1982). 


Fluctuations 


As appears from the discussion in the last section,
fluctuations of observables around their averages
have interesting properties particularly at critical
points. Of particular interest are observables that
are averages, over large volumes $\Lambda$, of local functions
F(x) on phase space: this is so because macroscopic
observables often have this form. For instance, given
a region $\Lambda$ inside the system container $\Omega$, $\Lambda\subset\Omega$,
consider a configuration x = (P,Q) and the number
of particles $N_\Lambda=\sum_{q\in\Lambda}1$ in $\Lambda$, or the potential energy
$\Phi_\Lambda=\sum_{(q,q')\in\Lambda}\varphi(q-q')$ or the kinetic energy
$K_\Lambda=\sum_{q\in\Lambda}(1/2m)p^2$. In the case of lattice spin
systems, consider a configuration $\sigma$ and, for instance,
the magnetization $M_\Lambda=\sum_{i\in\Lambda}\sigma_i$ in $\Lambda$. Label the
above four examples by $\alpha=1,\ldots,4$.

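Let $\mu_\alpha$ be the probability distribution describing
the equilibrium state in which the quantities $X_\Lambda$ are
considered; let $x_\Lambda=\langle X_\Lambda/|\Lambda|\rangle_{\mu_\alpha}$ and
$p=X_\Lambda/|\Lambda|-x_\Lambda$. Then typical properties of fluctuations that
should be investigated are ($\alpha=1,\ldots,4$):

1. for all $\delta>0$ it is $\lim_{\Lambda\to\infty}\mu_\alpha(|p|>\delta)=0$ (law of
large numbers);
2. there is $D_\alpha>0$ such that

$$\mu_\alpha\big(p\sqrt{|\Lambda|}\in[a,b]\big)\xrightarrow[\Lambda\to\infty]{}\int_a^b\frac{dz}{\sqrt{2\pi D_\alpha}}\,e^{-z^2/2D_\alpha}$$

(central limit law); and
3. there is an interval $I_\alpha=(p^*_{\alpha,-},p^*_{\alpha,+})$ and a concave
function $F_\alpha(p)$, $p\in I_\alpha$, such that if $[a,b]\subset I_\alpha$ then

$$\frac{1}{|\Lambda|}\log\mu_\alpha(p\in[a,b])\xrightarrow[\Lambda\to\infty]{}\max_{p\in[a,b]}F_\alpha(p)$$

(large deviations law).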


The law of large numbers provides the certainty
of the macroscopic values; the central limit law
controls the small fluctuations (of order $\sqrt{|\Lambda|}$) of $X_\Lambda$
around its average; and the large deviations law
concerns the fluctuations of order $|\Lambda|$.
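The large deviations law can be made concrete in the simplest possible case. The sketch below assumes independent spins $\sigma_x=\pm1$ with equal probability (the non-interacting case, which is not discussed explicitly in the text), for which the rate function is known in closed form; the function names are illustrative.

```python
import math

def log_prob_magnetization_at_least(n, a):
    """log P(M/n >= a) for n independent spins +-1 with equal probability."""
    # M = 2 * k - n, where k is the number of +1 spins
    k_min = math.ceil(n * (1 + a) / 2)
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        - n * math.log(2.0)
        for k in range(k_min, n + 1)
    ]
    top = max(logs)  # log-sum-exp to avoid underflow
    return top + math.log(sum(math.exp(v - top) for v in logs))

def rate_function(p):
    """F(p) = -[(1+p)/2 * log(1+p) + (1-p)/2 * log(1-p)] for |p| < 1."""
    return -0.5 * ((1 + p) * math.log(1 + p) + (1 - p) * math.log(1 - p))

n, a = 2000, 0.5
print(log_prob_magnetization_at_least(n, a) / n, rate_function(a))
```

Since this F is concave with its maximum at p = 0, the maximum over the interval [a, 1) is attained at the endpoint a, and the two printed numbers differ only by a correction that vanishes as n grows.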

The relations (1)-(3) above are not always true:
they can be proved under further general assumptions
if the potential $\varphi$ satisfies [14] in the case of
particle systems or if $\sum_q|\varphi(q)|<\infty$ in the case
of lattice spin systems. The function $F_\alpha(p)$ is
defined in terms of the thermodynamic limits of
suitable thermodynamic functions associated with
the equilibrium state $\mu_\alpha$. The further assumption is,
essentially in all cases, that a suitable thermodynamic
function in terms of which $F_\alpha(p)$ will be
expressed is smooth and has a nonvanishing second
derivative.

For the purpose of a simple concrete example,
consider a lattice spin system of Ising type with
energy $-\sum_{x,y\in\Omega}\varphi(x-y)\sigma_x\sigma_y-\sum_xh\sigma_x$ and the
fluctuations of the magnetization $M_\Lambda=\sum_{x\in\Lambda}\sigma_x$, $\Lambda\subset\Omega$,
in the grand canonical equilibrium states $\mu_{h,\beta}$.

Let the free energy be $\beta f(\beta,h)$ (see [41]), let
$m\equiv m(h)=\langle M_\Lambda/|\Lambda|\rangle$ and let h(m) be the inverse
function of m(h). If $p=M_\Lambda/|\Lambda|$ the function F(p) is
given by

$$F(p)=\beta\big(f(\beta,h(p))-f(\beta,h)-\partial_hf(\beta,h)\,(h(p)-h)\big)$$ [60]

then a quite general result is:


Theorem The relations (1)-(3) hold if the potential
satisfies $\sum_x|\varphi(x)|<\infty$ and if F(p) [60] is smooth
and $F''(p)\ne0$ in open intervals around those in
which p is considered, that is, around p=0 for the
law of large numbers and for the central limit law or
in an open interval containing a,b for the case of the
large deviations law.


In the cases envisaged, the theory of equivalence
of ensembles implies that the function F can also be
computed via thermodynamic functions naturally
associated with other equilibrium ensembles. For
instance, instead of the grand canonical $f(\beta,h)$, one
could consider the canonical $\beta g(\beta,m)$ (see [41]), then

$$F(p)=-\beta\big(g(\beta,p)-g(\beta,m)-\partial_mg(\beta,m)\,(p-m)\big)$$ [61]


It has to be remarked that there should be a
strong relation between the central limit law and the
law of large deviations. Setting aside stating the
conditions for a precise mathematical theorem, the
statement can be efficiently illustrated in the case of
a ferromagnetic lattice spin system and with $\Lambda=\Omega$,
by showing that the law of large deviations in small
intervals, around the average $m(h_0)$, at a value $h_0$ of
the external field, is implied by the validity of the
central limit law for all values of h near $h_0$ and vice
versa (here $\beta$ is fixed). Taking $h_0=0$ (for simplicity),
the heuristic reasons are the following. Let $\mu_{h,\Omega}$ be
the grand canonical distribution in external field h.
Then:

1. The probability $\mu_{h,\Omega}(p\in dp)$ is proportional,
by definition, to $\mu_{0,\Omega}(p\in dp)\,e^{\beta hp|\Omega|}$. Hence,
if the central limit law holds for all h near
$h_0=0$, there will exist two functions m(h) and
D(h)>0, defined for h near $h_0=0$, with
m(0)=0 and

$$\mu_{0,\Omega}(p\in dp)\,e^{\beta hp|\Omega|}=\mathrm{const.}\exp\Big(-|\Omega|\Big(\frac{(p-m(h))^2}{2D(h)}+o\big((p-m(h))^2\big)\Big)\Big)\,dp$$ [62]

2. There is a function $\zeta(m)$ such that $\partial_m\zeta(m(h))=-\beta h$
and $\partial^2_m\zeta(m(h))=-D(h)^{-1}$. (This is obtained by
noting that, given D(h), the differential equation
$\partial_m\beta h=D(h)^{-1}$ with the initial value h(0)=0
determines the function h(m); therefore, $\zeta(m)$
is determined by a second integration, from
$\partial_m\zeta(m)=-\beta h(m)$.)

It then follows, heuristically, that the probability
of p in zero field has the form const. $e^{\zeta(p)|\Omega|}dp$ so
that the probability that $p\in[a,b]$ will be const.
$\exp\big(|\Omega|\max_{p\in[a,b]}\zeta(p)\big)$.

Conversely, the large deviations law for p at h = 0
implies the validity of the central limit law for the
fluctuations of p in all small enough fields h: this
simply arises from the function F(p) having a
negative second derivative.

This means that there is a “duality” between central
limit law and large deviation law or that the law of
large deviations is a “global version” of the central
limit law, in the sense that:

1. if the central limit law holds for h in an interval
around $h_0$ then the fluctuations of the magnetization
at field $h_0$ satisfy a large deviation law in a
small enough interval J around $m(h_0)$; and

2. if a large deviation law is satisfied in an interval
around $h_0$ then the central limit law holds for the
fluctuations of magnetization around its average
in all fields h with $h-h_0$ small enough.

Going beyond the heuristic level in establishing
the duality amounts to giving a precise meaning to
“small enough” and to discussing which properties of
m(h) and D(h), or F(p), are needed to derive
properties (1), (2).

For purposes of illustration consider the Ising
model with ferromagnetic short-range interaction $\varphi$:
then the central limit law holds for all h if $\beta$ is small
enough and, under the same condition on $\beta$, the
large deviations law holds for all h and all intervals
$[a,b]\subset(-1,1)$. If $\beta$ is not small then the condition
$h\ne0$ has to be added. Hence, the conditions are
fairly weak and the apparent exceptions concern the
value h=0 and $\beta$ not small where the statements
may become invalid because of possible phase
transitions.

In the presence of phase transitions, the law of large
numbers, the central limit law, and the law of large
deviations should be reformulated. Basically, one
has to add the requirement that fluctuations are
considered in pure phases and change, in a natural
way, the formulation of the laws. For instance,
the large fluctuations of magnetization in a pure
phase of the Ising model in zero field and large $\beta$
(i.e., in a state obtained as limit of finite-volume
states with + or − boundary conditions) in
intervals [a,b] which do not contain the average
magnetization $m^*$ are not necessarily exponentially
small with the size of $|\Lambda|$: if $[a,b]\subset[-m^*,m^*]$
they are exponentially small but only
with the size of the surface of $\Lambda$ (i.e., with
$|\Lambda|^{(d-1)/d}$) while they are exponentially small with
the volume if $[a,b]\cap[-m^*,m^*]=\emptyset$.


The discussion of the last section shows that at
the critical point the nature of the large fluctuations
is also expected to change: no central limit law is
expected to hold in general because of the example
of [58] with the divergence of the average of the
normal second moment of the magnetization in a
box as the side tends to $\infty$.

For more details the reader is referred to Olla 
(1987). 


Renormalization Group 


The theory of fluctuations just discussed concerns
only fluctuations of a single quantity. The problem
of joint fluctuations of several quantities is also
interesting and in fact led to really new developments
in the 1970s. It is necessary to restrict
attention to rather special cases in order to illustrate
some ideas and the philosophy behind the approach.
Consider, therefore, the equilibrium distribution $\mu_0$
associated with one of the classical equilibrium
ensembles. To fix the ideas we consider the
equilibrium distribution of an Ising energy function
$\beta H_0$, having included the temperature factor in the
energy: the inclusion is done because the discussion
will deal with the properties of $\mu_0$ as a function of $\beta$.
It will also be assumed that the average of each spin
is zero (“no magnetic field,” see [39] with h=0).
Keeping in mind a concrete case, imagine that $\beta H_0$
is the energy function of the nearest-neighbor Ising
ferromagnet in zero field.

Imagine that the volume $\Omega$ of the container has
periodic boundary conditions and is very large,
ideally infinite. Define the family of blocks $k\xi$,
parametrized by $\xi\in Z^d$ and with k an integer,
consisting of the lattice sites
$\{x:\ k\xi_i\le x_i<k(\xi_i+1)\}$. This is a lattice of cubic
blocks with side size k
that will be called the “k-rescaled lattice.”

Given $\alpha$, the quantities $m_\xi=k^{-\alpha d}\sum_{x\in k\xi}\sigma_x$ are
called the block spins and define the map
$R^*_{\alpha,k}\mu_0=\mu_k$ transforming the initial distribution on
the original spins into the distribution of the block
spins. Note that if the initial spins have only two
values $\sigma_x=\pm1$, the block spins take values between
$-k^d/k^{\alpha d}$ and $k^d/k^{\alpha d}$ at steps of size $2/k^{\alpha d}$. Furthermore,
the map $R^*_{\alpha,k}$ makes sense independently of
how many values the initial spins can assume, and
even if they assume a continuum of values $S_x\in R$.

Taking $\alpha=1$ means, for k large, looking at the
probability distribution of the joint large fluctuations
in the blocks $k\xi$. Taking $\alpha=1/2$ corresponds to
studying a joint central limit property for the block
variables.

Considering a one-parameter family of initial
distributions $\mu_0$ parametrized by a parameter $\beta$
(that will be identified with the inverse temperature),
typically there will be a unique value $\alpha(\beta)$ of $\alpha$ such
that the joint fluctuations of the block variables
admit a limiting distribution,

$$\mathrm{prob}_k\big(m_\xi\in[a_\xi,b_\xi],\ \xi\in\Lambda\big)\xrightarrow[k\to\infty]{}\int_{\{a_\xi\}}^{\{b_\xi\}}g_\Lambda\big((S_\xi)_{\xi\in\Lambda}\big)\prod_{\xi\in\Lambda}dS_\xi$$ [63]

for some distribution $g_\Lambda(z)$ on $R^\Lambda$.

If $\alpha>\alpha(\beta)$, the limit will then be $\prod_{\xi\in\Lambda}\delta(S_\xi)\,dS_\xi$,
or if $\alpha<\alpha(\beta)$ the limit will not exist (because the
block variables will be too large, with a dispersion
diverging as $k\to\infty$).

It is convenient to choose as sequence of $k\to\infty$
the sequence $k=2^n$ with $n=0,1,\ldots$ because in this
way it is $R^*_{\alpha,k}\equiv R^{*n}_{\alpha,1}$, and the limits $k\to\infty$ along
the sequence $k=2^n$ can be regarded as limits on a
sequence of iterations of a map $R^*_{\alpha,1}$ (the single
blocking step with k = 2) acting on the
probability distributions of generic spins $S_x$ on the
lattice $Z^d$ (the sequence $3^n$ would be equally
suited).
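The composition property behind the choice $k=2^n$ can be verified on a small example. The sketch below is illustrative only: it takes d = 2, a hand-picked 4 × 4 spin configuration, and a hypothetical helper `block_spins` implementing $m_\xi=k^{-\alpha d}\sum_{x\in k\xi}\sigma_x$; it acts on a single configuration rather than on a probability distribution.

```python
def block_spins(spins, k, alpha):
    """Block spins k**(-alpha*d) * (sum of spins in each k x k block), with d = 2."""
    size = len(spins)
    assert size % k == 0, "lattice side must be a multiple of k"
    d = 2
    scale = k ** (-alpha * d)
    blocks = size // k
    return [
        [
            scale * sum(spins[k * bx + i][k * by + j]
                        for i in range(k) for j in range(k))
            for by in range(blocks)
        ]
        for bx in range(blocks)
    ]

# arbitrary illustrative configuration of +-1 spins
sigma = [
    [1, -1, 1, 1],
    [1, 1, -1, 1],
    [-1, -1, 1, -1],
    [1, 1, -1, -1],
]
alpha = 0.5
once_with_k4 = block_spins(sigma, 4, alpha)
twice_with_k2 = block_spins(block_spins(sigma, 2, alpha), 2, alpha)
print(once_with_k4, twice_with_k2)
```

Blocking twice with side 2 gives the same block variable as blocking once with side 4, because the normalizations multiply: $2^{-\alpha d}\cdot2^{-\alpha d}=4^{-\alpha d}$.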

It is even more convenient to consider probability
distributions that are expressed in terms of energy
functions H which generate, in the thermodynamic
limit, a distribution $\mu$: then $R^*_{\alpha,1}$ defines an action
$R_\alpha$ on the energy functions so that $R_\alpha H=H'$ if H
generates $\mu$, H' generates $\mu'$ and $R^*_{\alpha,1}\mu=\mu'$. Of
course, the energy function will be more general
than [39] and at least a form like $\delta U$ in [49] has to
be admitted.

In other words, $R_\alpha$ gives the result of the action
of $R^*_{\alpha,1}$ expressed as a map acting on the energy
functions. Its iterates also define a semigroup
which is called the block spin renormalization
group.

While the map $R^*_{\alpha,1}$ is certainly well defined as a
map of probability distributions into probability
distributions, it is by no means clear that $R_\alpha$ is well
defined as a map on the energy functions, because if
$\mu$ is given by an energy function, it is not clear that
$R^*_{\alpha,1}\mu$ is such.

A remarkable theorem can be (easily) proved
when $R^*_{\alpha,1}$ and its iterates act on initial $\mu_0$'s which
are equilibrium states of a spin system with short-range
interactions and at high temperature ($\beta$ small).
In this case, if $\alpha=1/2$, the sequence of distributions
$R^{*n}_{1/2,1}\mu_0(\beta)$ admits a limit which is given by
a product of independent Gaussians:

$$\mathrm{prob}_k\big(m_\xi\in[a_\xi,b_\xi],\ \xi\in\Lambda\big)\xrightarrow[k\to\infty]{}\int_{\{a_\xi\}}^{\{b_\xi\}}\prod_{\xi\in\Lambda}\exp\Big(-\frac{S_\xi^2}{2D(\beta)}\Big)\frac{dS_\xi}{\sqrt{2\pi D(\beta)}}$$ [64]

Note that this theorem is stated without even
mentioning the renormalization maps $R^n_{1/2}$: it can
nevertheless be interpreted as stating that

$$R^n_{1/2}\,\beta H_0\xrightarrow[n\to\infty]{}\sum_{\xi\in Z^d}\frac{1}{2D(\beta)}\,S_\xi^2$$ [65]

but the interpretation is not rigorous because [64]
does not require that $R^n_{1/2}\,\beta H_0$ makes sense
for $n\ge1$. It states that at high temperature block
spins have normal independent fluctuations: it is
therefore an extension of the central limit law.
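The role of the index $\alpha=1/2$ can be seen by exact enumeration in the trivial case of independent spins. This is a sketch under an explicit assumption: the spins are independent and take the values ±1 with equal probability (the limiting case $\beta=0$ of the high-temperature regime), and the helper name is hypothetical.

```python
from itertools import product

def block_spin_variance(k, d, alpha):
    """Exact variance of k**(-alpha*d) * sum(sigma) over k**d independent +-1 spins."""
    n = k ** d
    scale = k ** (-alpha * d)
    values = [scale * sum(config) for config in product((-1, 1), repeat=n)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

for alpha in (0.25, 0.5, 1.0):
    print(alpha, [block_spin_variance(k, 2, alpha) for k in (1, 2, 3)])
```

The variance equals $k^{d(1-2\alpha)}$: it stays equal to 1 for $\alpha=1/2$, shrinks to 0 for $\alpha>1/2$ (the trivial delta-function limit), and grows without bound for $\alpha<1/2$.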

There are a few cases in which the map $R_\alpha$ can be
rigorously shown to be well defined at least when
acting on special equilibrium states like the high-temperature
lattice spin systems: but these are
exceptional cases of relatively little interest.

Nevertheless, there is a vast literature dealing with
approximate representations of the map $R_\alpha$. The
reason is that, assuming not only its existence but
also that it has the properties that one would
normally expect to hold for a map acting on a
finite-dimensional space, it follows that a number of
consequences can be drawn; quite nontrivial ones as
they led to the first theory of the critical point that
goes beyond the van der Waals theory described in
the section “van der Waals theory.”

The argument proceeds essentially as follows. At
the critical point, the fluctuations are expected to be
anomalous (cf. the last remark in the section “Critical
points”) in the sense that
$\langle(\sum_{x\in k\xi}\sigma_x/\sqrt{k^d})^2\rangle$ will
tend to $\infty$ as $k\to\infty$, because $\alpha=1/2$ does not correspond to
the right fluctuation scale of $\sum_{x\in k\xi}\sigma_x$, signaling that
$R^{*n}_{1/2,1}\mu_0(\beta_c)$ will not have a limit but, possibly, there
is $\alpha_c>1/2$ such that $R^{*n}_{\alpha_c,1}\mu_0(\beta_c)$ converges to a limit
in the sense of [63]. In the case of the critical nearest-neighbor
Ising ferromagnet $\alpha_c=7/8$ (see ending
remark in the section “Critical points”). Therefore, if
the map $R^*_{\alpha_c,1}$ is considered as acting on $\mu_0(\beta)$, it will
happen that for all $\beta<\beta_c$, $R^{*n}_{\alpha_c,1}\mu_0(\beta)$ will converge to
a trivial limit $\prod_{\xi\in\Lambda}\delta(S_\xi)\,dS_\xi$ because the value $\alpha_c$ is
greater than 1/2 while normal fluctuations are expected.

If the map $R_{\alpha_c}$ can be considered as a map on the
energy functions, this says that $\prod_{\xi\in\Lambda}\delta(S_\xi)\,dS_\xi$ is a
“(trivial) fixed point of the renormalization group”
which “attracts” the energy functions $\beta H_0$ corresponding
to the high-temperature phases.

The existence of the critical $\beta_c$ can be associated
with the existence of a nontrivial fixed point $H^*$ for
$R_{\alpha_c}$ which is hyperbolic with just one Lyapunov
exponent $\lambda>1$; hence, it has a stable manifold of
codimension 1. Call $\mu^*$ the probability distribution
corresponding to $H^*$.

The migration towards the trivial fixed point for
$\beta<\beta_c$ can be explained simply by the fact that for
such values of $\beta$ the initial energy function $\beta H_0$ is
outside the stable manifold of the nontrivial fixed
point and under application of the renormalization
transformation $R^n_{\alpha_c}\beta H_0$ migrates toward the trivial
fixed point, which is attractive in all directions.

By increasing $\beta$, it may happen that, for
$\beta=\beta_c$, $\beta H_0$ crosses the stable manifold of the
nontrivial fixed point $H^*$ for $R_{\alpha_c}$. Then $R^n_{\alpha_c}\beta_cH_0$
will no longer tend to the trivial fixed point but it
will tend to $H^*$: this means that the block spin
variables will exhibit a completely different fluctuation
behavior. If $\beta$ is close to $\beta_c$, the iterations of $R_{\alpha_c}$
will bring $R^n_{\alpha_c}\beta H_0$ close to $H^*$, only to be eventually
repelled along the unstable direction reaching a
distance from it increasing as $\lambda^n|\beta-\beta_c|$.

This means that up to a scale length $O(2^{n(\beta)})$ lattice
units with $\lambda^{n(\beta)}|\beta-\beta_c|=1$ (i.e., up to a scale
$O(|\beta-\beta_c|^{-1/\log_2\lambda})$) the fluctuations will be close to those of the
fixed point distribution $\mu^*$, but beyond that scale they
will come close to those of the trivial fixed point: to see
them the block spins would have to be normalized
with index $\alpha=1/2$ and they would appear as
uncorrelated Gaussian fluctuations (cf. [64], [65]).
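The crossover scale described above is a one-line computation once $\lambda$ is given. The sketch below uses hypothetical function names and an illustrative value $\lambda=2$; identifying the resulting exponent with the $\nu$ of $\ell_0(\beta)$ is the usual heuristic reading of this picture, not a statement proved in the text.

```python
import math

def crossover_iterations(beta, beta_c, lam):
    """n(beta) solving lam**n * |beta - beta_c| = 1."""
    return math.log(1.0 / abs(beta - beta_c)) / math.log(lam)

def crossover_scale(beta, beta_c, lam):
    """Scale 2**n(beta) in lattice units, i.e. |beta - beta_c|**(-1/log2(lam))."""
    return 2.0 ** crossover_iterations(beta, beta_c, lam)

def scale_exponent(lam):
    """Exponent log 2 / log lam governing the divergence of the crossover scale."""
    return math.log(2.0) / math.log(lam)

beta_c = 0.5                    # illustrative value, not a model prediction
beta = beta_c + 1.0 / 1024.0
print(crossover_iterations(beta, beta_c, 2.0), crossover_scale(beta, beta_c, 2.0))
print(scale_exponent(2.0))
```

With $\lambda=2$ the exponent is 1, which matches the value $\nu=1$ quoted earlier for the two-dimensional nearest-neighbor Ising ferromagnet.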

The next question concerns finding the nontrivial
fixed points, which means finding the energy
functions $H^*$ and the corresponding $\alpha_c$ which are
fixed points of $R_{\alpha_c}$. If the above picture is correct,
the distributions $\mu^*$ corresponding to the $H^*$ would
describe the critical fluctuations and, if there was
only one choice, or a limited number of choices, of
$\alpha_c$ and $H^*$ this would open the way to a universality
theory of the critical point hinted already by the
“primitive” results of van der Waals’ theory.

The initial hope was, perhaps, that there would be a
very small number of critical values $\alpha_c$ and $H^*$
possible: but it rapidly faded away leaving, however,
the possibility that the critical fluctuations could be
classified into universality classes. Each class would
contain many energy functions which, upon iterated
actions of $R_{\alpha_c}$, would evolve under the control of the
trivial fixed point (always existing) for $\beta$ small while,
for $\beta=\beta_c$, they would be controlled, instead, by a
nontrivial fixed point $H^*$ for $R_{\alpha_c}$ with the same $\alpha_c$ and
the same $H^*$. For $\beta<\beta_c$, a “resolution” of the
approach to the trivial fixed point would be seen by
considering the map $R_{1/2}$ rather than $R_{\alpha_c}$, whose
iterates would, however, lead to a Gaussian distribution
like [64] (and to a limit energy function like [65]).

The picture is highly hypothetical: but it is
the first suggestion of a mechanism leading to
critical points with the character of universality
and with exponents different from those of the van
der Waals theory or, for ferromagnets on a lattice,
from those of its lattice version (the Curie-Weiss
theory). Furthermore, accepting the approximations
(e.g., the Wilson-Fisher $\varepsilon$-expansion) that allow one
to pass from the well-defined $R^*_{\alpha,1}$ to the action of
$R_\alpha$ on the energy functions, it is possible to obtain
quite unambiguously values for $\alpha_c$ and expressions
for $H^*$ which are associated with the action of $R_{\alpha_c}$
on various classes of models.

For instance, it leads to the conclusion that all
ferromagnetic finite-range
lattice spin systems (with energy functions given by
[39]) have critical points controlled by the same $\alpha_c$
and the same nontrivial fixed point: this property is
far from being mathematically proved, but it is
considered a major success of the theory. One has to
compare it with van der Waals' critical point theory:
for the first time, an approximation scheme has
led, even though under approximations not fully
controllable, to computable critical exponents which
are not equal to those of the van der Waals theory.

The renormalization group approach to critical
phenomena has many variants, depending on which
kind of fluctuations are considered and on the models
to which it is applied. In statistical mechanics, there
are a few mathematically complete applications:
certain results in higher dimensions, theory of dipole
gas in d = 2, hierarchical models, some problems in
condensed matter and in statistical mechanics of
lattice spins, and a few others. Its main mathematical
successes have occurred in various related fields where
not only can the philosophy described above be
applied but it leads to renormalization transformations
that can be defined precisely and studied in
detail: for example, constructive field theory, KAM
theory of quasiperiodic motions, and various problems
in dynamical systems.

However, the applications always concern special 
cases and in each of them the general picture of the 
trivial-nontrivial fixed point dichotomy appears 
realized but without being accompanied, except in 
rare cases (like the hierarchical models or the 
universality theory of maps of the interval), by the 
full description of stable manifold, unstable direction, 
and action of the renormalization transformation on 
objects other than the one of immediate interest (a
generality which often looks like an intractable problem,
but which also turns out not to be necessary).

In the renormalization group context, mathematical
physics has played an important role also by
providing clear evidence that universality classes
could not be too few: this was shown by the
numerous exact solutions after Onsager's solution
of the nearest-neighbor Ising ferromagnet: there are
in fact several lattice models in d = 2 that exhibit
critical points with some critical exponents exactly
computable and that depend continuously on the
model parameters.

For more details, we refer the reader to McCoy
and Wu (1973), Baxter (1982), Bleher and Sinai
(1975), Wilson and Fisher (1972), Gawedzki and
Kupiainen (1983, 1985), Benfatto and Gallavotti
(1995), and Mastropietro (2004).


Quantum Statistics 


Statistical mechanics is extended to assemblies of
quantum particles rather straightforwardly. In the
case of N identical particles, the observables are
operators O on the Hilbert space

$$H_N=L_2(\Omega)^N_\alpha\qquad\text{or}\qquad H_N=\big(L_2(\Omega)\otimes C^2\big)^N_\alpha$$

where $\alpha=+,-$, of the symmetric ($\alpha=+$, bosonic
particles) or antisymmetric ($\alpha=-$, fermionic particles)
functions $\psi(Q)$, $Q=(q_1,\ldots,q_N)$, of the position
coordinates of the particles or of the position
and spin coordinates $\psi(Q,\sigma)$, $\sigma=(\sigma_1,\ldots,\sigma_N)$,
normalized so that

$$\int|\psi(Q)|^2\,dQ=1\qquad\text{or}\qquad\sum_\sigma\int|\psi(Q,\sigma)|^2\,dQ=1$$

here only $\sigma_j=\pm1$ is considered. As in classical
mechanics, a state is defined by the average values
$\langle O\rangle$ that it attributes to the observables.

Microcanonical, canonical, and grand canonical
ensembles can be defined quite easily. For instance,
consider a system described by the Hamiltonian
($\hbar$ = Planck's constant)

$$H_N=-\frac{\hbar^2}{2m}\sum_{j=1}^N\Delta_{q_j}+\sum_{j<j'}\varphi(q_j-q_{j'})+\sum_{j=1}^Nw(q_j)\stackrel{\rm def}{=}K+\Phi$$ [66]

where periodic boundary conditions are imagined
on $\Omega$ and w(q) is a periodic, smooth potential (the side
of $\Omega$ is supposed to be a multiple of the periodic
potential period if $w\ne0$). Then a canonical
equilibrium state with inverse temperature $\beta$ and
specific volume v = V/N attributes to the observable
O the average value

$$\langle O\rangle=\frac{\mathrm{tr}\,e^{-\beta H_N}\,O}{\mathrm{tr}\,e^{-\beta H_N}}$$ [67]

Similar definitions can be given for the grand
canonical equilibrium states.

Remarkably, the ensembles are orthodic and a “heat
theorem” (see the section “Heat theorem and ergodic
hypothesis”) can be proved. However, “equipartition”
does not hold: that is, $\langle K\rangle\ne(d/2)N\beta^{-1}$, although $\beta^{-1}$
is still the integrating factor of dU + p dV in the heat
theorem; hence, $\beta^{-1}$ continues to be proportional to
temperature.


Lack of equipartition is important, as it solves 
paradoxes that arise in classical statistical mechanics 
applied to systems with infinitely many degrees 
of freedom, like crystals (modeled by lattices of 
coupled oscillators) or fields (e.g., the electromagnetic 
field important in the study of black body radiation). 
However, although this has been the first surprise of 
quantum statistics (and in fact responsible for the 
very discovery of quanta), it is by no means the last. 
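The failure of equipartition is easy to quantify for a single quantum harmonic oscillator, the building block of the crystal models mentioned above. The sketch below is an illustration outside the setting of [66]: it assumes one oscillator of frequency `omega` in one dimension, units with `hbar = 1` by default, and hypothetical function names; it uses the standard result that the mean energy is $(\hbar\omega/2)\coth(\beta\hbar\omega/2)$, half of which is kinetic.

```python
import math

def quantum_mean_kinetic_energy(beta, omega, hbar=1.0):
    """<K> for one quantum oscillator: (hbar*omega/4) * coth(beta*hbar*omega/2)."""
    x = 0.5 * beta * hbar * omega
    return 0.25 * hbar * omega / math.tanh(x)

def classical_mean_kinetic_energy(beta):
    """Equipartition value (1/2) / beta for one degree of freedom."""
    return 0.5 / beta

def equipartition_ratio(beta, omega, hbar=1.0):
    """Quantum over classical mean kinetic energy."""
    return quantum_mean_kinetic_energy(beta, omega, hbar) / classical_mean_kinetic_energy(beta)

for beta in (0.01, 1.0, 10.0):
    print(beta, equipartition_ratio(beta, omega=1.0))
```

The ratio tends to 1 at high temperature ($\beta\hbar\omega\to0$) and grows linearly in $\beta$ at low temperature, where the kinetic energy saturates at its zero-point value instead of vanishing.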

At low temperatures, new unexpected (i.e.,
with no analogs in classical statistical mechanics)
phenomena occur: Bose-Einstein condensation
(superfluidity), Fermi surface instability (superconductivity),
and appearance of off-diagonal long-range
order (ODLRO) will be selected to illustrate
the deeply different kinds of problems of quantum
statistical mechanics. Largely not yet understood,
such phenomena pose very interesting problems not
only from the physical point of view but also from
the mathematical point of view and may pose
challenges even at the level of a definition. However,
it should be kept in mind that in the interesting cases
(i.e., three-dimensional systems and even most two-
and one-dimensional systems) there is no proof that
the objects defined below really exist for the systems
like [66] (see, however, the final comment for an
important exception).


Bose-Einstein Condensation 


In a canonical state with parameters $\beta,v$, a definition
of the occurrence of Bose condensation is in
terms of the eigenvalues $\nu_j(\Omega,N)$ of the kernel
$\rho(q,q')$ on $L_2(\Omega)$, called the one-particle reduced
density matrix, defined by

$$\rho(q,q')=N\sum_n\frac{e^{-\beta E_n(\Omega,N)}}{\mathrm{tr}\,e^{-\beta H_N}}\int\overline{\psi_n}(q,q_1,\ldots,q_{N-1})\,\psi_n(q',q_1,\ldots,q_{N-1})\,dq_1\ldots dq_{N-1}$$ [68]

where $E_n(\Omega,N)$ are the eigenvalues of $H_N$ and
$\psi_n(q_1,\ldots,q_N)$ are the corresponding eigenfunctions.
If $\nu_j$ are ordered by decreasing value, the state with
parameters $\beta,v$ is said to contain a Bose-Einstein
condensate if $\nu_1(\Omega,N)\ge bN>0$ for all large $\Omega$ at
$v=V/N$, $\beta$ fixed. This receives the interpretation
that there are more than bN particles with equal
momentum. The free Bose gas exhibits a Bose
condensation phenomenon at fixed density and
small temperature.
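For the free Bose gas the onset of condensation can be computed explicitly. The sketch below relies on standard textbook facts that are not derived in this article and are assumed here: spinless bosons in dimension d = 3, condensation when the density times the cube of the thermal wavelength reaches $\zeta(3/2)$, and a condensate fraction $1-(T/T_c)^{3/2}$ below $T_c$. Function names and unit defaults are illustrative.

```python
import math

def zeta(s, terms=1000):
    """Riemann zeta for s > 1: partial sum plus an Euler-Maclaurin tail correction."""
    n = terms
    partial = sum(j ** (-s) for j in range(1, n + 1))
    return partial + n ** (1 - s) / (s - 1) - 0.5 * n ** (-s) + s * n ** (-s - 1) / 12.0

def critical_temperature(density, mass=1.0, hbar=1.0, k_boltzmann=1.0):
    """T_c of the free Bose gas in d = 3 from density * lambda_T**3 = zeta(3/2)."""
    prefactor = 2.0 * math.pi * hbar ** 2 / (mass * k_boltzmann)
    return prefactor * (density / zeta(1.5)) ** (2.0 / 3.0)

def condensate_fraction(temperature, t_c):
    """Fraction of particles in the zero-momentum state (plays the role of b)."""
    if temperature >= t_c:
        return 0.0
    return 1.0 - (temperature / t_c) ** 1.5

t_c = critical_temperature(density=1.0)
print(zeta(1.5), t_c, condensate_fraction(0.5 * t_c, t_c))
```

In the language of the definition above, the condensate fraction is the constant b in $\nu_1(\Omega,N)\ge bN$: it is positive below $T_c$ and vanishes above it.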


Fermi Surface 


The wave functions $\psi_n(q_1,\sigma_1,\ldots,q_N,\sigma_N)\equiv\psi_n(Q,\sigma)$
are now antisymmetric in the permutations of the
pairs $(q_i,\sigma_i)$. Let $\psi(Q,\sigma;N,n)$ denote the nth
eigenfunction of the N-particle energy $H_N$ in [66] with
eigenvalue E(N,n) (labeled by n = 0,1,... and
nondecreasingly ordered). Setting $Q''=(q''_1,\ldots,q''_{N-p})$,
$\sigma''=(\sigma''_1,\ldots,\sigma''_{N-p})$, introduce the kernels
$\rho_p(Q,\sigma;Q',\sigma')$ by

$$\rho_p(Q,\sigma;Q',\sigma')=p!\binom{N}{p}\sum_{\sigma''}\int d^{N-p}Q''\sum_{n=0}^\infty\frac{e^{-\beta E(N,n)}}{\mathrm{tr}\,e^{-\beta H_N}}\,\overline{\psi}(Q,\sigma;Q'',\sigma'';N,n)\,\psi(Q',\sigma';Q'',\sigma'';N,n)$$ [69]

which are called p-particle reduced density matrices
(extending the corresponding one-particle reduced
density matrix [68]). Denote
$\rho_1(q_1-q_2)=\sum_\sigma\rho_1(q_1,\sigma;q_2,\sigma)$. It is also useful to consider spinless
fermionic systems: the corresponding definitions are
obtained simply by suppressing the spin labels and
will not be repeated.

Let $r_1(k)$ be the Fourier transform of $\rho_1(q-q')$: the
Fermi surface can be defined as the locus of the k’s in
the neighborhood of which $\partial_kr_1(k)$ is unbounded as
$\Omega\to\infty$, $\beta\to\infty$. The limit as $\beta\to\infty$ is important
because the notion of a Fermi surface is, possibly,
precise only at zero temperature, that is, at $\beta=\infty$.
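For free fermions the unbounded derivative that defines the Fermi surface can be displayed explicitly. The sketch below is an illustration under stated assumptions: free particles with dispersion $e(k)=k^2/2$ (units $\hbar=m=1$), a chemical potential held fixed at $k_F^2/2$, and the Fermi-Dirac occupation standing in for $r_1(k)$ up to normalization. Function names are hypothetical.

```python
import math

def occupation(k, beta, k_fermi=1.0):
    """Fermi-Dirac occupation 1 / (exp(beta * (e(k) - mu)) + 1), overflow-safe."""
    x = 0.5 * beta * (k * k - k_fermi * k_fermi)  # beta * (k**2/2 - k_F**2/2)
    return 0.5 * (1.0 - math.tanh(0.5 * x))

def occupation_slope(k, beta, k_fermi=1.0):
    """Derivative of the occupation in k: -beta * k * n * (1 - n)."""
    n = occupation(k, beta, k_fermi)
    return -beta * k * n * (1.0 - n)

for beta in (10.0, 100.0, 1000.0):
    print(beta, occupation_slope(1.0, beta), occupation_slope(0.5, beta))
```

At $k=k_F$ the slope equals $-\beta k_F/4$ and diverges as $\beta\to\infty$, while away from $k_F$ it tends to zero: the singularity concentrates on the sphere $|k|=k_F$.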

So far, existence of Fermi surface (i.e., the smoothness
of $r_1(k)$ except on a smooth surface in k-space)
has been proved in free Fermi systems ($\varphi=0$) and

1. certain exactly soluble one-dimensional spinless
systems and

2. in rather general one-dimensional spinless systems
or systems with spin and repulsive pair interaction,
possibly in an external periodic potential.


The spinning case in a periodic potential and
dimension $d\ge2$ is the most interesting case to study
for its relevance in the theory of conduction in
crystals. Essentially no mathematical results are
available as the above-mentioned ones do not
concern any case in dimension >1: this is a rather
disappointing aspect of the theory and a challenge.

In dimension 2 or higher, for fermionic systems
with Hamiltonian [66], not only are there no results
available, even without spin, but it is not even clear
that a Fermi surface can exist in the presence of
interesting interactions.


Cooper Pairs 


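The superconductivity theory has been phenomenologically
related to the existence of Cooper pairs.
Consider the Hamiltonian [66] and define (cf. [69])

$$\rho(x-y,\sigma;x'-y',\sigma';x-x')\stackrel{\rm def}{=}\rho_2(x,\sigma,y,-\sigma;x',\sigma',y',-\sigma')$$

The system is said to contain Cooper pairs with
spins $\sigma,-\sigma$ ($\sigma=+$ or $\sigma=-$) if there exist functions
$g^\alpha(q,\sigma)\ne0$ with

$$\int g^\alpha(q,\sigma)\,g^{\alpha'}(q,\sigma)\,dq=0\qquad\text{if }\alpha\ne\alpha'$$

such that

$$\lim_{V\to\infty}\rho(x-y,\sigma;x'-y',\sigma';x-x')\xrightarrow[x-x'\to\infty]{}\sum_\alpha g^\alpha(x-y,\sigma)\,g^\alpha(x'-y',\sigma')$$ [70]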


In this case, the $g^{\alpha}(x-y,\sigma)$ with largest $L_2$ norm can be
called, after normalization, the wave function of the paired
state of lowest energy: this is the analog of the plane
wave for a free particle (and, like it, it is manifestly not
normalizable, i.e., it is not square integrable as a
function of $x,y$). If the system contains Cooper pairs
and the nonleading terms in the limit [70] vanish
quickly enough, the two-particle reduced density
matrix [70], regarded as a kernel operator, has an
eigenvalue of order $V$ as $V\to\infty$: that is, the state of
lowest energy is "macroscopically occupied," quite
like the free Bose condensation in the ground state.
Cooper pairs instability might destroy the Fermi
surface in the sense that $r_1(k)$ becomes analytic in $k$;
but it is also possible that, even in their presence,
there remains a surface which is the locus of the
singularities of the function $r_1(k)$. In the first case,
there should remain a trace of it as a very steep
gradient of $r_1(k)$ of the order of an exponential in the
inverse of the coupling strength; this is what happens
in the BCS model for superconductivity. The model is,
however, a mean-field model and this particular
regularity aspect might be one of its peculiarities. In
any event, a smooth singularity surface is very likely to
exist for some interesting density matrix (e.g., in the
BCS model with "gap parameter $\gamma$" the wave function


$$g(x-y,\sigma) = \frac{1}{(2\pi)^d}\int_{\varepsilon(k)>0} e^{ik\cdot(x-y)}\,\frac{\gamma}{\sqrt{\varepsilon(k)^2+\gamma^2}}\,dk$$


of the lowest energy level of the Cooper pairs is 
singular on a surface coinciding with the Fermi 
surface of the free system). 


ODLRO 


Consider the $k$-fermion reduced density matrix
$\rho_k(Q,\sigma;Q',\sigma')$ as kernel operators $O_k$ on
$L_2((\Omega\times C^2)^k)_-$. Suppose $k$ is even; then if $O_k$ has a (generalized)
eigenvalue of order $N^{k/2}$ as $N\to\infty$, $N/V\to\rho$, the
system is said to exhibit off-diagonal long-range order
of order $k$. For $k$ odd, ODLRO is defined to exist if $O_k$
has an eigenvalue of order $N^{(k-1)/2}$ and $k\ge 3$ (if $k=1$
the largest eigenvalue of $O_1$ is necessarily $\le 1$).
For bosons, consider the reduced density matrix
$\rho_k(Q;Q')$, regarding it as a kernel operator $O_k$ on
$L_2(\Omega^k)_+$, and define ODLRO of order $k$ to be present
if $O_k$ has a (generalized) eigenvalue of order $N^k$ as
$N\to\infty$, $N/V\to\rho$.

ODLRO can be regarded as a unification of the
notions of Bose condensation and of the existence of
Cooper pairs, because Bose condensation could be
said to correspond to the kernel operator $\rho_1(q_1-q_2)$
in [68] having a (generalized) eigenvalue of order $N$,
and to be a case of ODLRO of order 1. If the state is
pure in the sense that it has a cluster property (see
the sections "Phase transitions and boundary conditions"
and "Lattice models"), then the existence of
ODLRO, Bose condensation, and Cooper pairs
implies that the system shows a spontaneously
broken symmetry: conservation of particle number
and clustering imply that the off-diagonal elements
of (all) reduced density matrices vanish at infinite
separation in states obtained as limits of states with
periodic boundary conditions and Hamiltonian [66],
and this is incompatible with ODLRO.

The free Fermi gas has no ODLRO; the BCS model
of superconductivity has Cooper pairs and ODLRO
with $k=2$, but no Fermi surface in the above sense
(possibly too strict). Fermionic systems cannot have
ODLRO of order 1 (because the reduced density
matrix of order 1 is bounded by 1).

The contribution of mathematical physics has 
been particularly effective in providing exactly 
soluble models: however, the soluble models deal 
with one-dimensional systems and it can be shown 
that in dimensions 1, 2 no ODLRO can take place. 
A major advance is the recent proof of ODLRO and 
Bose condensation in the case of a lattice version of 
[66] at a special density value (and $d\ge 3$).

In no case, for the Hamiltonian [66] with $\varphi\neq 0$,
has the existence of Cooper pairs been proved, nor the
existence of a Fermi surface for $d>1$. Nevertheless,
both Bose condensation and Cooper pairs formation 
can be proved to occur rigorously in certain limiting 
situations. There are also a variety of phenomena 
(e.g., simple spectral properties of the Hamiltonians) 
which are believed to occur once some of the 
above-mentioned ones do occur and several of 
them can be proved to exist in concrete models. 

If $d=1,2$, ODLRO can be proved to be impossible
at $T>0$ through the use of Bogoliubov's
inequality (used in the "no $d=2$ crystal theorem,"
see the section "Continuous symmetries: 'no $d=2$
crystal' theorem").

For more details, the reader is referred to Penrose 
and Onsager (1956), Yang (1962), Ruelle (1969), 
Hohenberg (1967),  Gallavotti (1999), and 
Aizenman et al. (2004). 


Appendix 1: The Physical Meaning of the 
Stability Conditions 


It is useful to see what would happen if the 
conditions of stability and temperedness (see [14]) 
are violated. The analysis also illustrates some of the 
typical methods of statistical mechanics. 


Coalescence Catastrophe due 
to Short-Distance Attraction 


The simplest violation of the first condition in [14] 
occurs when the potential is smooth and negative 
at the origin. 

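Let $\delta>0$ be so small that the potential at distances
$\le 2\delta$ is $\le -b<0$. Consider the canonical distribution
with parameters $\beta, N$ in a (cubic) box $\Omega$ of volume $V$.
The probability $P_{collapse}$ that all the $N$ particles are
located in a little sphere of radius $\delta$ around the center
of the box (or around any prefixed point of the box) is
estimated from below by remarking that

$$\Phi(q_1,\ldots,q_N)\le -b\binom{N}{2}\sim -\frac{b}{2}N^2$$

so that

$$P_{collapse}=\frac{\int_{\mathcal C}\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}}{\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}}\ \ge\ \frac{\left(\frac{4\pi}{3}\delta^{3}\right)^{N}\frac{1}{h^{3N}N!}\,e^{\beta b\frac{1}{2}N(N-1)}}{\int\frac{dq}{h^{3N}N!}\,e^{-\beta\Phi(q)}} \qquad [71]$$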


The phase space is extremely small: nevertheless,
such configurations are far more probable than the
configurations which "look macroscopically correct,"
that is, configurations with particles more or
less spaced by the average particle distance expected
in a macroscopically homogeneous configuration,
namely $(N/V)^{-1/3}=\rho^{-1/3}$. Their energy $\Phi(q)$ is of
the order of $uN$ for some $u$, so that their probability
will be bounded above by

$$P_{regular}\le\frac{\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+uN)}}{\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}}=\frac{\frac{V^{N}}{h^{3N}N!}\,e^{-\beta uN}}{\int\frac{dq}{h^{3N}N!}\,e^{-\beta\Phi(q)}} \qquad [72]$$
However, no matter how small $\delta$ is, the
ratio $P_{regular}/P_{collapse}$ will approach 0 as $V\to\infty$,
$N/V\to v^{-1}$; this occurs extremely rapidly because
$e^{\beta bN^2/2}$ eventually dominates over $V^N\sim e^{N\log N}$.
Thus, it is far more probable to find the system in a
microscopic volume of size $\delta$ rather than in a
configuration in which the energy has some macroscopic
value proportional to $N$. This catastrophe can
be called an ultraviolet catastrophe (as it is due to the
behavior at very short distances) and it causes the
collapse of the particles into configurations concentrated
in regions as small as we please as $V\to\infty$.
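
As a rough numerical illustration of this comparison, the following sketch evaluates the logarithm of the bound on $P_{regular}/P_{collapse}$ obtained by dividing the bound [72] by the bound [71], that is, $N\log V-\beta uN-N\log(\frac{4\pi}{3}\delta^3)-\beta b\,N(N-1)/2$. The function name and all parameter values ($\beta$, $b$, $u$, $\delta$, the volume per particle) are arbitrary assumptions chosen only to make the trend visible; they do not come from the text.

```python
import math

def log_ratio_bound(n, beta=1.0, b=1e-3, u=-1.0, delta=1e-2, v=1.0):
    """Log of the upper bound on P_regular / P_collapse from [71], [72].

    All parameter values are illustrative assumptions; v is the volume
    per particle, so the box volume is V = n * v.
    """
    volume = n * v
    sphere = 4.0 * math.pi * delta**3 / 3.0
    return (n * math.log(volume) - beta * u * n
            - n * math.log(sphere) - beta * b * n * (n - 1) / 2.0)

for n in (10**2, 10**4, 10**6, 10**8):
    print(n, log_ratio_bound(n))
```

For small $N$ the entropic terms of order $N\log N$ can still win, but the quadratic term $-\beta bN^2/2$ takes over once $N$ is large enough, however small $b$ and $\delta$ are chosen.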


Coalescence Catastrophe due 
to Long-Range Attraction 


This catastrophe occurs when the potential is too attractive near $\infty$.
For simplicity, suppose that the potential has a hard
core, i.e., it is $+\infty$ for $r<r_0$, so that the above-discussed
coalescence cannot occur and the system
density is bounded above by a certain quantity $\rho_{cp}<\infty$
(close-packing density).

The catastrophe occurs if $\varphi(q)\sim -g|q|^{-3+\varepsilon}$, $g,\varepsilon>0$,
for $|q|$ large. For instance, this is the case for matter
interacting gravitationally; if $k$ is the gravitational
constant and $m$ is the particle mass, then $g=km^2$ and $\varepsilon=2$.

The probability $P_{regular}$ of "regular configurations,"
where particles are at distances of order $\rho^{-1/3}$ from
their close neighbors, is compared with the probability
$P_{collapse}$ of "catastrophic configurations," with the
particles at distances $r_0$ from their close neighbors to
form a configuration of density $\rho_{cp}/(1+\delta)^3$ almost in
close packing (so that $r_0$ is equal to the hard-core
radius times $1+\delta$). In the latter case, the system does
not fill the available volume and leaves empty a region
whose volume is a fraction $\sim((\rho_{cp}-\rho)/\rho_{cp})V$ of $V$.
Further, it can be checked that the ratio $P_{regular}/P_{collapse}$
tends to 0 at a rate O(exp (g3 N(pep(1 二 二 一 p))) 
if $\delta$ is small enough (and $\rho<\rho_{cp}$).

A system which is too attractive at infinity will not 
occupy the available volume but will stay confined in a 
close-packed configuration even in empty space. 

This is important in the theory of stars: stars cannot
be expected to obey "regular thermodynamics" and in
particular will not "evaporate" because their particles
interact via the gravitational force at large distances.
Stars do not occupy the whole volume given to them
(i.e., the universe); they do not collapse to a point only
because the interaction has a strongly repulsive core
(even when they are burnt out and the radiation pressure
is no longer able to keep them at a reasonable size).




Evaporation Catastrophe 


This is another infrared catastrophe, that is, a
catastrophe due to the long-range structure of the
interactions, as in the above subsection; it occurs when
the potential is too repulsive at $\infty$, that is,

$$\varphi(q)\sim +g|q|^{-3+\varepsilon}\quad\text{as } q\to\infty$$

so that the temperedness condition is again
violated.

Also in this case, the system does not
occupy the whole volume: it will generate a layer of
particles sticking, in close-packed configuration, to
the walls of the container. Therefore, if the density is
lower than the close-packing density, $\rho<\rho_{cp}$, the
system will leave a region around the center of the
container $\Omega$ empty; and the volume of the empty
region will still be of the order of the total volume of 
the box (i.e., its diameter will be a fraction of the 
box side L). The proof is completely analogous to 
the one of the previous case; except that now the 
configuration with lowest energy will be the one 
sticking to the wall and close packed there, rather 
than the one close packed at the center. 

This catastrophe is also important, as it is realized in
systems of charged particles bearing the same charge: 
the charges adhere to the boundary in close-packing 
configuration, and dispose themselves so that the 
electrostatic potential energy is minimal. Therefore, 
charges deposited on a metal will not occupy the whole 
volume: they will rather form a surface layer minimiz- 
ing the potential energy (i.e., so that the Coulomb 
potential in the interior is constant). In general, charges 
in excess of neutrality do not behave thermodynami- 
cally: for instance, besides not occupying the whole 
volume given to them, they will not contribute 
normally to the specific heat. 

Neutral systems of charges behave thermodyna- 
mically if they have hard cores, so that the 
ultraviolet catastrophe cannot occur or if they obey 
quantum-mechanical laws and consist of fermionic 
particles (plus possibly bosonic particles with 
charges of only one sign). 

For more details, we refer the reader to Lieb 
and Lebowitz (1972) and Lieb and Thirring (2001). 




Appendix 2: The Subadditivity Method 


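A simple consequence of the assumptions is that the
exponential in (5.2) can be bounded above by
$e^{\beta BN}\exp\bigl(-\beta\sum_{i=1}^{N}p_i^2/2m\bigr)$ so that

$$1\le Z_{gc}(\beta,\lambda,V)\le\exp\Bigl(V e^{\beta\lambda}e^{\beta B}\sqrt{2\pi m\beta^{-1}}^{\,d}\Bigr)\ \Rightarrow\ 0\le\frac{1}{V}\log Z_{gc}(\beta,\lambda,V)\le e^{\beta\lambda}e^{\beta B}\sqrt{2\pi m\beta^{-1}}^{\,d} \qquad [73]$$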


Consider, for simplicity, the case of a hard-core
interaction with finite range (cf. [14]). Consider a
sequence of boxes $\Omega_n$ with sides $2^nL_0$, where $L_0>0$
is arbitrarily fixed to be $>2R$. The partition function
$Z_{gc}(\beta,z)$ relative to the volume $\Omega_n$ is

$$Z_n=\sum_{N=0}^{\infty}\frac{z^N}{N!}\int_{\Omega_n^N}dQ\,e^{-\beta\Phi(Q)}$$

because the integral over the $P$ variables can be
explicitly performed and included in $z^N$ if $z$ is
defined as $z=e^{\beta\lambda}(2\pi m\beta^{-1})^{d/2}$.

Then the box $\Omega_n$ contains $2^d$ boxes $\Omega_{n-1}$ for $n\ge 1$
and

$$1\le Z_n\le Z_{n-1}^{2^d}\exp\bigl(\beta B\,2d\,(L_{n-1}/R)^{d-1}2^{2d}\bigr) \qquad [74]$$

because the corridor of width $2R$ around the
boundaries of the $2^d$ cubes $\Omega_{n-1}$ filling $\Omega_n$ has
volume $2R\,L_{n-1}^{d-1}\,2d\,2^d$ and contains at most
$(L_{n-1}/R)^{d-1}\,2d\,2^d$ particles, each of which interacts
with at most $2^d$ other particles. Therefore,

$$\beta p_n\overset{\mathrm{def}}{=}L_n^{-d}\log Z_n\le L_{n-1}^{-d}\log Z_{n-1}+\beta B\gamma_d 2^{-n}(L_0/R)^{d-1}$$

for some $\gamma_d>0$. Hence, $0\le\beta p_n\le\beta p_{n-1}+\Gamma_d 2^{-n}$
for some $\Gamma_d>0$ and $p_n$ is bounded above and below
uniformly in $n$. So, the limit [13] exists on the sequence
$L_n=L_02^n$ and defines a function $\beta p_\infty(\beta,\lambda)$.
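
The convergence along the sequence $L_n=L_02^n$ can be made explicit by a short worked step. This is only a sketch, assuming the recursive bound $0\le\beta p_n\le\beta p_{n-1}+\Gamma_d2^{-n}$ with some constant $\Gamma_d>0$; the auxiliary sequence $a_n$ is introduced here for illustration and does not appear in the text.

```latex
a_n \;\overset{\mathrm{def}}{=}\; \beta p_n + \Gamma_d 2^{-n}
    \;\le\; \beta p_{n-1} + \Gamma_d 2^{-n} + \Gamma_d 2^{-n}
    \;=\; \beta p_{n-1} + \Gamma_d 2^{-(n-1)} \;=\; a_{n-1},
\qquad a_n \ge 0 .
```

The sequence $a_n$ is therefore non-increasing and bounded below, so it converges; since $\Gamma_d2^{-n}\to0$, the sequence $\beta p_n$ converges to the same limit.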

A box of arbitrary size $L$ can be filled with about
$(L/L_{\bar n})^d$ boxes of side $L_{\bar n}$ with $\bar n$ so large that,
given $\delta>0$, $|p_\infty-p_n|<\delta$ for all $n\ge\bar n$. Likewise,
a box of size $L_n$ can be filled by about $(L_n/L)^d$
boxes of size $L$ if $n$ is large. The latter remarks lead
us to conclude, by standard inequalities, that the
limit in [13] exists and coincides with $p_\infty$.

The subadditivity method just demonstrated for
finite-range potentials with hard core can be extended
to the potentials satisfying just stability and temperedness
(cf. the section "Thermodynamic limit").

For more details, the reader is referred to Ruelle 
(1969) and Gallavotti (1999). 


Appendix 3: An Infrared Inequality 


The infrared inequalities stem from Bogoliubov's
inequality. Consider as an example the problem of
crystallization discussed in the section "Continuous
symmetries: 'no $d=2$ crystal' theorem". Let $\langle\cdot\rangle$
denote average over a canonical equilibrium state
with Hamiltonian

$$H=\sum_{j=1}^{N}\frac{p_j^2}{2}+U(Q)+\varepsilon W(Q)$$

with given temperature and density parameters
$\beta,\rho$, $\rho=a^{-d}$. Let $\{X,Y\}=\sum_j(\partial_{p_j}X\,\partial_{q_j}Y-\partial_{q_j}X\,\partial_{p_j}Y)$
be the Poisson bracket. Integration by parts, with
periodic boundary conditions, yields

$$\langle A^*\{C,H\}\rangle=-\frac{\int A^*\{C,e^{-\beta H}\}\,dP\,dQ}{\beta Z_c(\beta,\rho,N)}\equiv-\beta^{-1}\langle\{A^*,C\}\rangle \qquad [75]$$

as a general identity. The latter identity implies, for
$A=\{C,H\}$, that

$$\langle\{H,C\}^*\{H,C\}\rangle=-\beta^{-1}\langle\{C,\{H,C^*\}\}\rangle \qquad [76]$$

Hence, the Schwarz inequality $\langle A^*A\rangle\langle\{H,C\}^*\{H,C\}\rangle\ge|\langle A^*\{H,C\}\rangle|^2$
combined with the two
relations in [75], [76] yields Bogoliubov's inequality:

$$\langle A^*A\rangle\ge\beta^{-1}\frac{|\langle\{A^*,C\}\rangle|^2}{\langle\{C,\{C^*,H\}\}\rangle} \qquad [77]$$
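
For the reader's convenience, the chain of steps leading to Bogoliubov's inequality can be spelled out as follows. This is a sketch that uses only the general identity [75], its special case [76] (rewritten with $\{H,C^*\}=-\{C^*,H\}$), and the Schwarz inequality for the average $\langle\cdot\rangle$:

```latex
\begin{aligned}
\beta^{-2}\,\bigl|\langle\{A^*,C\}\rangle\bigr|^2
  &= \bigl|\langle A^*\{C,H\}\rangle\bigr|^2
  && \text{by [75]}\\
  &\le \langle A^*A\rangle\,\langle\{H,C\}^*\{H,C\}\rangle
  && \text{(Schwarz)}\\
  &= \langle A^*A\rangle\,\beta^{-1}\langle\{C,\{C^*,H\}\}\rangle
  && \text{by [76]}
\end{aligned}
```

Dividing both sides by $\beta^{-1}\langle\{C,\{C^*,H\}\}\rangle$, which is nonnegative because by [76] it equals $\langle\{H,C\}^*\{H,C\}\rangle$, gives [77].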


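Let $g,h$ be arbitrary complex (differentiable)
functions and $\partial_j=\partial_{q_j}$,

$$A(Q)\overset{\mathrm{def}}{=}\sum_{j=1}^{N}g(q_j),\qquad C(P,Q)\overset{\mathrm{def}}{=}\sum_{j=1}^{N}p_j\,h(q_j) \qquad [78]$$

Then $H=\sum_j\frac{1}{2}p_j^2+\Phi(q_1,\ldots,q_N)$, if

$$\Phi(q_1,\ldots,q_N)=\frac{1}{2}\sum_{j\neq j'}\varphi(|q_j-q_{j'}|)+\varepsilon\sum_jW(q_j)$$

so that, via algebra,

$$\{C,H\}\equiv\sum_j\bigl(h_j\,\partial_j\Phi-p_j(p_j\cdot\partial_j)h_j\bigr)$$

with $h_j\overset{\mathrm{def}}{=}h(q_j)$. If $h$ is real valued, $\langle\{C,\{C^*,H\}\}\rangle$
becomes, again via algebra,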


(Shih ð; - Lo 


i’ 
+ (Hawa) + Oo) 


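(integrals on $p_j$ just replace $p_j^2$ by $2\beta^{-1}$ and
$(p_j)_i(p_j)_{i'}$ by $\beta^{-1}\delta_{i,i'}$). Therefore, the average
$\langle\{C,\{C^*,H\}\}\rangle$ becomes

$$\Bigl\langle\frac{1}{2}\sum_{j\neq j'}(h_j-h_{j'})^2\,\Delta\varphi(|q_j-q_{j'}|)+\varepsilon\sum_jh_j^2\,\Delta W(q_j)+4\beta^{-1}\sum_j(\partial_jh_j)^2\Bigr\rangle \qquad [79]$$

Choose $g(q)\equiv e^{-i(\kappa+K)\cdot q}$, $h(q)=\cos q\cdot\kappa$ and
bound $(h_j-h_{j'})^2$ by $\kappa^2(q_j-q_{j'})^2$, $(\partial_jh_j)^2$ by $\kappa^2$ and
$h_j^2$ by 1. Hence [79] is bounded above by $N\,D(\kappa)$
with

$$D(\kappa)\overset{\mathrm{def}}{=}\Bigl\langle\kappa^2\Bigl(\frac{1}{2N}\sum_{j\neq j'}(q_j-q_{j'})^2\,|\Delta\varphi(q_j-q_{j'})|+4\beta^{-1}\Bigr)+\varepsilon\,\frac{1}{N}\sum_j|\Delta W(q_j)|\Bigr\rangle \qquad [80]$$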


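This can be used to estimate the denominator in
[77]. For the LHS remark that

$$\langle A^*A\rangle=\Bigl\langle\Bigl|\sum_{j=1}^{N}e^{-iq_j\cdot(\kappa+K)}\Bigr|^2\Bigr\rangle$$

and

$$|\langle\{A^*,C\}\rangle|^2=\Bigl|\Bigl\langle\sum_jh_j\,\partial_j\overline{g_j}\Bigr\rangle\Bigr|^2=|K+\kappa|^2N^2\bigl(\rho_\varepsilon(K)+\rho_\varepsilon(K+2\kappa)\bigr)^2$$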


hence [77] becomes, after multiplying both sides
by the auxiliary function $\gamma(\kappa)$ (assumed even and
vanishing for $|\kappa|>\pi/a$) and summing over $\kappa$,


def 1 Lo (Kaya. 12 
Di--- Le g eH 
NL NID, 


» RE y(x) 
IKI? (p-(K) + p-(K + 2x) 
"us - XXE) 81 


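To apply [77] the averages in [80], [81] have to be
bounded above: this is a technical point that is
discussed here, as it illustrates a general method of
using the results on the thermodynamic limits and
their convexity properties to obtain estimates.

Note that $\bigl\langle\frac{1}{N}\sum_\kappa\gamma(\kappa)\frac{1}{N}\bigl|\sum_{j=1}^{N}e^{-i(K+\kappa)\cdot q_j}\bigr|^2\bigr\rangle$ is
identically $\tilde\varphi(0)+\frac{2}{N}\bigl\langle\sum_{j<j'}\tilde\varphi(q_j-q_{j'})\bigr\rangle$, with
$\tilde\varphi(q)\overset{\mathrm{def}}{=}\frac{1}{N}\sum_\kappa\gamma(\kappa)e^{i(\kappa+K)\cdot q}$.

Let $\varphi_{\lambda,\zeta}(q)\overset{\mathrm{def}}{=}\varphi(q)+\lambda q^2|\Delta\varphi(q)|+\zeta\tilde\varphi(q)$ and
let $F_V(\lambda,\eta,\zeta)\overset{\mathrm{def}}{=}(1/N)\log Z^c(\lambda,\eta,\zeta)$ with $Z^c$ the
partition function in the volume $\Omega$ computed
with energy $U'=\frac{1}{2}\sum_{j\neq j'}\varphi_{\lambda,\zeta}(q_j-q_{j'})+\varepsilon\sum_jW(q_j)+
\eta\varepsilon\sum_j|\Delta W(q_j)|$. Then $F_V(\lambda,\eta,\zeta)$ is convex in $\lambda,\eta,\zeta$
and it is uniformly bounded above and below if
$|\eta|,|\varepsilon|,|\zeta|\le1$ (say) and $|\lambda|\le\lambda_0$: here $\lambda_0>0$ exists
if $r^2|\Delta\varphi(r)|$ satisfies the assumption set at the
beginning of the section "Continuous symmetries:
'no $d=2$ crystal' theorem" and the density is smaller
than a close packing (this is because the potential $U'$
will still satisfy conditions similar to [14] uniformly
in $|\varepsilon|,|\eta|\le1$ and $|\lambda|$ small enough).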

Convexity and boundedness above and below
in an interval imply bounds on the derivatives in
the interior points, in this case on the derivatives of $F_V$
with respect to $\lambda,\eta,\zeta$ at 0. The latter are identical to
the averages in [80], [81]. In this way, the constants
$B_1,B_2,B_0$ such that $D(\kappa)\le\kappa^2B_1+\varepsilon B_2$ and $B_0\ge D_1$
are found.


For more details, the reader is referred to Mermin 
(1968). 
Introduction


Functional analysis is concerned with the study of 
functions and function spaces, combining techniques 
borrowed from classical analysis with algebraic 
techniques. Modern functional analysis developed 
around the problem of solving equations with 
solutions given by functions. After the differential 
and partial differential equations, which were 
studied in the eighteenth century, came the integral 
equations and other types of functional equations 
investigated in the nineteenth century, at the end of 
which arose the need to develop a new analysis, 
with functions of an infinite number of variables 
instead of the usual functions. In 1887, Volterra, 
inspired by the calculus of variations, suggested a 
new infinitesimal calculus where usual functions are 
replaced by functionals, that is, by maps from a 
function space to R or C, but he and his followers 
were still missing some algebraic and topological 
tools to be developed later. Modern analysis was 
born with the development of an “algebra of the 
infinite” closely related to classical linear algebra 
which by 1890 had (up to the concept of duality, 
which was developed later) settled on firm ground.
Strongly inspired by algebraic methods, Fredholm's 
work at the turn of the nineteenth century, in which 
emerged the concept of kernel of an operator, 
became a founding stone for the modern theory of 
integral equations. Hilbert developed further Fred- 
holm's methods for symmetric kernels, exploiting 
analogies with the theory of real quadratic forms 
and thereby making clear the importance of the 
notion of square-integrable functions. With Hilbert's
Grundzüge einer allgemeinen Theorie der linearen
Integralgleichungen, a further step was made from the
"algebra of the infinite" to the "geometry of the
infinite." The contribution of Fréchet, who introduced
the abstract notion of a space endowed with a
distance, made it possible to transfer Euclidean 
geometry to the framework of what have since 
then been called Hilbert spaces, a basic concept in 
mathematics and quantum physics. 

The usefulness of functional analysis in the study
of quantum systems became clear in the 1950s when
Kato proved the self-adjointness of atomic Hamiltonians,
and Gårding and Wightman formulated
axioms for quantum field theory. Ever since, functional
analysis has lain at the very heart of many
approaches to quantum field theory. Applications
of functional analysis stretch out to many branches
of mathematics, among which are numerical
analysis, global analysis, the theory of pseudodifferential
operators, differential geometry, operator
algebras, noncommutative geometry, etc.


Topological Vector Spaces 


Most topological spaces one comes across in practice
are metric spaces. A metric on a topological space $E$
is a map $d:E\times E\to[0,+\infty[$ which is symmetric,
such that $d(u,v)=0\Leftrightarrow u=v$ and which verifies the
triangle inequality $d(u,w)\le d(u,v)+d(v,w)$ for all
points $u,v,w$. A topological space $E$ is metrizable if
there is a metric $d$ on $E$ compatible with the topology
on $E$, in which case the balls with radius $1/n$ centered
at any point $x\in E$ form a local base at $x$, that is, a
collection of neighborhoods of $x$ such that every
neighborhood of $x$ contains a member of this
collection. A sequence $(u_n)$ in $E$ then converges to
$u\in E$ if and only if $d(u_n,u)$ converges to 0.

The Banach fixed-point theorem on a complete
metric space $(E,d)$ is a useful tool in nonlinear
functional analysis: it states that a (strict) contraction
on $E$, that is, a map $T:E\to E$ such that
$d(Tu,Tv)\le k\,d(u,v)$ for all $u\neq v\in E$ and fixed
$0<k<1$, has a unique fixed point $Tu_0=u_0$. In
particular, it provides local existence and uniqueness
of solutions of differential equations $dy/dt=F(y,t)$
with initial condition $y(0)=y_0$, where $F$ is Lipschitz
continuous.
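
The fixed-point iteration behind this existence result can be sketched on the simplest possible case. The example below is an illustration only: it takes the equation $dy/dt=y$, $y(0)=1$ (an assumption chosen because the iterates stay polynomial), writes it as the fixed-point problem $y=T(y)$ with $T(y)(t)=1+\int_0^ty(s)\,ds$, and iterates $T$ starting from the constant function 1. On $[0,1/2]$ the map $T$ is a contraction for the supremum norm with constant $k=1/2$. The helper names are hypothetical.

```python
import math

def picard_step(coeffs, y0=1.0):
    """Apply T(y)(t) = y0 + integral_0^t y(s) ds to a polynomial.

    coeffs[k] is the coefficient of t**k; this is the Picard map for
    the illustrative equation dy/dt = y, y(0) = y0.
    """
    integrated = [c / (k + 1) for k, c in enumerate(coeffs)]
    return [y0] + integrated

def evaluate(coeffs, t):
    """Evaluate the polynomial with the given coefficients at t."""
    return sum(c * t**k for k, c in enumerate(coeffs))

y = [1.0]                 # initial guess: the constant function 1
for _ in range(15):
    y = picard_step(y)    # each step adds one Taylor term of exp(t)

print(evaluate(y, 0.5), math.exp(0.5))
```

Each application of $T$ reproduces one more term of the Taylor series of $e^t$, so the iterates converge to the unique fixed point, the solution $y(t)=e^t$.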

Linear functional analysis starts from topological
vector spaces, that is, vector spaces equipped with a
topology for which the operations are continuous. A
topological vector space equipped with a local base
whose members are convex is said to be locally
convex. Examples of locally convex spaces are
normed linear spaces, namely vector spaces
equipped with a norm, a concept that first arose in
the work of Fréchet. A seminorm on a vector space
$V$ is a map $p:V\to[0,\infty[$ which obeys the triangle
inequality $p(u+v)\le p(u)+p(v)$ for any vectors $u,v$
and such that $p(\lambda u)=|\lambda|p(u)$ for any scalar $\lambda$ and
any vector $u$; if $p(u)=0\Rightarrow u=0$, it is a norm, often
denoted by $\|\cdot\|$. A norm on a vector space $E$ gives
rise to a translation-invariant distance function
$d(u,v)=\|u-v\|$ making it a metric space.

Historically, one of the first examples of normed
spaces is the space $C([0,1])$, investigated by Riesz, of
(real- or complex-valued) continuous functions on
the interval $[0,1]$ equipped with the supremum
norm $\|f\|_\infty:=\sup_{x\in[0,1]}|f(x)|$. In the 1920s, the
general definition of Banach space arose in connection
with the works of Hahn and Banach. A normed
linear space is a Banach space if it is complete as a
metric space for the induced metric, $C([0,1])$ being a
prototype of a Banach space. More generally, for
any non-negative integer $k$, the space $C^k([0,1])$ of
functions on $[0,1]$ of class $C^k$ equipped with the
norm $\|f\|_k=\sum_{i=0}^{k}\|f^{(i)}\|_\infty$, expressed in terms of a
finite number of seminorms $\|f^{(i)}\|_\infty=\sup_{x\in[0,1]}|f^{(i)}(x)|$,
$i=0,\ldots,k$, is also a Banach space.

The space $C^\infty([0,1])$ of smooth functions on the
interval $[0,1]$ is no longer a Banach space since
its topology is described by a countable family of
seminorms $\|f\|_k$ with $k$ varying in the positive
integers. The metric

$$d(f,g)=\sum_{k=1}^{\infty}2^{-k}\,\frac{\|f-g\|_k}{1+\|f-g\|_k}$$

turns it into a Fréchet space, that is, a locally convex
complete metric space. The space $S(\mathbb{R}^n)$ of rapidly
decreasing functions, which are smooth functions $f$
on $\mathbb{R}^n$ for which

$$\|f\|_{\alpha,\beta}:=\sup_{x\in\mathbb{R}^n}|x^\alpha D^\beta f(x)|$$

is finite for any multiindices $\alpha$ and $\beta$, is also a
Fréchet space with the topology given by the
seminorms $\|\cdot\|_{\alpha,\beta}$. Further examples of Fréchet
spaces are the space $C_0^\infty(K)$ of smooth functions
with support in a fixed compact subset $K\subset\mathbb{R}^n$
equipped with the countable family of seminorms

$$\|D^\alpha f\|_{\infty,K}=\sup_{x\in K}|D^\alpha f(x)|,\quad\alpha\in\mathbb{N}_0^n$$

and the space $C^\infty(M,E)$ of smooth sections of a
vector bundle $E$ over a closed manifold $M$ equipped
with a similar countable family of seminorms. Given
an open subset $\Omega=\bigcup_{p\in\mathbb{N}}K_p$ with $K_p$, $p\in\mathbb{N}$, compact
subsets of $\mathbb{R}^n$, the space $D(\Omega)=\bigcup_{p\in\mathbb{N}}C_0^\infty(K_p)$
equipped with the inductive limit topology (for
which a sequence $(f_n)$ in $D(\Omega)$ converges to $f\in D(\Omega)$
if each $f_n$ has support in some fixed compact subset
$K$ and $(D^\alpha f_n)$ converges uniformly to $D^\alpha f$ on $K$ for
each multiindex $\alpha$) is a locally convex space.
Among Banach spaces are Hilbert spaces, which
have properties very similar to those of finite-dimensional
spaces and are historically the first
type of infinite-dimensional space to appear, with the
works of Hilbert at the beginning of the twentieth
century. A Hilbert space is a Banach space equipped
with a norm $\|\cdot\|$ that derives from an inner product,
that is, $\|u\|^2=\langle u,u\rangle$ with $\langle\cdot,\cdot\rangle$ a positive-definite
bilinear (or sesquilinear according to whether the
base space is real or complex) form. Hilbert spaces
are fundamental building blocks in quantum
mechanics; using (closed) tensor products, from a
Hilbert space $H$ one builds the Fock space
$F(H)=\bigoplus_{k=0}^{\infty}\otimes^kH$ and from there the bosonic
Fock space $F(H)=\bigoplus_{k=0}^{\infty}\otimes_s^kH$ (where $\otimes_s$ stands
for the (closed) symmetrized tensor product) as well
as the fermionic Fock space $F(H)=\bigoplus_{k=0}^{\infty}\Lambda^kH$
(where $\Lambda^k$ stands for the antisymmetrized (closed)
tensor product).

A prototype of Hilbert space is the space $\ell^2(\mathbb{Z})$ of
complex-valued sequences $(u_n)_{n\in\mathbb{Z}}$ such that
$\sum_{n\in\mathbb{Z}}|u_n|^2$ is finite, which is already implicit in
Hilbert's Grundzüge. Shortly afterwards, Riesz and
Fischer, with the help of the integration tool
introduced by Lebesgue, showed that the space
$L^2(]0,1[)$ (first introduced by Riesz) of square-summable
functions on the interval $]0,1[$, that is,
functions $f$ such that

$$\|f\|_{L^2}=\Bigl(\int_0^1|f(x)|^2\,dx\Bigr)^{1/2}$$

is finite, provides an example of Hilbert space.
These were then further generalized to spaces
$L^p(]0,1[)$ of $p$-summable ($1\le p<\infty$) functions
on $]0,1[$ (i.e., functions $f$ such that

$$\|f\|_{L^p}=\Bigl(\int_0^1|f(x)|^p\,dx\Bigr)^{1/p}$$

is finite), which are not Hilbert unless $p=2$ but which
provide further examples of Banach spaces, the space
$L^\infty(]0,1[)$ of functions on $]0,1[$ bounded almost
everywhere with respect to the Lebesgue measure,
offering yet another example of Banach space.
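
The claim that $L^p(]0,1[)$ is not Hilbert unless $p=2$ can be checked on a concrete pair of functions through the parallelogram law $\|f+g\|^2+\|f-g\|^2=2(\|f\|^2+\|g\|^2)$, which holds in any space whose norm derives from an inner product. The sketch below is an illustration under stated assumptions: $f$ and $g$ are taken to be the indicator functions of $]0,1/2[$ and $]1/2,1[$, sampled on a uniform grid (exact for step functions), and the function names are hypothetical.

```python
def lp_norm(values, p):
    """L^p norm on ]0,1[ of a step function sampled on a uniform grid."""
    h = 1.0 / len(values)
    return (sum(abs(v)**p for v in values) * h) ** (1.0 / p)

def parallelogram_defect(p, n=1000):
    """Return ||f+g||^2 + ||f-g||^2 - 2(||f||^2 + ||g||^2) in L^p."""
    f = [1.0 if i < n // 2 else 0.0 for i in range(n)]
    g = [0.0 if i < n // 2 else 1.0 for i in range(n)]
    plus = [a + b for a, b in zip(f, g)]
    minus = [a - b for a, b in zip(f, g)]
    lhs = lp_norm(plus, p)**2 + lp_norm(minus, p)**2
    rhs = 2.0 * (lp_norm(f, p)**2 + lp_norm(g, p)**2)
    return lhs - rhs

for p in (1, 2, 3):
    print(p, parallelogram_defect(p))
```

For this pair the defect equals $2-4\cdot2^{-2/p}$, which vanishes only for $p=2$; for $p\neq2$ the norm therefore cannot come from an inner product.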

In 1936, Sobolev gave a generalization of the
notion of function and their derivatives through
integration by parts, which led to the so-called
Sobolev spaces $W^{k,p}(]0,1[)$ of functions $f\in
L^p(]0,1[)$ with derivatives up to order $k$ lying in
$L^p(]0,1[)$, obtained as the closure of $C^\infty(]0,1[)$ for
the norm

$$f\mapsto\|f\|_{W^{k,p}}=\Bigl(\sum_{j=0}^{k}\|f^{(j)}\|_{L^p}^p\Bigr)^{1/p}$$

(for $p=2$, $W^{k,2}(]0,1[)$ is a Hilbert space often
denoted by $H^k(]0,1[)$). They differ from the Sobolev
spaces $W_0^{k,p}(]0,1[)$, which correspond to the closure
of the set $D(]0,1[)$ for the norm $f\mapsto\|f\|_{W^{k,p}}$; for
example, an element $u\in W^{1,p}(]0,1[)$ lies in
$W_0^{1,p}(]0,1[)$ if and only if it vanishes at 0 and 1,
that is, if and only if it satisfies Dirichlet-type
boundary conditions on the boundary of the interval.
Similarly, one defines Sobolev spaces
$W_0^{k,p}(\mathbb{R}^n)=W^{k,p}(\mathbb{R}^n)$ on $\mathbb{R}^n$, Sobolev spaces $W^{k,p}(\Omega)$
and $W_0^{k,p}(\Omega)$ on open subsets $\Omega\subset\mathbb{R}^n$ and, using a
partition of unity on a closed manifold $M$, Sobolev
spaces $H^k(M,E)=W^{k,2}(M,E)$ of sections of vector
bundles $E$ over $M$. Using the Fourier transform
(discussed later), one can drop the assumption that $k$
be an integer and extend the notion of Sobolev space
to define $W^{s,p}(\Omega)$ and $H^s(M,E)$ with $s$ any real
number.

Sobolev spaces arise in many areas of mathematics;
one central example in probability theory is
the Cameron-Martin space $H^1([0,t])$ embedded in
the Wiener space $C([0,t])$. This embedding is a
particular case of more general Sobolev embedding
theorems, which embed (possibly continuously,
sometimes even compactly (the notion of compact
operator is discussed in a later section)) $W^{k,p}$-Sobolev
spaces in $L^q$-spaces with $q>p$ such as the
continuous inclusion $W^{k,p}(\mathbb{R}^n)\subset L^q(\mathbb{R}^n)$ with
$1/q=1/p-k/n$, or in $C^l$-spaces with $l<k$ such
as, for a bounded open and regular enough subset $\Omega$
of $\mathbb{R}^n$ and for any $s>l+n/p$ with pn, the
continuous inclusion $W^{s,p}(\Omega)\subset C^l(\bar\Omega)$ (the set of
functions in $C^l(\Omega)$ such that $D^\alpha u$ can be continuously
extended to the closure $\bar\Omega$ for all $|\alpha|\le l$).
Sobolev embeddings have important applications for
the regularity of solutions of partial differential
equations, when showing that weak solutions one
constructs are in fact smooth. In particular, on an
$n$-dimensional closed manifold $M$ for $s>l+n/2$, the
Sobolev space $H^s(M,E)$ can be continuously
embedded in the space $C^l(M,E)$ of sections of $E$ of
class $C^l$, which in particular implies that the
solutions of a hypoelliptic partial differential equa- 
tion Au —v with v € L'(M,E) are smooth, as for 
example in the case of solutions of the Seiberg- 
Witten equations. 


Duality 


The concept of duality (in a topological sense) was
initiated at the beginning of the twentieth century by
Hadamard, who was looking for continuous linear
functionals on the Banach space $C(I)$ of continuous
functions on a compact interval $I$ equipped with the
uniform topology. It is implicit in Hilbert's theory
and plays a central part in the work of Riesz, who
managed to express such continuous functionals as
Stieltjes integrals, one of the starting points for the
modern theory of integration.

The topological dual of a topological vector space
$E$ is the space $E^*$ of continuous linear forms on $E$
which, when $E$ is a normed space, can be equipped
with the dual norm $\|L\|_{E^*}=\sup_{u\in E,\,\|u\|\le1}|L(u)|$.

Dual spaces often provide a receptacle for singular
objects; any of the functions $f\in L^p(\mathbb{R}^n)$ ($p\ge1$) and
the delta-function at point $x\in\mathbb{R}^n$, $\delta_x:f\mapsto f(x)$, all lie
in the space $S'(\mathbb{R}^n)$ dual to $S(\mathbb{R}^n)$ of tempered
distributions on $\mathbb{R}^n$, which is itself contained in the
space $D'(\mathbb{R}^n)$ of distributions dual to $D(\mathbb{R}^n)$.
Furthermore, the topological dual $E^*$ of a nuclear
space $E$ contains the support of a probability
measure with characteristic function (see the next
section) given by a continuous positive-definite
function on $E$. Among nuclear spaces are projective
limits $E=\bigcap_{p\in\mathbb{N}}H_p$ (a sequence $(u_n)\in E$ converges
to $u\in E$ whenever it converges to $u$ in each $H_p$) of
countably many nested Hilbert spaces $\cdots\subset H_p\subset
H_{p-1}\subset\cdots\subset H_0$ such that the embedding $H_p\subset
H_{p-1}$ is a trace-class operator (see the section
"Operator algebras"). If $H_p$ is the closure of $E$ for
the norm $\|\cdot\|_p$, the topological dual $E'$ of $E$ for the
norm $\|\cdot\|_0$ is an inductive limit $E'=\bigcup_{p\in\mathbb{N}}H_{-p}$,
where $H_{-p}$ are the dual (with respect to $\|\cdot\|_0$)
Hilbert spaces with norm $\|\cdot\|_{-p}$ (a sequence $(u_n)\in
E'$ converges to $u\in E'$ whenever it lies in some $H_{-p}$
and converges to $u$ for the topology of $H_{-p}$) and we
have

$$E\subset\cdots\subset H_p\subset H_{p-1}\subset\cdots\subset H_0=H_0'\subset H_{-1}\subset\cdots\subset H_{-p}\subset\cdots\subset E'$$


As a result of the theory of elliptic operators on a
closed manifold, the Fréchet space $C^\infty(M,E)$ of
smooth sections of a vector bundle over a closed
manifold $M$ is nuclear as the projective limit of
countably many Sobolev spaces $H^p(M,E)$, with
$L^2$-dual given by the inductive limit of countably
many Sobolev spaces $H^{-p}(M,E)$.

The existence of nontrivial continuous linear
forms on a normed linear space $E$ is ensured by the
Hahn-Banach theorem, which asserts that for any
proper closed linear subspace $F$ of $E$, there is a nonvanishing
continuous linear form that vanishes on $F$. When
the space is a Hilbert space $(H,\langle\cdot,\cdot\rangle_H)$, it follows
from the Riesz-Fréchet theorem that any continuous
linear form $L$ on $H$ is represented in a unique way
by a vector $v\in H$ such that $L(u)=\langle v,u\rangle_H$ for all
$u\in H$, thus relating the dual pairing on the left with
the Hilbert inner product on the right and identifying
the topological dual $H^*$ with $H$.

The strong topology induced by the norm $\|\cdot\|$ on
a normed vector space $E$ (that is, the topology in
which a sequence $(u_n)$ converges to $u$ whenever
$\|u_n-u\|\to0$) is too fine to have enough compact sets
when $E$ is infinite dimensional, since the compactness
of the unit ball in $E$ for the strong topology
characterizes finite-dimensional spaces. Since compact
sets are useful for existence theorems, one is
inclined to weaken the topology: the weak topology
on $E$ (which coincides with the strong topology
when $E$ is finite dimensional and for which a
sequence $(u_n)$ converges to $u$ if and only if $L(u_n)\to
L(u)$ $\forall L\in E^*$) has compact unit ball if and only if $E$
is reflexive or, in other words, if $E$ can be canonically
identified with its double dual $(E^*)^*$. For $1<p<\infty$,
given an open subset $\Omega\subset\mathbb{R}^n$, the topological dual of
$L^p(\Omega)$ can be identified via the Riesz representation
with $L^{p^*}(\Omega)$ with $p^*$ conjugate to $p$, that is, $1/p+
1/p^*=1$, and $L^p(\Omega)$ is reflexive, whereas the topological
duals of $W^{k,p}(\Omega)$ and $W_0^{k,p}(\Omega)$ both coincide
with $W_0^{-k,p^*}(\Omega)$ so that only $W_0^{k,p}(\Omega)$ is reflexive.
Neither $L^1(\Omega)$ nor its topological dual $L^\infty(\Omega)$ is
reflexive since $L^1(\Omega)$ is strictly contained in the
topological dual of $L^\infty(\Omega)$, for there are continuous
linear forms $L$ on $L^\infty(\Omega)$ that are not of the form

$$L(u)=\int_\Omega uv\quad\forall u\in L^\infty(\Omega)\ \text{with}\ v\in L^1(\Omega)$$
Q 


Similarly, the topological dual $E^*$ of a normed
linear space $E$ can be equipped with the topology
induced by the dual norm $\|\cdot\|_{E^*}$ and the weak *-topology,
namely the weakest one for which the
maps $L\mapsto L(u)$, $u\in E$, are continuous, and the unit
ball in $E^*$ is indeed compact for this topology
(Banach-Alaoglu theorem).
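
The difference between strong and weak convergence can be illustrated on the standard basis vectors $e_n$ of $\ell^2(\mathbb{N})$: they stay at norm 1, so they do not converge strongly to 0, while their pairing with any fixed square-summable vector tends to 0. The sketch below is only an illustration on a finite truncation of $\ell^2(\mathbb{N})$ (the truncation size `DIM`, the test vector `v` and the helper names are assumptions introduced here); it uses the fact that continuous linear forms on a Hilbert space are given by inner products.

```python
import math

DIM = 2000  # finite truncation of l^2(N); an assumption for illustration

def basis_vector(n):
    """Return the n-th standard basis vector of the truncated space."""
    e = [0.0] * DIM
    e[n] = 1.0
    return e

def inner(u, v):
    """Real inner product on the truncated space."""
    return sum(a * b for a, b in zip(u, v))

v = [1.0 / (k + 1) for k in range(DIM)]   # a fixed square-summable vector

for n in (0, 9, 99, 999):
    e_n = basis_vector(n)
    # pairing with v decays like 1/(n+1) while the norm of e_n stays 1
    print(n, inner(v, e_n), math.sqrt(inner(e_n, e_n)))
```

The printed pairings decrease like $1/(n+1)$ while the norms remain equal to 1, which is the finite-dimensional shadow of a sequence converging weakly but not strongly.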

Duality does not always preserve separability (a
topological vector space is separable if it has a
countable dense subset) since $L^\infty(\Omega)$, which is
not separable, is the topological dual of $L^1(\Omega)$,
which is separable. However, as a consequence of
the Hahn-Banach theorem, if the topological dual of
a Banach space is separable then so is the original
space, and one has equivalence when adding the
reflexivity assumption; a Banach space is reflexive
and separable whenever its topological dual is. For
$1\le p<\infty$, $L^p(\Omega)$ and $W_0^{k,p}(\Omega)$ are separable and
moreover reflexive if $p\neq1$.


Fourier Transform 


In the middle of the eighteenth century, oscillations 
of a vibrating string were interpreted by Bernoulli
as a limit case for the oscillation of $n$ point masses
when $n$ tends to infinity, and Bernoulli introduced
the novel idea of the superposition principle by
which the general oscillation of the string should
decompose into a superposition of "proper oscillations."
This point of view triggered off a discussion
as to whether or not an arbitrary function can be 
expanded as a trigonometric series. Other examples 
of expansions in "orthogonal functions" (this terminology
actually only appears with Hilbert) had been
found in the meantime in relation to oscillation
problems and investigations on heat theory, but it 
was only in the nineteenth century, with the works 
of Fourier and Dirichlet, that the superposition 
problem was solved. 

Separable Hilbert spaces can be equipped with a
countable orthonormal system $\{e_n\}_{n\in\mathbb{Z}}$
($\langle e_n, e_m\rangle_H = \delta_{mn}$ with $\langle\cdot,\cdot\rangle_H$ the scalar product on $H$) which is
complete, that is, any vector $u \in H$ can be expanded
in this system in a unique way $u = \sum_{n\in\mathbb{Z}} \hat u_n e_n$ with
Fourier coefficients $\hat u_n = \langle u, e_n\rangle_H$. The latter obey
Parseval's relation $\sum_{n} |\hat u_n|^2 = \|u\|^2$ (where $\|\cdot\|$ is
the norm associated with $\langle\cdot,\cdot\rangle_H$), and the Fourier
transform $u \mapsto (\hat u(n))_{n\in\mathbb{Z}}$ gives rise to an isometric
isomorphism between the separable Hilbert space
$H$ and the Hilbert space $\ell^2(\mathbb{Z})$ of square-summable
sequences of complex numbers. In particular, the
space $L^2(S^1)$ of $L^2$-functions on the unit circle
$S^1 = \mathbb{R}/\mathbb{Z}$ with its usual Haar measure $dt$ is separable
with complete orthonormal system $t \mapsto e_n(t) = e^{2i\pi nt}$, $n \in \mathbb{Z}$,
and the Fourier transform

\[ u \mapsto \left( \hat u(n) = \int_0^1 e^{-2i\pi nt}\, u(t)\, dt \right)_{n\in\mathbb{Z}} \]

identifies it with the space $\ell^2(\mathbb{Z})$. Under this
identification, the Hilbert subspace $\ell^2(\mathbb{N})$ obtained
as the range in $\ell^2(\mathbb{Z})$ of the projection
$p_+ : (u_n)_{n\in\mathbb{Z}} \mapsto (u_n)_{n\in\mathbb{N}}$
corresponds to the Hardy space $H^2(S^1)$.
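
The expansion on the circle can be checked numerically. The following is a minimal sketch, not part of the original article: it approximates the Fourier coefficients of a trigonometric polynomial by a Riemann sum (exact here up to rounding) and compares both sides of Parseval's relation. The test function `u` and the helper names are illustrative assumptions.

```python
import cmath

def fourier_coefficient(u, n, samples=256):
    """Approximate u^(n) = integral over [0, 1] of exp(-2*pi*i*n*t) u(t) dt by a Riemann sum."""
    total = 0j
    for k in range(samples):
        t = k / samples
        total += cmath.exp(-2j * cmath.pi * n * t) * u(t)
    return total / samples

def l2_norm_squared(u, samples=256):
    """Approximate the squared L^2 norm of u on the circle R/Z."""
    return sum(abs(u(k / samples)) ** 2 for k in range(samples)) / samples

def u(t):
    # Hypothetical test function: 0.5*e_0 + 2*e_1 + (1 - i)*e_{-3}, with e_n(t) = exp(2*pi*i*n*t).
    e = lambda n: cmath.exp(2j * cmath.pi * n * t)
    return 0.5 * e(0) + 2 * e(1) + (1 - 1j) * e(-3)

# Coefficients for |n| <= 5 capture every nonzero mode of this particular u.
coeffs = {n: fourier_coefficient(u, n) for n in range(-5, 6)}
parseval_lhs = sum(abs(c) ** 2 for c in coeffs.values())  # sum of |u^(n)|^2
parseval_rhs = l2_norm_squared(u)                         # ||u||^2
```

For this choice of `u` both sides equal 0.25 + 4 + 2 = 6.25, and the only nonzero coefficients are those of the modes $e_0$, $e_1$ and $e_{-3}$.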

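The Fourier transform extends to the space $\mathcal{S}(\mathbb{R}^n)$,
sending a function $f \in \mathcal{S}(\mathbb{R}^n)$ to the map

\[ \hat f : \xi \mapsto \hat f(\xi) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} e^{-i\langle x, \xi\rangle} f(x)\, dx \]

and maps $\mathcal{S}(\mathbb{R}^n)$ onto itself linearly and continuously
with continuous inverse $f \mapsto \hat f(-\xi)$. When $n = 1$, the
Poisson formula relates $f \in \mathcal{S}(\mathbb{R})$ with its Fourier
transform $\hat f$ by $\sum_{n\in\mathbb{Z}} f(2\pi n) = \sum_{n\in\mathbb{Z}} \hat f(n)$.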

Since Fourier transformation turns (up to a 
constant multiplicative factor) differentiation $D^\alpha$
for a multiindex $\alpha = (\alpha_1, \ldots, \alpha_n)$ into multiplication
by $\xi^\alpha = \xi_1^{\alpha_1} \cdots \xi_n^{\alpha_n}$, it can be used to define
$W^{s,p}$-Sobolev spaces with $s$ a real number as the space of
L?-functions with finite Sobolev norms |u||y.; = 
(fI + i£ a£) P (which coincide with the ones 
defined previously when s=k is a non-negative 
integer). 

Fourier transforms are also used to describe a 
linear pseudodifferential operator A (see next two 
sections where the notions of bounded and 
unbounded linear operator are discussed) of order 
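$a$ acting on smooth functions on an open subset $U$
of $\mathbb{R}^n$ in terms of its symbol $\sigma_A$ (a smooth map $\sigma$
on $U \times \mathbb{R}^n$ with compact support in $x$ such that for
any multi-indices $\alpha, \beta \in \mathbb{N}^n$, there is a constant
$C_{\alpha,\beta}$ with

\[ |D_x^\alpha D_\xi^\beta \sigma(x, \xi)| \le C_{\alpha,\beta}\,(1 + |\xi|)^{a - |\beta|} \]

for any $\xi \in \mathbb{R}^n$) by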


\[ (Af)(x) = \int_{\mathbb{R}^n} e^{i\langle x, \xi\rangle}\, \sigma_A(x, \xi)\, \hat f(\xi)\, d\xi \]


Fourier transform maps a Gaussian function
$x \mapsto e^{-(1/2)\lambda |x|^2}$ on $\mathbb{R}^n$, where $\lambda$ is a nonzero scalar,
to another Gaussian function $\xi \mapsto e^{-(1/2\lambda)|\xi|^2}$ (up to
a nonzero multiplicative factor), a starting point for
T-duality in string theory. More generally, the
characteristic function

\[ \hat\mu(\xi) = \int_H e^{i\langle \xi, x\rangle_H}\, \mu(dx) \]

of a Gaussian probability measure $\mu$ with covariance
$C$ on a Hilbert space $H$ is the function
$\xi \mapsto e^{-(1/2)\langle \xi, C\xi\rangle_H}$. Such probability measures typically
arise in Euclidean quantum field theory; in axiomatic
quantum field theory, the analyticity properties
of $n$-point functions can be derived from the
Wightman axioms using Fourier transforms. Thus, 
Fourier transformation underlies many different 
aspects of quantum field theory. 


Fredholm Operators


A complex-valued continuous function $K$ on $[0,1] \times [0,1]$
gives rise to an integral operator

\[ A : f \mapsto \int_0^1 K(x, y) f(y)\, dy \]

on complex-valued continuous functions on $[0,1]$
(equipped with the supremum norm $\|\cdot\|_\infty$) with the
following upper bound property:

\[ \|Af\|_\infty \le \sup_{[0,1]\times[0,1]} |K(x, y)|\; \|f\|_\infty \]

In other words, $A$ is a bounded linear operator with
norm bounded from above by $\sup_{[0,1]\times[0,1]} |K(x, y)|$;
a linear operator $A: E \to F$ from a normed linear
space $(E, \|\cdot\|_E)$ to a normed linear space $(F, \|\cdot\|_F)$ is
bounded (or continuous) if and only if its (operator)
norm $|||A||| = \sup_{\|u\|_E \le 1} \|Au\|_F$ is finite.
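
As an illustration of the upper bound property, here is a minimal sketch that discretizes the integral operator by the midpoint rule and compares the supremum of $|Af|$ on a grid with $\sup|K|\,\|f\|_\infty$. The kernel, the test function and the quadrature are illustrative assumptions, not taken from the article.

```python
import math

def apply_integral_operator(kernel, f, x, nodes=400):
    """Midpoint-rule approximation of (Af)(x) = integral over [0, 1] of K(x, y) f(y) dy."""
    h = 1.0 / nodes
    return h * sum(kernel(x, (j + 0.5) * h) * f((j + 0.5) * h) for j in range(nodes))

def kernel(x, y):
    # Hypothetical continuous kernel; |K| <= 2 on the unit square, with equality at x = y = 1.
    return math.cos(3.0 * (x - y)) + x * y

def f(y):
    # Hypothetical continuous test function with supremum norm 1 on [0, 1].
    return math.sin(5.0 * y)

SUP_K = 2.0   # known bound on |K| over [0, 1] x [0, 1]
SUP_F = 1.0   # known supremum norm of f

grid = [i / 100 for i in range(101)]
sup_Af = max(abs(apply_integral_operator(kernel, f, x)) for x in grid)
bound = SUP_K * SUP_F  # the estimate ||Af|| <= sup|K| * ||f||
```

Because the quadrature weights are positive and sum to 1, the discretized operator satisfies the same bound as the integral operator, so `sup_Af <= bound` holds for the approximation as well.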

An integral operator

\[ A : f \mapsto \int_0^1 K(x, y) f(y)\, dy \]

defined by a continuous kernel $K$ is, moreover,
compact; a compact operator is a bounded operator
of normed spaces that maps bounded sets to
precompact sets, that is, to sets whose closure is
compact. Other examples of compact operators on 
normed spaces are finite-rank operators, operators 
with finite-dimensional range. In fact, any compact 
operator on a separable Hilbert space can be 
approximated in the topology induced by the
operator norm $|||\cdot|||$ by a sequence of finite-rank
operators. 

Inspired by the work of Volterra, who, in the case 
of the integral operator defined above, produced 


continuous solutions $\phi = (I - A)^{-1} f$ of the equation
$f = (I - A)\phi$ for $f \in C([0,1])$, Fredholm in 1900
(Sur une classe d'équations fonctionnelles) studied the
equation $f = (I - \lambda A)\phi$, introducing a complex
parameter $\lambda$. He proved what is since then called the
Fredholm alternative, which states that either the
equation $f = (I - \lambda A)\phi$ has a unique solution for every
$f \in C([0,1])$ or the corresponding homogeneous
equation $(I - \lambda A)\phi = 0$ has nontrivial solutions. In modern
language, it means that the resolvent $R(A, \mu) = (A - \mu I)^{-1}$
of a compact linear operator $A$ is surjective if and
only if it is injective. The Fredholm alternative is a
powerful tool to solve partial differential equations
among which the Dirichlet problem, the solutions of
which are harmonic functions $u$ (i.e., $\Delta u = 0$, where
$\Delta = -\sum_{i=1}^n \partial^2/\partial x_i^2$) on some domain $\Omega \subset \mathbb{R}^n$ with
Dirichlet boundary conditions $u|_{\partial\Omega} = f$, where $f$ is a
continuous function on the boundary $\partial\Omega$. The Dirichlet
problem has geometric applications, in particular to the 
nonlinear Plateau problem, which minimizes the area of 
a surface in R^ with given boundary curves and which 
reduces to a (linear) Dirichlet problem. 

The operator B=I—A built from the compact 
operator A is a particular Fredholm operator, namely a 
bounded linear operator $B: E \to F$ which is invertible
"up to compact operators," that is, such that there is a
bounded linear operator $C: F \to E$ with both $BC - I_F$
and $CB - I_E$ compact. A Fredholm operator $B$ has a
finite-dimensional kernel $\mathrm{Ker}\, B$ and when $(E, \langle\cdot,\cdot\rangle_E)$
and $(F, \langle\cdot,\cdot\rangle_F)$ are Hilbert spaces its cokernel $\mathrm{Ker}\, B^*$,
where $B^*$ is the adjoint of $B$ defined by

\[ \langle Bu, v\rangle_F = \langle u, B^*v\rangle_E \quad \forall u \in E,\ \forall v \in F \]


is also finite dimensional, so that it has a well-defined
index $\mathrm{ind}(B) = \dim(\mathrm{Ker}\, B) - \dim(\mathrm{Ker}\, B^*)$, a
starting point for index theory. Toeplitz operators
$T_\phi$, where $\phi$ is a continuous function on the unit
circle $S^1$, provide first examples of Fredholm
operators; they act on the Hardy space $H^2(S^1)$ by

\[ T_{e_n}\Big( \sum_{m \ge 0} a_m e_m \Big) = \sum_{m \ge 0} a_{m+n}\, e_m \]

under the identification $H^2(S^1) \simeq \ell^2(\mathbb{N}) \subset \ell^2(\mathbb{Z})$,
with $\ell^2(\mathbb{Z})$ equipped with the canonical complete
orthonormal basis $(e_n, n \in \mathbb{Z})$. The Fredholm index
$\mathrm{ind}(T_{e_n})$ is exactly the integer $n$ so that the index of
its adjoint is $-n$, as a consequence of which the index
map from Fredholm operators to integers is onto.


One-Parameter (Semi)groups


Unlike in the finite-dimensional situation, a linear
operator $A: E \to F$ between two normed linear
spaces $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ is not expected to be
bounded. Unbounded operators arise in partial
differential equations that involve differential operators
such as the Laplacian $\Delta$ on an open subset
$\Omega \subset \mathbb{R}^n$. The following equations provide fundamental
examples of partial differential equations which 
arose over time from the study of various problems 
in mathematical physics with the works of Poisson, 
Fourier, and Cauchy: 


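\[ \Delta u = 0 \qquad \text{Laplace equation} \]

\[ \frac{\partial^2 u}{\partial t^2} + \Delta u = 0 \qquad \text{wave equation} \]

\[ \frac{\partial u}{\partial t} + \Delta u = 0 \qquad \text{heat equation} \]

and later the Schrödinger equation in quantum
mechanics:

\[ i\,\frac{\partial u}{\partial t} = \Delta u \]

where $t$ is a time parameter.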

An unbounded linear operator on an infinite- 
dimensional normed space is usually defined on a 
domain D(A) which is strictly contained in E. The 
Laplacian $\Delta$ is defined on the dense domain
$D(\Delta) = H^2(\mathbb{R}^n)$ in $L^2(\mathbb{R}^n)$; it defines a bounded
operator from $H^2(\mathbb{R}^n)$ to $L^2(\mathbb{R}^n)$ but does not
extend to a bounded operator on $L^2(\mathbb{R}^n)$. Like this
operator, most unbounded operators $A: E \to F$ one
comes across have dense domain $D(A)$ in $E$ and are
closed, that is, their graph $\{(u, Au), u \in D(A)\}$ is
closed as a subset of the normed linear space $E \times F$.
When not actually closed, they can be closable, that
is, they can have a closed extension called the
closure of the operator. By the closed-graph theorem,
when $E$ and $F$ are Banach spaces, a linear
operator $A: E \to F$ is continuous whenever its graph
is closed, as a consequence of which a closed linear
operator $A: E \to F$ defined on a dense domain is
bounded provided its domain coincides with the 
whole space. 

For a closed operator $A: E \to F$ with dense
domain $D(A)$, when $E$ and $F$ are Hilbert spaces
equipped with inner products $\langle\cdot,\cdot\rangle_E$ and $\langle\cdot,\cdot\rangle_F$, the
adjoint $A^*$ of $A$ is defined on its domain $D(A^*)$ by

\[ \langle Au, v\rangle_F = \langle u, A^*v\rangle_E \quad \forall (u, v) \in D(A) \times D(A^*) \]

A self-adjoint operator $A$ with domain $D(A)$ is one
for which $D(A) = D(A^*)$ and $A = A^*$; the Laplacian
$\Delta$ on $\mathbb{R}^n$ is self-adjoint on the Sobolev space $H^2(\mathbb{R}^n)$
but it is only essentially self-adjoint on the dense
domain $\mathcal{D}(\mathbb{R}^n)$, the latter meaning that its closure is
self-adjoint.

Unbounded self-adjoint operators can arise as 
generators of one-parameter semigroups of bounded
operators. A one-parameter family of bounded
operators $T_t$, $t \ge 0$ ($T_t$, $t \in \mathbb{R}$) on a Hilbert space $H$
is a semigroup (resp. group) if $T_tT_s = T_{t+s}$ for all $t, s \ge 0$
(resp. for all $t, s \in \mathbb{R}$) and it is strongly continuous (or
simply continuous) if $\lim_{t\to t_0} T_tu = T_{t_0}u$ at any $t_0 \ge 0$
(resp. $t_0 \in \mathbb{R}$) and for any $u \in H$.

Stone's theorem sets up a one-to-one correspondence
between continuous one-parameter unitary
($U_t^*U_t = U_tU_t^* = I$) groups $U_t$, $t \in \mathbb{R}$, on a Hilbert
space such that $U_0 = \mathrm{Id}$ and self-adjoint operators
$A$ obtained as infinitesimal generators, that is, as the
strong limit

\[ Au = \lim_{t \to 0} \frac{U_tu - u}{it}, \qquad u \in D(A) \]

of $U_t$, $t \in \mathbb{R}$, which in a compact form reads
$U_t = e^{itA}$. An important example in quantum
mechanics is $u_t = e^{itH}u_0$, $t \in \mathbb{R}$, with $H$ a self-adjoint
Hamiltonian, which solves the Schrödinger
equation $\frac{d}{dt}u = iHu$. The Lie-Trotter formula,
which has important applications for Feynman
path integrals, expresses the unitary group
generated by $A + B$, where $A$, $B$, and $A + B$ are
self-adjoint on their respective domains, as a strong
limit

\[ e^{it(A+B)} = \lim_{n \to \infty} \left( e^{itA/n}\, e^{itB/n} \right)^n \]
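
The Lie-Trotter limit can be observed in the simplest finite-dimensional case. The sketch below is an illustration only: it takes $A = \sigma_x$ and $B = \sigma_z$ (Pauli matrices, chosen here as an assumption because their exponentials have closed forms) and measures how far $(e^{itA/n}e^{itB/n})^n$ is from $e^{it(A+B)}$.

```python
import math

IDENTITY = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Z = [[1, 0], [0, -1]]
SIGMA_SUM = [[1, 1], [1, -1]]  # sigma_x + sigma_z; its square is 2 * identity

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_i(theta, m, s=1.0):
    """exp(i*theta*m) for a 2x2 matrix m with m*m = s**2 * identity:
    cos(s*theta) * identity + i * sin(s*theta) * m / s."""
    c, sn = math.cos(s * theta), math.sin(s * theta)
    return [[c * IDENTITY[i][j] + 1j * sn * m[i][j] / s for j in range(2)]
            for i in range(2)]

def trotter_product(t, n):
    """(exp(i t A / n) exp(i t B / n))^n with A = sigma_x and B = sigma_z."""
    step = matmul(exp_i(t / n, SIGMA_X), exp_i(t / n, SIGMA_Z))
    result = IDENTITY
    for _ in range(n):
        result = matmul(result, step)
    return result

def trotter_error(t, n):
    """Largest entrywise deviation of the Trotter product from exp(i t (A + B))."""
    exact = exp_i(t, SIGMA_SUM, math.sqrt(2.0))
    approx = trotter_product(t, n)
    return max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))
```

Since $\sigma_x$ and $\sigma_z$ do not commute, the error is nonzero for every finite $n$, but `trotter_error(1.0, n)` shrinks roughly like $1/n$ as $n$ grows.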

On the other hand, positive operators on a
Hilbert space $(H, \langle\cdot,\cdot\rangle_H)$ (that is, $A$ self-adjoint
and such that $\langle Au, u\rangle_H \ge 0$ for all $u \in D(A)$) generate
one-parameter semigroups $T_t = e^{-tA}$, $t \ge 0$. Hille
and Yosida proved that on a Hilbert space, strongly
continuous contraction (i.e., $|||T_t||| \le 1$ for all $t \ge 0$)
semigroups such that $T_0 = \mathrm{Id}$ are in one-to-one
correspondence with densely defined positive operators
$A: D(A) \subset H \to H$ that are maximal (i.e., $I + A$
is onto), obtained as (minus the) infinitesimal
generators

\[ -Au = \lim_{t \to 0} \frac{T_tu - u}{t}, \qquad u \in D(A) \]

of the corresponding semigroups. Similarly, a posi- 
tive densely defined self-adjoint operator A on a 
Hilbert space H gives rise to a densely defined closed 
symmetric sesquilinear form $(u, v) \mapsto \langle \sqrt{A}u, \sqrt{A}v\rangle_H$
(see next section for a definition of $\sqrt{A}$; $\langle\cdot,\cdot\rangle_H$ is the
scalar product on $H$) and this map yields a one-to-one
correspondence between operators and
sesquilinear forms on $H$ with the aforementioned
properties, one of the starting points for the theory
of Dirichlet forms. To a probability measure $\mu$ on
a separable Banach space E, one can associate a 
densely defined closed symmetric sesquilinear form 
(it is in fact a Dirichlet form) on a Hilbert space $H$
such that $E^* \subset H^* = H \subset E$, which in the particular
case of the standard Wiener measure $\mu$ on the
Wiener space $E = C([0,t])$ and with Hilbert space
given by the Cameron-Martin space $H = H^1([0,t])$,
is the bilinear form

\[ (u, v) \mapsto \int_E \langle \nabla u, \nabla v\rangle_H \, d\mu \]

with $\nabla$ the (closed) gradient of Malliavin calculus.
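The operator $-\Delta$, where $\Delta$ is the Laplacian on $\mathbb{R}^n$,
generates the heat-operator semigroup $e^{-t\Delta}$, $t \ge 0$. It
has a smooth kernel $K_t \in C^\infty(\mathbb{R}^n \times \mathbb{R}^n)$ defined by

\[ (e^{-t\Delta}f)(x) = \int_{\mathbb{R}^n} K_t(x, y) f(y)\, dy \quad \forall f \in \mathcal{D}(\mathbb{R}^n) \]

and defines a smoothing operator, an operator that
maps Sobolev functions to smooth functions. In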
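general, a pseudodifferential operator $A$ on an
open subset $U$ of $\mathbb{R}^n$ with symbol $\sigma_A$ only has a
distribution kernel

\[ K_A(x, y) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} e^{i\langle x - y, \xi\rangle}\, \sigma_A(x, \xi)\, d\xi \]

The kernel of the inverse Laplacian $(\Delta + m^2)^{-1}$
on $\mathbb{R}^n$ (the non-negative real number $m^2$ stands
for the mass), called Green's function on $\mathbb{R}^n$,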
plays an essential role in the theory of Feynman 
graphs. 


Spectral Theory 


Spectral theory is the study of the distribution of the
values of the complex parameter $\lambda$ for which, given
a linear operator $A$ on a normed space $E$, the
operator $A - \lambda I$ has an inverse and of the properties
of this inverse when it exists, the resolvent
$R(A, \lambda) = (A - \lambda I)^{-1}$ of $A$. The resolvent set $\rho(A)$ of $A$
is the set of complex numbers $\lambda$ for which $A - \lambda I$ is
invertible with densely defined bounded inverse. The
spectrum $\mathrm{Sp}(A)$ of $A$ is the complement in $\mathbb{C}$ of the
resolvent set; it consists of a union of three disjoint sets:
the set of all complex numbers $\lambda$ for which $A - \lambda I$ is
not injective, called the point spectrum (such a $\lambda$ is
an eigenvalue of $A$ with associated eigenfunction
any $u \in D(A)$ such that $Au = \lambda u$); the set of points $\lambda$
for which $A - \lambda I$ has a densely defined unbounded
inverse $R(A, \lambda)$, called the continuous spectrum; and
the set of points $\lambda$ for which $A - \lambda I$ has a well-defined
unbounded but not densely defined inverse
$R(A, \lambda)$, called the residual spectrum.

A bounded operator has bounded spectrum and a
self-adjoint operator $A$ acting on a Hilbert space has
real spectrum and no residual spectrum since the
range of $A - \lambda I$ is dense. As a consequence of the
Fredholm alternative, the spectrum of a compact
operator consists, away from 0, only of point spectrum; it is
countable with 0 as its only possible accumulation point. A
Hamiltonian of a quantum mechanical system can have
both point and continuous spectra, but its point
spectrum is of special interest because the corresponding
eigenfunctions are stationary states of the
system. As was first pointed out by Kac ("Can you
hear the shape of a drum?"), the spectrum of an
operator acting on functions can reflect the geome- 
try of the space these functions are defined on, a 
starting point for many interesting and far-reaching 
questions in differential geometry. 

A self-adjoint linear operator on a Hilbert space 
can be described in terms of a family of projections 
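$E_\lambda$, $\lambda \in \mathbb{R}$, via the spectral representation

\[ A = \int_{\mathrm{Sp}(A)} \lambda \, dE_\lambda \]

Given a Borel real-valued function $f$ on $\mathbb{R}$, the operator

\[ f(A) = \int_{\mathrm{Sp}(A)} f(\lambda) \, dE_\lambda \]

yields another self-adjoint operator. A positive
operator $A$ on a dense domain $D(A)$ of some Hilbert
space $(H, \langle\cdot,\cdot\rangle_H)$ has non-negative spectrum and for
any positive real number $t$, the map $\lambda \mapsto e^{-t\lambda}$ gives
the associated bounded heat-operator

\[ e^{-tA} = \int_{\mathrm{Sp}(A)} e^{-t\lambda} \, dE_\lambda \]

while the map $\lambda \mapsto \sqrt{\lambda}$ gives rise to a positive
operator $\sqrt{A}$ such that $(\sqrt{A})^2 = A$.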
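
In finite dimensions the spectral representation reduces to a sum over eigenvalues, $f(A) = \sum_\lambda f(\lambda)E_\lambda$. The following sketch, restricted to real symmetric $2 \times 2$ matrices with distinct eigenvalues (an assumption made here to keep the projections in closed form), builds $\sqrt{A}$ and $e^{-tA}$ this way; the function names and the sample matrix are illustrative.

```python
import math

def spectral_projections(a, b, d):
    """Eigenvalues and spectral projections of the symmetric matrix [[a, b], [b, d]].

    Assumes the two eigenvalues are distinct (b != 0 or a != d).
    """
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    lam_minus, lam_plus = mean - radius, mean + radius
    gap = lam_plus - lam_minus
    # E_plus = (A - lam_minus * I) / gap and E_minus = (lam_plus * I - A) / gap.
    p_plus = [[(a - lam_minus) / gap, b / gap], [b / gap, (d - lam_minus) / gap]]
    p_minus = [[(lam_plus - a) / gap, -b / gap], [-b / gap, (lam_plus - d) / gap]]
    return [(lam_minus, p_minus), (lam_plus, p_plus)]

def apply_function(f, a, b, d):
    """f(A) as the sum over eigenvalues of f(lambda) * E_lambda."""
    result = [[0.0, 0.0], [0.0, 0.0]]
    for lam, proj in spectral_projections(a, b, d):
        for i in range(2):
            for j in range(2):
                result[i][j] += f(lam) * proj[i][j]
    return result

def matmul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = (2.0, 1.0, 2.0)  # the positive matrix [[2, 1], [1, 2]], with eigenvalues 1 and 3
sqrt_A = apply_function(math.sqrt, *A)
heat = apply_function(lambda lam: math.exp(-0.5 * lam), *A)  # e^{-tA} with t = 0.5
sqrt_A_squared = matmul(sqrt_A, sqrt_A)
```

Squaring `sqrt_A` recovers the original matrix up to rounding, and the trace of `heat` equals $e^{-0.5} + e^{-1.5}$, the sum of $e^{-t\lambda}$ over the two eigenvalues.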

The resolvent can also be used to define new 
operators 


\[ f(A) = \frac{i}{2\pi} \int_C f(\lambda)\, R(A, \lambda)\, d\lambda \]

from a linear operator via a Cauchy-type integral
along a contour $C$ around the spectrum; this way
one defines complex powers $A^{-z}$ of (essentially
self-adjoint) positive elliptic pseudodifferential operators
which enter the definition of the zeta-function,
$z \mapsto \zeta(A, z)$, of the operator $A$. The $\zeta$-function is a
useful tool to extend the ordinary determinant to
$\zeta$-determinants of self-adjoint elliptic operators,
thereby providing an ansatz to give a meaning to 
partition functions in the path integral approach to 
quantum field theory. 


Operator Algebras 


Bounded linear operators on a Hilbert space H 
form an algebra $\mathcal{L}(H)$ closed for the operator norm
with involution given by the adjoint operation
$A \mapsto A^*$; it is a $C^*$-algebra, that is, an algebra $\mathcal{A}$ over
$\mathbb{C}$ with a norm $\|\cdot\|$ and an involution $*$ such that $\mathcal{A}$
is closed for this norm and such that $\|ab\| \le \|a\|\,\|b\|$
and $\|a^*a\| = \|a\|^2$ for all $a, b \in \mathcal{A}$ and by the
Gelfand-Naimark theorem, every $C^*$-algebra is
isomorphic to a sub-$C^*$-algebra of some $\mathcal{L}(H)$. The
notion of spectrum extends from bounded operators
to $C^*$-algebras; the spectrum $\mathrm{sp}(a)$ of an
element $a$ in a $C^*$-algebra $\mathcal{A}$ is the (compact) set of
complex numbers $\lambda$ such that $a - \lambda \cdot 1$ is not invertible.
The notion of self-adjointness also extends
($a = a^*$), and just as a self-adjoint operator
$B \in \mathcal{L}(H)$ is non-negative (in which case its spectrum
lies in $\mathbb{R}^+$) if and only if $B = A^*A$ for some bounded
operator $A$, an element $b \in \mathcal{A}$ is said to be non-negative
if and only if $b = a^*a$ for some $a \in \mathcal{A}$, in
which case $\mathrm{sp}(b) \subset \mathbb{R}^+_0$.

The algebra $C_0(X)$ of continuous functions $f : X \to \mathbb{C}$
vanishing at infinity on some locally compact
Hausdorff space $X$ equipped with the supremum
norm and the conjugation $f \mapsto \bar f$ is also a $C^*$-algebra
and a prototype for abelian $C^*$-algebras, since
Gelfand showed that every abelian $C^*$-algebra is
isometrically isomorphic to $C_0(X)$, with $X$ compact if
the algebra is unital. To a $C^*$-algebra $\mathcal{A}$, one can
associate an abelian group $K_0(\mathcal{A})$ which is dual to the
Grothendieck group $K^0(X)$ of isomorphism classes of
vector bundles over a compact Hausdorff space X. 

Compact operators on a Hilbert space $H$ form
the only proper two-sided ideal $\mathcal{K}(H)$ of the
$C^*$-algebra $\mathcal{L}(H)$ which is closed for the operator norm
topology on $\mathcal{L}(H)$. The quotient $\mathcal{L}(H)/\mathcal{K}(H)$ is
called the Calkin algebra, after Calkin, who classified
all two-sided ideals in $\mathcal{L}(H)$ for a separable
Hilbert space $H$; one can set up a one-to-one
correspondence between such ideals and certain
sequence spaces. Corresponding to the Banach
space $\ell^1(\mathbb{Z})$ of complex-valued sequences $(u_n)$ such
that $\sum_{n\in\mathbb{Z}} |u_n| < \infty$, is the $*$-ideal $\mathcal{I}_1(H)$ of
trace-class operators. The trace $\mathrm{tr}(A) = \sum_{n\in\mathbb{Z}} \langle Ae_n, e_n\rangle_H$
of a non-negative operator $A \in \mathcal{L}(H)$ lies in $[0, +\infty]$
and is independent of the choice of the complete
orthonormal basis $\{e_n, n \in \mathbb{Z}\}$ of $H$ equipped with
the inner product $\langle\cdot,\cdot\rangle_H$. $\mathcal{I}_1(H)$ is the Banach space
of bounded linear operators on $H$ such that
$\|A\|_1 = \mathrm{tr}(|A|)$ is finite. Given an (essentially
self-adjoint) positive differential operator $D$ of
order $d$ acting on smooth functions on a closed
$n$-dimensional Riemannian manifold $M$, its
complex power $D^{-z}$ is trace class on the space
of $L^2$-functions on $M$ provided $\mathrm{Re}(z) > n/d$ and the
corresponding trace $\mathrm{tr}(D^{-z})$ extends to a meromorphic
function on the whole plane, the
$\zeta$-function $\zeta(D, z)$, which is holomorphic at 0.
More generally, Banach spaces $\ell^p(\mathbb{Z})$, $1 \le p < \infty$,
of complex-valued sequences $(u_n)_{n\in\mathbb{Z}}$ such that
$\sum_{n\in\mathbb{Z}} |u_n|^p < \infty$ relate to Schatten ideals $\mathcal{I}_p(H)$,
$1 \le p < \infty$, where $\mathcal{I}_p(H)$ is the Banach space of bounded
linear operators on $H$ such that $\|A\|_p = (\mathrm{tr}(|A|^p))^{1/p}$
is finite. Just as all $\ell^p$-sequences converge to 0,
the Schatten ideals $\mathcal{I}_p(H)$ all lie in $\mathcal{K}(H)$ and we
have $\cdots \subset \mathcal{I}_p(H) \subset \mathcal{I}_{p+1}(H) \subset \cdots \subset \mathcal{K}(H)$.

Compact operators and Schatten ideals are
useful to extend index theory to a noncommutative
context; a Fredholm module $(H, F)$ over an
involutive algebra $\mathcal{A}$ is given by an involutive
representation $\pi$ of $\mathcal{A}$ in a Hilbert space $H$ and
a self-adjoint bounded linear operator $F$ on $H$
such that $F^2 = \mathrm{Id}_H$ and the operator brackets
$[F, \pi(a)]$ are compact for all $a \in \mathcal{A}$. To a
$p$-summable Fredholm module $(H, F)$, that is,
$[F, \pi(a)] \in \mathcal{I}_p(H)$ for all $a \in \mathcal{A}$, one associates a
representative $\tau$ of the Chern character $\mathrm{ch}^*(H, F)$
given by a cyclic cocycle on $\mathcal{A}$, which pairs up with
K-theory to build an integer-valued index map
on K-theory.

Schatten ideals are also useful to investigate the 
geometry of infinite-dimensional spaces such as loop 
groups, for which the Hilbert-Schmidt operators
(operators in $\mathcal{I}_2(H)$ are also called Hilbert-Schmidt
operators) are particularly useful. A Hölder-type
inequality shows that the product of two
Hilbert-Schmidt operators is trace-class. Moreover, for any
two Hilbert-Schmidt operators $A$ and $B$, the
"cyclicity property" $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ holds,
and the sesquilinear form $(A, B) \mapsto \mathrm{tr}(AB^*)$ makes
$\mathcal{I}_2(H)$ a Hilbert space.


Further Reading 


Adams R (1975) Sobolev Spaces. London: Academic Press. 

Dunford N and Schwartz J (1971) Linear Operators. Part I. 
General Theory. Part II. Spectral Theory. Part III. Spectral 
Operators. New York: Wiley. 

Hille E (1972) Methods in Classical and Functional Analysis.
London: Academic Press and Addison-Wesley.

Kato T (1982) A Short Introduction to Perturbation Theory for
Linear Operators. New York-Berlin: Springer.

Reed M and Simon B (1980) Methods of Modern Mathematical
Physics vols. I-IV, 2nd edn. New York: Academic Press.

Riesz F and Sz.-Nagy B (1968) Leçons d'analyse fonctionnelle.
Paris: Gauthier-Villars; Budapest: Akadémiai Kiadó.

Rudin W (1994) Functional Analysis, 2nd edn. New York: 
International Series in Pure and Applied Mathematics. 

Yosida K (1980) Functional Analysis, 6th edn. Die Grundlehren 
der Mathematischen Wissenschaften in Einzeldarstellungen 
Band vol. 132. Berlin-New York: Springer. 


Introductory Article: Minkowski Spacetime and Special Relativity


G L Naber, Drexel University, Philadelphia, PA, USA
© 2006 Elsevier Ltd. All rights reserved.


Introduction 


Minkowski spacetime is generally regarded as the 
appropriate mathematical context within which to 
formulate those laws of physics that do not refer 
specifically to gravitational phenomena. Here we 
shall describe this context in rigorous terms, 
postulate what experience has shown to be its 
correct physical interpretation, and illustrate by 
means of examples its appropriateness for the 
formulation of physical laws. 


Minkowski Spacetime and the Lorentz Group


Minkowski spacetime M is a four-dimensional real 
vector space on which is defined a bilinear form 
g:M x M — R that is symmetric (g(v, w) = g(w, v) 
for all v,w € M) and nondegenerate (g(v,w)=0 


for all w € M implies v= 0). Further, g has index 1, 
that is, there exists a basis $\{e_1, e_2, e_3, e_4\}$ for $M$ with

\[ g(e_a, e_b) = \eta_{ab} = \begin{cases} 1 & \text{if } a = b = 1, 2, 3 \\ -1 & \text{if } a = b = 4 \\ 0 & \text{if } a \neq b \end{cases} \]


g is called a Lorentz inner product for M and any 
basis of the type just described is an orthonormal 
basis for M. We shall often write v - w for the value 
g(v,w) of g on (v,w) € M x M. A vector v € M is 
said to be spacelike, timelike, or null if v-v is 
positive, negative, or zero, respectively, and the set 
$C_N$ of all null vectors is called the null cone in $M$. If
$\{e_1, e_2, e_3, e_4\}$ is an orthonormal basis and if
we write $v = v^1e_1 + v^2e_2 + v^3e_3 + v^4e_4 = v^ae_a$ (using
the Einstein summation convention, according to
which a repeated index, one subscript and one
superscript, is summed over its possible values) and
$w = w^be_b$, then

\[ v \cdot w = v^1w^1 + v^2w^2 + v^3w^3 - v^4w^4 = \eta_{ab}\, v^a w^b \]

Figure 1 Spacelike, timelike and null vectors.


In particular, $v$ is null if and only if

\[ (v^1)^2 + (v^2)^2 + (v^3)^2 = (v^4)^2 \]

(hence the name null "cone" for $C_N$). Timelike vectors
are "inside" the null cone and spacelike vectors are
"outside" (see Figure 1).
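
The classification of vectors by the sign of $v \cdot v$ is easy to mechanize. The sketch below is illustrative: it represents a vector by its components $(v^1, v^2, v^3, v^4)$ in an orthonormal basis and uses a small tolerance `tol` (an assumption introduced here for floating-point input) to decide when $v \cdot v$ counts as zero.

```python
ETA = (1.0, 1.0, 1.0, -1.0)  # diagonal entries of (eta_ab): index 1 on the fourth slot

def lorentz_inner(v, w):
    """v . w = v1*w1 + v2*w2 + v3*w3 - v4*w4 for component 4-tuples v and w."""
    return sum(eta * a * b for eta, a, b in zip(ETA, v, w))

def classify(v, tol=1e-12):
    """Return 'spacelike', 'timelike' or 'null' according to the sign of v . v."""
    q = lorentz_inner(v, v)
    if q > tol:
        return "spacelike"
    if q < -tol:
        return "timelike"
    return "null"
```

For example, `classify((3, 4, 0, 5))` reports a null vector because $3^2 + 4^2 = 5^2$, while `classify((0, 0, 0, 1))` reports a timelike one.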

We select some orientation for the vector space $M$
and will henceforth consider only oriented, orthonormal
bases for $M$. From the Schwarz inequality
for $\mathbb{R}^3$, one can show (Naber 1992, theorem 1.3.1)
that, if $v$ is timelike and $w$ is either timelike or null
and nonzero, then $v \cdot w < 0$ if and only if $v^4w^4 > 0$
in any orthonormal basis. In particular, one can
define an equivalence relation on the set of all 
timelike vectors by decreeing that two such, v and 
w, are equivalent if and only if v-w<0. For 
reasons that will emerge shortly we then say that v 
and w have the same time orientation. There are 
precisely two equivalence classes, one of which we 
select and designate future directed. Timelike vectors 
in the other class are then called past directed. One 
can show (Naber 1992, section 1.3 and corollary 
1.4.5) that this classification can be extended to 
nonzero null vectors as well (but not to spacelike 
vectors). We will call an oriented, orthonormal basis 
time oriented if its timelike vector e4 is future 
directed and will consider only these in what 
follows. An oriented, time-oriented, orthonormal 
basis for M will be called an admissible basis. If 
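$\{e_1, e_2, e_3, e_4\}$ and $\{\hat e_1, \hat e_2, \hat e_3, \hat e_4\}$ are two such bases
and if we write

\[ e_b = \Lambda^1{}_b \hat e_1 + \Lambda^2{}_b \hat e_2 + \Lambda^3{}_b \hat e_3 + \Lambda^4{}_b \hat e_4 = \Lambda^a{}_b \hat e_a, \qquad b = 1, 2, 3, 4 \qquad [1] \]

then the matrix $\Lambda = (\Lambda^a{}_b)$ ($a$ = row index,
$b$ = column index) can be shown to satisfy the
following three conditions (Naber 1992, section 1.3):

1. (orthogonality) $\Lambda^T \eta \Lambda = \eta$,
where $T$ means transpose and

\[ \eta = (\eta_{ab}) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \]

2. (orientability) $\det \Lambda = 1$, and
3. (time orientability) $\Lambda^4{}_4 \ge 1$.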


We shall refer to any $4 \times 4$ matrix $\Lambda = (\Lambda^a{}_b)$ satisfying
these three conditions as a Lorentz transformation
(although one often sees the adjectives "proper" and
"orthochronous" appended to emphasize conditions
(2) and (3), respectively). The set $\mathcal{L}$ of all such matrices
forms a group under matrix multiplication that we call
simply the Lorentz group. It is a simple matter to show
(Naber 1992, lemma 1.3.4) from the orthogonality
condition (1) that, if $\Lambda^4{}_4 = 1$, then $\Lambda$ must be of the
form

\[ \begin{pmatrix} & & & 0 \\ & (R^i{}_j) & & 0 \\ & & & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]

where $(R^i{}_j)$ is an element of SO(3), that is, a $3 \times 3$
orthogonal matrix with determinant 1. The set $\mathcal{R}$ of
all matrices of this form is a subgroup of $\mathcal{L}$ called
the rotation subgroup. Although it will play no role
in what we do here, it should be pointed out that in
many applications (e.g., in particle physics) it is
necessary to consider the larger group of transformations
of $M$ generated by the Lorentz group and
spacetime translations ($x^a \to x^a + \Lambda^a$, for some
constants $\Lambda^a$, $a = 1, 2, 3, 4$). This is called the
inhomogeneous Lorentz group, or Poincaré group.
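
The three defining conditions can be tested directly on a given matrix. The following sketch is illustrative only: `is_lorentz`, `boost` and `rotation_z` are hypothetical helper names, the boost is the standard hyperbolic rotation in the $(x^1, x^4)$ plane, and a tolerance `tol` is assumed for floating-point comparisons.

```python
import math

ETA = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def boost(theta):
    """Hyperbolic rotation mixing the first spatial coordinate and the time coordinate."""
    c, s = math.cosh(theta), math.sinh(theta)
    return [[c, 0, 0, -s], [0, 1, 0, 0], [0, 0, 1, 0], [-s, 0, 0, c]]

def rotation_z(phi):
    """Spatial rotation about the third axis, embedded as a 4x4 matrix."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def is_lorentz(lam, tol=1e-9):
    """Check orthogonality, det = 1 and Lambda^4_4 >= 1 (index 4 is list index 3)."""
    g = matmul(matmul(transpose(lam), ETA), lam)
    orthogonal = all(abs(g[i][j] - ETA[i][j]) < tol for i in range(4) for j in range(4))
    return orthogonal and abs(det(lam) - 1.0) < tol and lam[3][3] >= 1.0 - tol
```

Products of such boosts and rotations pass the test, as expected of a group, whereas the time-reversal matrix `ETA` itself satisfies the orthogonality condition but fails conditions (2) and (3).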


Physical Interpretation 


For the purpose of describing how one is to think of 
Minkowski spacetime and the Lorentz group physi- 
cally it will be convenient to distinguish (intuitively 
and terminologically, if not mathematically) between a 
“vector” in M and a “point” in M (the “tip” of a 
vector). The points in M are called events and are to be 
thought of as actual physical occurrences, albeit 
idealized as “point events” which have no spatial 
extension and no duration. One might picture, for 
example, an instantaneous collision, or explosion, or 
an “instant” in the history of some point material 
particle or photon (“particle of light"). 

Events are observed and identified by the assign- 
ment of coordinates. We will be interested in 
coordinates assigned in a very particular way by a
very particular type of observer. Specifically, our
admissible observers preside over three-dimensional,
right-handed, Cartesian spatial coordinate systems,
relative to which photons always move along
straight lines in any direction. With a single clock
located at the origin, such an observer can determine
the speed, $c$, of light in vacuo by the so-called Fizeau
procedure (emit a photon from the origin when the
clock there reads $t_1$, bounce it back from a mirror
located at $(x^1, x^2, x^3)$, receive the photon at the
origin again when the clock there reads $t_2$ and set
$c = 2\sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2}/(t_2 - t_1)$). Now place an
identical clock at each spatial point and synchronize
them by emitting from the origin a spherical
electromagnetic wave (photons in all directions)
and setting the clock whose location is $(x^1, x^2, x^3)$
to read $\sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2}/c$ at the instant the
wave arrives. An observer now assigns to an event
the three spatial coordinates of the location at which 
it occurred in his coordinate system as well as the 
time reading on the clock at that location at the 
instant the event occurred. We shall assume also 
that our admissible observers are inertial in the sense 
of Newtonian mechanics (the trajectory of a particle 
on which no forces act, when described in terms 
of the coordinates just introduced, is a point or a 
straight line traversed at constant speed). It is an 
experimental fact (and quite a remarkable one) that 
all of these admissible observers (whether or not they 
are in relative motion) agree on the numerical value of 
the speed of light in vacuo ($c \approx 3.00 \times 10^{10}\ \mathrm{cm\,s^{-1}}$).
We shall exploit this fact at the outset to have all of our
admissible observers measure time in units of distance
by simply multiplying their time coordinates $t$ by $c$.
The resulting time coordinate is denoted $x^4 = ct$. In
these units all speeds are dimensionless and the speed
of light in vacuo is 1.

In our mathematical model M of the world of 
events, this very subtle and complex notion of an 
admissible observer is fully identified with the 
conceptually very simple notion of an admissible 
basis $\{e_1, e_2, e_3, e_4\}$. If $x \in M$ is an event and if we
write $x = x^ae_a$, then $(x^1, x^2, x^3)$ are the spatial and $x^4$
is the time coordinate supplied for $x$ by the
corresponding observer. If $\{\hat e_1, \hat e_2, \hat e_3, \hat e_4\}$ is another
basis/observer related to $\{e_1, e_2, e_3, e_4\}$ by [1] and if
we write $x = \hat x^a \hat e_a$, then

\[ \hat x^a = \Lambda^a{}_b\, x^b, \qquad a = 1, 2, 3, 4 \qquad [2] \]

Thus, Lorentz transformations relate the space and
time coordinates supplied for any given event by two
admissible observers. If $(\Lambda^a{}_b) \in \mathcal{R}$, then the two
observers differ only in the orientation of their spatial
coordinate axes. On the other hand, for any real
number $\theta$ one can define an element $L(\theta)$ of $\mathcal{L}$ by

\[ L(\theta) = \begin{pmatrix} \cosh\theta & 0 & 0 & -\sinh\theta \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sinh\theta & 0 & 0 & \cosh\theta \end{pmatrix} \qquad [3] \]


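and, if two admissible bases are related by this Lorentz
transformation, then the coordinate transformation [2]
becomes

\[ \begin{aligned} \hat x^1 &= (\cosh\theta)\, x^1 - (\sinh\theta)\, x^4 \\ \hat x^2 &= x^2 \\ \hat x^3 &= x^3 \\ \hat x^4 &= -(\sinh\theta)\, x^1 + (\cosh\theta)\, x^4 \end{aligned} \qquad [4] \]

Letting $\beta = \tanh\theta$ (so that $-1 < \beta < 1$) and suppressing
$\hat x^2 = x^2$ and $\hat x^3 = x^3$, one obtains

\[ \begin{aligned} \hat x^1 &= \frac{1}{\sqrt{1 - \beta^2}}\, x^1 - \frac{\beta}{\sqrt{1 - \beta^2}}\, x^4 \\ \hat x^4 &= -\frac{\beta}{\sqrt{1 - \beta^2}}\, x^1 + \frac{1}{\sqrt{1 - \beta^2}}\, x^4 \end{aligned} \qquad [5] \]

This corresponds to two observers whose spatial
axes are oriented as shown in Figure 2 with the
hatted coordinate system moving along the common
$x^1$-, $\hat x^1$-axis with speed $|\beta|$, to the right if $\beta > 0$ and
to the left if $\beta < 0$.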
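
The special Lorentz transformation written in terms of $\beta$ can be applied to sample events to see that it preserves the quantity $(x^1)^2 - (x^4)^2$. This is a minimal sketch with hypothetical helper names and sample values; it assumes $|\beta| < 1$ and units in which the speed of light is 1.

```python
import math

def boost_coordinates(beta, x1, x4):
    """Coordinates (x1_hat, x4_hat) of an event for the observer moving with velocity beta."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)  # equals cosh(theta) when beta = tanh(theta)
    return gamma * (x1 - beta * x4), gamma * (x4 - beta * x1)

def interval(x1, x4):
    """Lorentz inner product of the displacement from the origin, with x2 = x3 = 0."""
    return x1 * x1 - x4 * x4

# An event on the world line x1 = beta * x4 of the moving observer's spatial origin.
on_axis = boost_coordinates(0.6, 3.0, 5.0)
# An event on a photon world line through the origin (x1 = x4).
photon = boost_coordinates(0.6, 2.0, 2.0)
```

With $\beta = 0.6$, the event $(x^1, x^4) = (3, 5)$ lies at the moving observer's spatial origin, so its hatted coordinates are $(0, 4)$, and the interval $-16$ is unchanged; the photon event stays on the null cone.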

We remark that, reverting to traditional time units,
$\beta = v/c$, where $|v|$ is the relative speed of the two
coordinate systems, and [5] becomes what is generally
referred to as a "Lorentz transformation" in
elementary expositions of special relativity, that is,

\[ \hat t = \frac{t - (v/c^2)\, x^1}{\sqrt{1 - v^2/c^2}}, \qquad \hat x^1 = \frac{x^1 - vt}{\sqrt{1 - v^2/c^2}}, \qquad \hat x^2 = x^2, \qquad \hat x^3 = x^3 \qquad [6] \]

Figure 2 Observers in standard configuration.

There is a sense in which, to understand the
kinematic effects of special relativity, it is enough 
to restrict one's attention to the so-called special 
Lorentz transformations $L(\theta)$. Specifically, one can
show (Naber 1992, theorem 1.3.5) that if $\Lambda \in \mathcal{L}$ is
any Lorentz transformation, then there exists a real
number $\theta$ and two rotations $R_1, R_2 \in \mathcal{R}$ such that
$\Lambda = R_1 L(\theta) R_2$. Since $R_1$ and $R_2$ involve no relative
motion, all of the kinematics is contained in $L(\theta)$.
We shall explore these kinematic effects in more 
detail shortly. 

Now suppose that $x$ and $x_0$ are two distinct events
in $M$ and consider the displacement vector $x - x_0$
from $x_0$ to $x$. If $\{e_1, e_2, e_3, e_4\}$ is an admissible basis
and if we write $x = x^ae_a$ and $x_0 = x_0^ae_a$, then
$x - x_0 = (x^a - x_0^a)e_a = \Delta x^a e_a$. If $x - x_0$ is null, then

\[ (\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 = (\Delta x^4)^2 \]


so the spatial separation of the two events is equal to 
the distance light would travel during the time lapse 
between the events. The same must be true in any 
other admissible basis since Lorentz transformations 
are the matrices of linear maps that preserve the 
Lorentz inner product. Consequently, all admissible 
observers agree that x9 and x are “connectible by 
a photon." They even agree as to which of the two 
events is to be regarded as the "emission" of the 
photon and which is to be regarded as its *reception" 
since one can show (Naber 1992, theorem 1.3.3) 
that, when a vector is either timelike or null and 
nonzero, the sign of its fourth coordinate is the same 
in every admissible basis (because A*4 > 1). Thus, 
x* — x$ is either positive for all admissible observers 
(xo occurred before x) or negative for all admissible 
observers (xo occurred after x). Since photons move 
along straight lines in admissible coordinate systems 
we adopt the following terminology. If xo, x € M are 
such that x — xo is null, then the straight line in M 
containing xo and x is called the world line of a 
photon in M and is to be thought of as the set of all 
events in the history of some particle of light that 
"experiences" both xo and x. 

Let us now suppose instead that $x - x_0$ is timelike.
Then, in any admissible basis,

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 < (\Delta x^4)^2$$

so the spatial separation of $x_0$ and $x$ is less than the
distance light would travel during the time lapse
between the events. In this case, one can prove (Naber
1992, section 1.4) that there exists an admissible basis
$\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ in which
$\Delta\hat{x}^1 = \Delta\hat{x}^2 = \Delta\hat{x}^3 = 0$, that is,
there is an admissible observer for whom the two
events occur at the same spatial location, one after the
other. Thinking of this location as occupied by some
material object (e.g., the observer's clock situated at
that point) we find that the events $x_0$ and $x$ are both
"experienced" by this material particle and that,
moreover, $\sqrt{|g(x - x_0, x - x_0)|}$ is just the time lapse
between the events recorded by a clock carried along by
this material particle. To any other admissible observer
this material particle appears "free" (not subject to
forces) because it moves on a straight line with constant
speed. This leads us to the following definitions. If
$x_0, x \in \mathcal{M}$ are such that $x - x_0$ is timelike, then the
straight line in $\mathcal{M}$ containing $x_0$ and $x$ is called the
world line of a free material particle in $\mathcal{M}$ and
$\sqrt{|g(x - x_0, x - x_0)|}$, usually written $\tau(x - x_0)$, or
simply $\Delta\tau$, is the proper time separation of $x_0$ and $x$.
One can think of $\tau(x - x_0)$ as a sort of "length" for
$x - x_0$ measured, however, by a clock carried along by
a free material particle that experiences both $x_0$ and $x$.
It is an odd sort of length, however, since it satisfies
not the usual triangle inequality, but the following
"reversed" version.


Reversed triangle inequality (Naber 1992, theorem
1.4.2) Let $x_0$, $x$ and $y$ be events in $\mathcal{M}$ for which $y - x$
and $x - x_0$ are timelike with the same time orientation.
Then $y - x_0 = (y - x) + (x - x_0)$ is timelike and

$$\tau(y - x_0) \geq \tau(y - x) + \tau(x - x_0) \qquad [7]$$

with equality holding if and only if $y - x$ and $x - x_0$
are linearly dependent.


The sense of the inequality in [7] has interesting 
consequences about which we will have more to say 
shortly. 
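
The sense of the inequality can be seen on a concrete triple of events. The Python sketch below is only an illustration: the events `x0`, `x`, `y` are hypothetical, and the helpers `interval`, `proper_time` and `displacement` are names introduced here, working in one admissible basis with signature $(+,+,+,-)$.

```python
import math

def interval(v):
    """g(v, v) for a displacement v = (dx1, dx2, dx3, dx4)."""
    return v[0] ** 2 + v[1] ** 2 + v[2] ** 2 - v[3] ** 2

def proper_time(v):
    """tau(v) = sqrt(-g(v, v)) for a timelike displacement v."""
    q = interval(v)
    if q >= 0:
        raise ValueError("displacement is not timelike")
    return math.sqrt(-q)

def displacement(a, b):
    """Components of b - a."""
    return [bi - ai for ai, bi in zip(a, b)]

# Hypothetical events: x0 -> x -> y is a detour, x0 -> y is direct.
x0 = [0.0, 0.0, 0.0, 0.0]
x = [0.6, 0.0, 0.0, 1.0]
y = [0.0, 0.0, 0.0, 2.0]

direct = proper_time(displacement(x0, y))
detour = proper_time(displacement(x0, x)) + proper_time(displacement(x, y))
```

Here the direct separation is the larger of the two, the opposite of what the Euclidean triangle inequality would give for lengths.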

Finally, let us suppose that $x - x_0$ is spacelike.
Then, in any admissible basis

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 > (\Delta x^4)^2$$

so the spatial separation of $x_0$ and $x$ is greater than the
distance light could travel during the time lapse that
separates them. There is clearly no admissible observer
for whom the events occur at the same location. No
free material particle (or even photon) can experience
both $x_0$ and $x$. However, one can show (Naber 1992,
section 1.5) that, given any real number $T$ (positive,
negative, or zero), one can find an admissible basis
$\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ in which $\Delta\hat{x}^4 = T$. Some admissible
observers will judge the events simultaneous, some
will assert that $x_0$ occurred before $x$, and others will
reverse the order. Temporal order, cause and effect,
have no meaning for such pairs of events. For those
admissible observers for whom the events are simultaneous
($\Delta x^4 = 0$), the quantity $\sqrt{g(x - x_0, x - x_0)}$ is
the distance between them and for this reason this
quantity is called the proper spatial separation of $x_0$
and $x$ (whenever $x - x_0$ is spacelike).
For any two events $x_0, x \in \mathcal{M}$, $g(x - x_0, x - x_0)$ is
given in any admissible basis by
$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 - (\Delta x^4)^2$
and is called the interval separating
$x_0$ and $x$. It is the closest analog in Minkowskian
geometry to the (squared) length in Euclidean
geometry. It can, however, assume any real value
depending on the physical relationship between
the events $x_0$ and $x$. Historically, of course, it was
the various physical interpretations of this interval
that we have just described which led Minkowski
(Einstein et al. 1958) to the introduction of the
structure that bears his name.
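
The three cases just described can be summarized in a few lines of Python. This is a minimal sketch, assuming events are given by their four coordinates in a single admissible basis; the function names and the tolerance `tol` are choices made here for illustration.

```python
def interval(x0, x):
    """g(x - x0, x - x0) in an admissible basis, signature (+, +, +, -)."""
    d = [b - a for a, b in zip(x0, x)]
    return d[0] ** 2 + d[1] ** 2 + d[2] ** 2 - d[3] ** 2

def classify(x0, x, tol=1e-12):
    """Return 'timelike', 'null' or 'spacelike' for a pair of events."""
    q = interval(x0, x)
    if abs(q) <= tol:
        return "null"
    return "timelike" if q < 0 else "spacelike"

origin = [0.0, 0.0, 0.0, 0.0]
kinds = [classify(origin, e) for e in ([1.0, 0.0, 0.0, 2.0],
                                       [3.0, 4.0, 0.0, 5.0],
                                       [2.0, 0.0, 0.0, 1.0])]
```

Because the interval is preserved by Lorentz transformations, the label returned does not depend on which admissible basis supplied the coordinates.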




Kinematic Effects 


All of the well-known kinematic effects of special 
relativity (the addition of velocities formula, the 
relativity of simultaneity, time dilation, and length 
contraction) follow easily from what we have done. 
Because it eases visualization and because, as we 
mentioned earlier, it suffices to do so, we will limit our 
discussion to the special Lorentz transformations. 

Let $\theta_1$ and $\theta_2$ be two real numbers and consider
the corresponding elements $L(\theta_1)$ and $L(\theta_2)$ of
$\mathcal{L}$ defined by [3]. Sum formulas for $\sinh\theta$ and
$\cosh\theta$ imply that $L(\theta_1)L(\theta_2) = L(\theta_1 + \theta_2)$. Defining
$\beta_i = \tanh\theta_i$, $i = 1, 2$, and $\beta = \tanh(\theta_1 + \theta_2)$, the sum
formula for $\tanh\theta$ then gives

$$\beta = \frac{\beta_1 + \beta_2}{1 + \beta_1\beta_2} \qquad [8]$$

The physical interpretation is simple. One has three
admissible observers whose spatial axes are related
in the manner shown in Figure 2. If the speed of the
second relative to the first is $\beta_1$ and the speed of the
third relative to the second is $\beta_2$, then the speed of
the third relative to the first is not $\beta_1 + \beta_2$ as a
Newtonian predisposition would lead one to expect,
but rather $\beta$, given by [8]. This is the relativistic
addition of velocities formula.
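
A small Python sketch makes the formula concrete. The speeds `beta1` and `beta2` are hypothetical sample values (in units with $c = 1$), and `add_velocities` is a name introduced here; the second computation repeats the derivation through $\theta_1 + \theta_2$.

```python
import math

def add_velocities(beta1, beta2):
    """Relativistic sum [8] of two collinear speeds (units with c = 1)."""
    return (beta1 + beta2) / (1.0 + beta1 * beta2)

beta1, beta2 = 0.6, 0.7            # hypothetical speeds
combined = add_velocities(beta1, beta2)

# Same result via L(theta1) L(theta2) = L(theta1 + theta2).
via_theta = math.tanh(math.atanh(beta1) + math.atanh(beta2))
```

The combined speed stays below 1 even though the Newtonian sum of the two sample speeds exceeds it.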

We have seen already that, when the interval
between $x_0$ and $x$ is spacelike, the events will be
judged simultaneous by some admissible observers,
but not by others. Indeed, if $\Delta x^4 = 0$
and the observers are related by [5], then
$\Delta\hat{x}^4 = -(\beta/\sqrt{1-\beta^2})\,\Delta x^1 = -\beta\,\Delta\hat{x}^1$, which will not be
zero unless $\beta = 0$ and so there is no relative motion
($\Delta x^1$ cannot be zero since then $\Delta x^a = 0$ for
$a = 1, 2, 3, 4$ and $x = x_0$). This phenomenon is
called the relativity of simultaneity and we now
construct a simple geometrical representation of it.

Select two perpendicular lines in the plane to
represent the $x^1$- and $x^4$-axes (the Euclidean orthogonality
of the lines has no physical significance and
is unnecessary, but makes the pictures easier to
draw). The $\hat{x}^1$-axis will be represented by the
straight line $\hat{x}^4 = 0$ which, from [5], is given by
$x^4 = \beta x^1$ (in Figure 3 we have assumed that $\beta > 0$).
Similarly, the $\hat{x}^4$-axis is identified with the line
$x^4 = (1/\beta)x^1$. Since Lorentz transformations leave
the Lorentz inner product invariant, the hyperbolas
$(x^1)^2 - (x^4)^2 = k$ coincide with $(\hat{x}^1)^2 - (\hat{x}^4)^2 = k$ and
we calibrate the axes accordingly, for example, the
branch of $(x^1)^2 - (x^4)^2 = 1$ with $x^1 > 0$ intersects
the $x^1$-axis at the point $(x^1, x^4) = (1, 0)$ and intersects
the $\hat{x}^1$-axis at the point $(\hat{x}^1, \hat{x}^4) = (1, 0)$. This
necessitates a different scale on the hatted and
unhatted axes, but one can show (Naber 1992,
section 1.3) that, with this calibration, all coordinates
can be obtained geometrically by projecting
parallel to the opposite axis (e.g., the $x^4$- and $\hat{x}^4$-coordinates
of an event result from projecting
parallel to the $x^1$- and $\hat{x}^1$-axes, respectively).

Thus, a line of simultaneity in the hatted
(respectively, unhatted) coordinates is parallel to
the $\hat{x}^1$- (respectively, $x^1$-) axis so that, in general, a
pair of events lying on one will not lie on the other
(note, however, that these lines are "really" three-dimensional
hyperplanes so what appears to be a
point of intersection is actually a two-dimensional
"plane of agreement", any two events in which are
judged simultaneous by both observers).

For any two events whatsoever the relationship
between the time lapse $\Delta\hat{x}^4$ in the hatted coordinates
and the time lapse $\Delta x^4$ in the unhatted coordinates is,
from [5],

$$\Delta\hat{x}^4 = -\frac{\beta}{\sqrt{1-\beta^2}}\,\Delta x^1 + \frac{1}{\sqrt{1-\beta^2}}\,\Delta x^4$$

so the two are generally not equal. Consider, in
particular, two events on the world line of a point
at rest in the unhatted coordinate system, for
example, two readings on the clock at rest at the
origin in this system. Then $\Delta x^1 = 0$ so

$$\Delta\hat{x}^4 = \frac{1}{\sqrt{1-\beta^2}}\,\Delta x^4 > \Delta x^4$$

This effect is entirely symmetrical since, if $\Delta\hat{x}^1 = 0$,
then [5] implies

$$\Delta x^4 = \frac{1}{\sqrt{1-\beta^2}}\,\Delta\hat{x}^4 > \Delta\hat{x}^4$$

Each observer judges the other's clocks to be
running slow. This phenomenon is called time
dilation and is clearly visible in the spacetime
diagram in Figure 4 (e.g., both observers agree
on the time reading "0" for the clock at the origin of
the unhatted system, but the line $\hat{x}^4 = 1$ intersects
the world line of the clock, i.e., the $x^4$-axis, at a
point below $(x^1, x^4) = (0, 1)$).

Figure 3 Relativity of simultaneity.

We should emphasize that this phenomenon is 
quite “real” in the physical sense. For example, 
certain types of elementary particles (mesons) found 
in cosmic radiation are so short-lived (at rest) that, 
even if they could travel at the speed of light, the 
time required to traverse our atmosphere would be 
some ten times their normal life span. They should 
not be able to reach the earth, but they do. Time
dilation "keeps them young" in the sense that what
seems a normal lifetime to the meson appears much
longer to us.
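
To put a number on this, suppose (as in the text) that crossing the atmosphere at the speed of light would take ten rest lifetimes. At speed $\beta$ the crossing takes $10/\beta$ lifetimes by our clocks, while the dilated lifetime is $1/\sqrt{1-\beta^2}$, so the meson arrives when $\beta/\sqrt{1-\beta^2} \geq 10$. The Python sketch below solves for the threshold speed; the function names are ours and the model ignores everything except time dilation.

```python
import math

def gamma(beta):
    """Time dilation factor 1 / sqrt(1 - beta^2)."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)

def min_speed(ratio):
    """Smallest beta with gamma(beta) * beta >= ratio."""
    return ratio / math.sqrt(1.0 + ratio ** 2)

ratio = 10.0                            # "some ten times" from the text
beta_min = min_speed(ratio)
dilated_lifetime = gamma(beta_min)      # in units of the rest lifetime
traversal_time = ratio / beta_min       # in the same units
```

Under these assumptions the threshold is a little above 99 percent of the speed of light, at which point the dilated lifetime and the traversal time coincide.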

Finally, since admissible observers generally
disagree on which events are simultaneous and
since the only way to measure the "length" of a
moving object (say, a measuring rod) is to locate its
end points "simultaneously," it should come as no
surprise that length, like simultaneity and time,
depends on the admissible observer measuring it.
Specifically, let us consider a measuring rod lying
at rest along the $\hat{x}^1$-axis of the hatted coordinate
system. Its "length" in this coordinate system is $\Delta\hat{x}^1$.
The world lines of its end points are two straight
lines parallel to the $\hat{x}^4$-axis. If the unhatted observer
locates two events on these world lines "simultaneously"
their coordinates will satisfy $\Delta x^4 = 0$ and,
by [5], $\Delta\hat{x}^1 = (1/\sqrt{1-\beta^2})\,\Delta x^1$ so

$$\Delta x^1 = \sqrt{1-\beta^2}\,\Delta\hat{x}^1 < \Delta\hat{x}^1$$

and the moving measuring rod appears contracted in
its direction of motion by a factor of $\sqrt{1-\beta^2}$. As
for time dilation, this phenomenon, known as length
contraction, is entirely symmetrical, quite real, and
clearly visible in a spacetime diagram (Figure 5).

Figure 4 Time dilation.

Figure 5 Length contraction.


The Relativity Principle 


We have found that admissible observers can disagree 
about some rather startling things (whether or not two 
events are simultaneous, the time lapse between two 
events even when no one thinks they are simultaneous, 
and the length of a measuring rod). This would be 
a matter of no concern at all, of course, if one could 
determine, in any given situation, who was really 
right. Surely, two events are either simultaneous or 
they are not and we need only sort out which 
admissible observer has the correct view of the 
situation? Unfortunately (or fortunately, depending 
on one's point of view) this distinction between 
the judgments made by different admissible observers 
is precisely what physics forbids. 


The relativity principle (Einstein et al. 1958). All
admissible observers are completely equivalent for
the formulation of the laws of physics.


We must be clear that this is not a mathematical 
statement. It is rather a statement about the physical 
world around us and how it should be described, 
gleaned from observations, some of which are 
complex and subtle and some of which are commonplace
(a passenger in a smooth, quiet airplane
traveling at constant groundspeed cannot "feel"
his motion relative to the earth). It is a powerful
guide for constructing the laws of relativistic
physics, but even more fundamentally it prohibits
us from regarding any particular admissible observer
as having a privileged view of the universe. In
particular, we are forbidden from attaching any
objective significance to such questions as "Were the
two supernovae simultaneous?", "How long did the
meson survive?", and "What is the distance between
the Crab Nebula and Alpha Centauri?" This is
severe, but one must deal with it.


Particles and 4-Momentum 


If $I \subseteq \mathbb{R}$ is an interval, then a map $\alpha: I \to \mathcal{M}$ is a curve
in $\mathcal{M}$. Relative to any admissible basis we can write

$$\alpha(\xi) = x^a(\xi)\,e_a$$

for each $\xi \in I$. We shall assume that $\alpha$ is smooth in
the sense that each $x^a(\xi)$, $a = 1, 2, 3, 4$, is infinitely
differentiable ($C^\infty$) on $I$ and the velocity vector

$$\alpha'(\xi) = \frac{dx^a}{d\xi}\,e_a$$

is nonzero for every $\xi \in I$ (we adopt the usual
custom, in a vector space, of identifying the tangent
space at each point with the vector space itself). This
definition of smoothness clearly does not depend on
the choice of admissible basis for $\mathcal{M}$. The curve $\alpha$ is
said to be spacelike, timelike, or null if

$$\alpha'(\xi)\cdot\alpha'(\xi) = \eta_{ab}\,\frac{dx^a}{d\xi}\frac{dx^b}{d\xi}$$

is positive, negative, or zero, respectively, for each
$\xi \in I$. A timelike curve $\alpha$ for which $\alpha'(\xi)$ is future
directed for each $\xi \in I$ is called a timelike world line
and its image is identified with the set of all events
in the history of some (not necessarily free) point
material particle. If $I = [\xi_0, \xi_1]$ and $\alpha: [\xi_0, \xi_1] \to \mathcal{M}$
is a timelike world line, then the proper time length
of $\alpha$ is defined by

$$L(\alpha) = \int_{\xi_0}^{\xi_1} \sqrt{|g(\alpha'(\xi), \alpha'(\xi))|}\,d\xi = \int_{\xi_0}^{\xi_1} \sqrt{-\eta_{ab}\,\frac{dx^a}{d\xi}\frac{dx^b}{d\xi}}\,d\xi$$

and interpreted as the time lapse between the events
$\alpha(\xi_0)$ and $\alpha(\xi_1)$ as recorded by a clock carried along by
the particle whose world line is $\alpha$. This interpretation
is easily motivated by writing out a Riemann sum
approximation to the integral and appealing to our
interpretation of the proper time separation
$\Delta\tau = \sqrt{-\eta_{ab}\,\Delta x^a \Delta x^b}$. There are subtleties, however,
both mathematical and physical (Naber 1992, section
1.4). The mathematical ones are addressed by the
following result (which combines theorems 1.4.6
and 1.4.8 of Naber (1992)).


Theorem Let $x_0$ and $x$ be two events in $\mathcal{M}$. Then
$x - x_0$ is timelike and future directed if and only if
there exists a timelike world line $\alpha: [\xi_0, \xi_1] \to \mathcal{M}$ in
$\mathcal{M}$ with $\alpha(\xi_0) = x_0$ and $\alpha(\xi_1) = x$ and, in this case,

$$L(\alpha) \leq \tau(x - x_0) \qquad [9]$$

with equality holding if and only if $\alpha$ is a parametrization
of a timelike straight line.


The inequality [9] asserts that if two material
particles experience both $x_0$ and $x$, then the one
that is free (and so can be regarded as at rest in
some admissible coordinate system) has longer to
wait for the occurrence of the second event (moving
clocks run slow). For many years this basically
obvious fact was christened "The Twin Paradox."
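
The inequality can be observed numerically with a Riemann sum of the kind mentioned above. The Python sketch below is an illustration under stated assumptions: the world lines are parametrized by the observer's own time coordinate $x^4$, so the integrand reduces to $\sqrt{1 - v^2}$ with $v = dx^1/dx^4$; the velocity profile `traveller`, the duration `T` and the step count are hypothetical choices.

```python
import math

def proper_time_length(velocity, t0, t1, steps=20000):
    """Midpoint Riemann sum for the integral of sqrt(1 - velocity(t)^2) dt."""
    dt = (t1 - t0) / steps
    total = 0.0
    for k in range(steps):
        v = velocity(t0 + (k + 0.5) * dt)
        total += math.sqrt(1.0 - v * v) * dt
    return total

T = 10.0   # hypothetical coordinate time between the two events

def traveller(t):
    """Velocity dx1/dx4 of a particle that leaves x1 = 0 and returns at t = T."""
    return 0.8 * math.cos(2.0 * math.pi * t / T)

def stay_at_home(t):
    """A free particle at rest in this coordinate system."""
    return 0.0

L_traveller = proper_time_length(traveller, 0.0, T)
L_free = proper_time_length(stay_at_home, 0.0, T)
```

Both world lines join the same two events, since the traveller's velocity integrates to zero over $[0, T]$, yet the free particle records the full coordinate time `T` while the traveller records less.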

Just as a smooth curve in Euclidean space has an
arc length parametrization, so a timelike world line
has a proper time parametrization defined as
follows. For each $\xi$ in $[\xi_0, \xi_1]$ let

$$\tau = \tau(\xi) = \int_{\xi_0}^{\xi} \sqrt{|g(\alpha'(\zeta), \alpha'(\zeta))|}\,d\zeta$$

(the proper time length of $\alpha$ from $\alpha(\xi_0)$ to $\alpha(\xi)$).
Then $\tau = \tau(\xi)$ has a smooth inverse $\xi = \xi(\tau)$ so $\alpha$ can
be reparametrized by $\tau$. We will abuse our notation
slightly and write

$$\alpha(\tau) = x^a(\tau)\,e_a$$

The velocity vector with this parametrization is
denoted

$$U = U(\tau) = \alpha'(\tau) = \frac{dx^a}{d\tau}\,e_a$$

called the 4-velocity of the world line and is the unit
tangent vector field to $\alpha$, that is,

$$U(\tau)\cdot U(\tau) = -1 \qquad [10]$$

for each $\tau$. An admissible observer is, of course,
more likely to parametrize a world line by his own
time coordinate $x^4$. Then

$$\alpha'(x^4) = \frac{dx^1}{dx^4}\,e_1 + \frac{dx^2}{dx^4}\,e_2 + \frac{dx^3}{dx^4}\,e_3 + e_4$$

so

$$|g(\alpha'(x^4), \alpha'(x^4))| = 1 - \|V\|^2$$

where

$$\|V\| = \left[\left(\frac{dx^1}{dx^4}\right)^2 + \left(\frac{dx^2}{dx^4}\right)^2 + \left(\frac{dx^3}{dx^4}\right)^2\right]^{1/2}$$

is the usual magnitude of the particle's velocity
vector

$$V = V(x^4) = \frac{dx^1}{dx^4}\,e_1 + \frac{dx^2}{dx^4}\,e_2 + \frac{dx^3}{dx^4}\,e_3 = V^i e_i$$

in the given admissible coordinate system. One finds
then that

$$U = (1 - \|V\|^2)^{-1/2}\,(V + e_4) \qquad [11]$$


We shall identify a material particle in $\mathcal{M}$ with a
pair $(\alpha, m)$, where $\alpha$ is a timelike world line and $m$ is
a positive constant called the particle's proper mass
(or rest mass). If each $dx^a/d\xi$, $a = 1, 2, 3, 4$, is
constant, then $(\alpha, m)$ is a free material particle with
proper mass $m$. The 4-momentum of $(\alpha, m)$ is
defined by $P = mU$. Thus,

$$P\cdot P = -m^2 \qquad [12]$$

In any admissible basis we write

$$P = P^a e_a = mU^a e_a = m\,\frac{dx^a}{d\tau}\,e_a = m\,(1 - \|V\|^2)^{-1/2}\,(V + e_4) \qquad [13]$$

The "spatial part" of $P$ in these coordinates is

$$P^i e_i = \frac{m}{\sqrt{1 - \|V\|^2}}\,V$$

which, for $\|V\| \ll 1$, is approximately $mV$. Identifying
$m$ with the inertial mass of Newtonian
mechanics (measured by an observer for whom the
particle's speed is small), this is simply the classical
momentum of the particle. Somewhat more explicitly,
if one expands $1/\sqrt{1 - \|V\|^2}$ by the Binomial
Theorem one finds that

$$P^i = \frac{m V^i}{\sqrt{1 - \|V\|^2}} = mV^i + \frac{1}{2}\,mV^i\|V\|^2 + \cdots, \quad i = 1, 2, 3 \qquad [14]$$

which gives the components of the classical momentum
plus "relativistic corrections." In order
to preserve a formal similarity with Newtonian
mechanics one often sees $m/\sqrt{1 - \|V\|^2}$ referred
to as the "relativistic mass" of the particle, but we
shall avoid this terminology. The fourth component
of $P$ is given by

$$P^4 = -P\cdot e_4 = \frac{m}{\sqrt{1 - \|V\|^2}} = m + \frac{1}{2}\,m\|V\|^2 + \cdots \qquad [15]$$

The appearance of the term $(1/2)m\|V\|^2$ corresponding
to the Newtonian kinetic energy suggests
that $P^4$ be denoted $E$ and called the total relativistic
energy measured by the given admissible observer
for the particle:

$$E = -P\cdot e_4 \qquad [16]$$

Now, one must understand that the concept of
"energy" in physics is a subtle one and simply
giving $-P\cdot e_4$ this name does not ensure that there
is any physical content. Whether or not the name
is appropriate can only be determined experimentally.
In particular, one should ask if the appearance
of the term $m$ in [15] is consistent with
the view that $P^4$ represents the "energy" of the
particle. Observe that if $\|V\| = 0$ (i.e., if the particle
is at rest relative to the given observer), then [15]
gives

$$E = m \quad (= mc^2, \text{ in standard units}) \qquad [17]$$

which we interpret as saying that, even when the
particle is at rest, it still has energy. If this is really
"energy" in the physical sense, then it should be
possible to liberate and use it. That this is, indeed,
possible has, of course, been rather convincingly
demonstrated.
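
The relations [12], [13] and [15] are easy to confirm numerically. In the Python sketch below the mass `m` and velocity `V` are hypothetical sample values (units with $c = 1$), and `four_momentum` and `lorentz_inner` are helper names introduced here.

```python
import math

def four_momentum(m, V):
    """Components (P1, P2, P3, P4) of P = m U, using [13]."""
    speed_sq = sum(v * v for v in V)
    factor = m / math.sqrt(1.0 - speed_sq)
    return [factor * V[0], factor * V[1], factor * V[2], factor]

def lorentz_inner(x, y):
    """Lorentz inner product with signature (+, +, +, -)."""
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2] - x[3] * y[3]

m = 2.0                      # hypothetical proper mass
V = [0.1, -0.2, 0.05]        # hypothetical velocity, ||V|| well below 1
P = four_momentum(m, V)

energy = P[3]                                # E = -P . e4 = P^4
speed_sq = sum(v * v for v in V)
newtonian = m + 0.5 * m * speed_sq           # first two terms of [15]
```

For a slow particle the total energy exceeds the rest energy plus Newtonian kinetic energy only by a correction of order $m\|V\|^4$, and a particle at rest has energy exactly `m`.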

Next we observe that not only material particles,
but also photons possess "momentum" and
"energy" and therefore should have 4-momentum
(witness, e.g., the photoelectric effect in which
photons collide with and eject electrons from their
orbits in an atom). Unlike a material particle,
however, a photon's characteristic feature is not
proper mass, but frequency $\nu$, or wavelength
$\lambda = 1/\nu$, related to its energy $\mathcal{E}$ by $\mathcal{E} = h\nu$ ($h$ being
Planck's constant) and these are highly observer
dependent (Doppler effect). There is, moreover, no
"proper frequency" analogous to "proper mass"
since there is no admissible observer for whom the
photon is at rest. In an attempt to model these
features we consider a point $x_0 \in \mathcal{M}$, a future
directed null vector $N$ and an interval $I \subseteq \mathbb{R}$. The
curve $\alpha: I \to \mathcal{M}$ defined by

$$\alpha(\xi) = x_0 + \xi N \qquad [18]$$

is a parametrization of the world line of a photon
through $x_0$. Being null, $N$ can be written in any
admissible basis as

$$N = (-N\cdot e_4)(d + e_4) \qquad [19]$$

where

$$d = \left[(N\cdot e_1)^2 + (N\cdot e_2)^2 + (N\cdot e_3)^2\right]^{-1/2}\left[(N\cdot e_1)e_1 + (N\cdot e_2)e_2 + (N\cdot e_3)e_3\right] \qquad [20]$$

is the direction vector of the world line in the
corresponding spatial coordinate system. Now, by
analogy with [16], we define a photon in $\mathcal{M}$ to
be a curve in $\mathcal{M}$ of the form [18], take $N$ to be its
4-momentum and define the energy $\mathcal{E}$ of the photon
in the admissible basis $\{e_1, e_2, e_3, e_4\}$ by

$$\mathcal{E} = -N\cdot e_4 \qquad [21]$$

Then, by [19],

$$N = \mathcal{E}(d + e_4) \qquad [22]$$

The corresponding frequency $\nu$ and wavelength $\lambda$
are then defined by $\nu = \mathcal{E}/h$ and $\lambda = 1/\nu$. In another
admissible basis, one has $N = \hat{\mathcal{E}}(\hat{d} + \hat{e}_4)$, where $\hat{d}$
and $\hat{\mathcal{E}}$ are defined by the hatted versions of [20] and
[21]. One can then show (Naber 1992, section 1.8)
that

$$\frac{\hat{\nu}}{\nu} = \frac{\hat{\mathcal{E}}}{\mathcal{E}} = \frac{1 - \beta\cos\theta}{\sqrt{1-\beta^2}} = (1 - \beta\cos\theta) + \frac{1}{2}\beta^2(1 - \beta\cos\theta) + \cdots \qquad [23]$$

where $\beta$ is the relative speed of the two spatial
coordinate systems and $\theta$ is the angle (in the
unhatted spatial coordinate system) between the
direction $d$ of the photon and the direction of
motion of the hatted spatial coordinate system.
Equation [23] is the formula for the relativistic
Doppler effect with the first term in the series being
the classical formula.
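
The Doppler formula can be checked directly against the definition [21]. The Python sketch below assumes the hatted observer moves along $e_1$ with speed $\beta$, so that $\hat{e}_4 = (\beta e_1 + e_4)/\sqrt{1-\beta^2}$ (the 4-velocity of the hatted observer, which follows from [5]); the photon is taken in the $e_1$-$e_2$ plane with unit energy. All names and sample values are illustrative.

```python
import math

def doppler_ratio(beta, angle):
    """Closed form in [23]: hatted frequency over unhatted frequency."""
    return (1.0 - beta * math.cos(angle)) / math.sqrt(1.0 - beta ** 2)

def lorentz_inner(x, y):
    """Lorentz inner product with signature (+, +, +, -)."""
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2] - x[3] * y[3]

def energy_ratio(beta, angle):
    """Same ratio from [21]: (-N . e4_hat) / (-N . e4)."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    N = [math.cos(angle), math.sin(angle), 0.0, 1.0]   # null vector, energy 1
    e4 = [0.0, 0.0, 0.0, 1.0]
    e4_hat = [g * beta, 0.0, 0.0, g]   # assumed form of the hatted time axis
    return lorentz_inner(N, e4_hat) / lorentz_inner(N, e4)

same_direction = doppler_ratio(0.6, 0.0)     # observer moving with the photon
head_on = doppler_ratio(0.6, math.pi)        # observer moving against it
```

With $\beta = 0.6$ the frequency is halved when the observer moves along with the photon and doubled when the observer moves against it.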

We conclude this section by examining a few
simple interactions between particles of the sort
modeled by our definitions, assuming only that
4-momentum is conserved in the interaction. For
convenience, we will use the term free particle to
refer to either a free material particle or a photon.
If $\mathcal{A}$ is a finite set of free particles, then each
element of $\mathcal{A}$ has a unique 4-momentum which is a
future-directed timelike or null vector. The sum of
any such collection of vectors is timelike and future
directed, except when all of the vectors are null and
parallel, in which case the sum is null and future
directed (Naber 1992, lemma 1.4.3). We call this
sum the total 4-momentum of $\mathcal{A}$. Now we formulate
a definition which is intended to model a finite set
of free particles colliding at some event with a
(perhaps new) set of free particles emerging from the
collision (e.g., an electron and proton collide, with a
neutron and neutrino emerging from the collision).
A contact interaction in $\mathcal{M}$ is a triple $(\mathcal{A}, x, \tilde{\mathcal{A}})$,
where $\mathcal{A}$ and $\tilde{\mathcal{A}}$ are two finite sets of free particles,
neither of which contains a pair of particles with
linearly dependent 4-momenta (which would presumably
be physically indistinguishable) and $x \in \mathcal{M}$
is an event such that

1. $x$ is the terminal point of all of the particles in $\mathcal{A}$
(i.e., for each world line $\alpha: [\xi_0, \xi_1] \to \mathcal{M}$ of a
particle in $\mathcal{A}$, $\alpha(\xi_1) = x$);

2. $x$ is the initial point of all the particles in $\tilde{\mathcal{A}}$, and

3. the total 4-momentum of $\mathcal{A}$ equals the total
4-momentum of $\tilde{\mathcal{A}}$.


Property (3) is called the conservation of 4-momentum.
If $\mathcal{A}$ consists of a single free particle, then $(\mathcal{A}, x, \tilde{\mathcal{A}})$ is
called a decay (e.g., a neutron decays into a proton, an
electron and an antineutrino).

Consider, for example, an interaction $(\mathcal{A}, x, \tilde{\mathcal{A}})$
for which $\tilde{\mathcal{A}}$ consists of a single photon. The total
4-momentum of $\tilde{\mathcal{A}}$ is null so the same must be true of
$\mathcal{A}$. Since the 4-momenta of the individual particles in
$\mathcal{A}$ are timelike or null and future directed their sum
can be null only if they are, in fact, all null and
parallel. Since $\mathcal{A}$ cannot contain distinct photons with
parallel 4-momenta, it must consist of a single photon
which, by (3), must have the same 4-momentum as
the photon in $\tilde{\mathcal{A}}$. In essence, "nothing happened at
$x$." We conclude that no nontrivial interaction of the
type modeled by our definition can result in a single
photon and nothing else. Reversing the roles of $\mathcal{A}$
and $\tilde{\mathcal{A}}$ shows that, if 4-momentum is to be conserved,
a photon cannot decay.

Next let us consider the decay of a single material
particle into two material particles, for example, the
spontaneous disintegration of an atom through
$\alpha$-emission. Thus, we consider a contact interaction
$(\mathcal{A}, x, \tilde{\mathcal{A}})$ in which $\mathcal{A}$ consists of a single free material
particle of proper mass $m_0$ and $\tilde{\mathcal{A}}$ consists of two
free material particles with proper masses $m_1$ and
$m_2$. Let $P_0$, $P_1$, and $P_2$ be the 4-momenta of the
particles of proper mass $m_0$, $m_1$, and $m_2$, respectively.
Then $P_0 = P_1 + P_2$. Appealing to the
"reversed triangle inequality," the fact that $P_1$ and
$P_2$ are linearly independent and future directed, and
[12] we conclude that

$$m_0 > m_1 + m_2 \qquad [23]$$

The excess mass $m_0 - (m_1 + m_2)$ of the initial
particle is regarded, via [17], as a measure of the
amount of energy required to split $m_0$ into two
pieces. Stated somewhat differently, when the two
particles in $\tilde{\mathcal{A}}$ were held together to form the single
particle in $\mathcal{A}$, the "binding energy" contributed to
the mass of this latter particle.
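
The mass inequality can be illustrated by adding two 4-momenta and reading off the proper mass of the sum from [12]. In the Python sketch below the masses and velocities of the two decay products are hypothetical, and `four_momentum` and `mass` are helper names introduced here (units with $c = 1$).

```python
import math

def four_momentum(m, V):
    """Components (P1, P2, P3, P4) of P = m U for velocity V, as in [13]."""
    factor = m / math.sqrt(1.0 - sum(v * v for v in V))
    return [factor * V[0], factor * V[1], factor * V[2], factor]

def mass(P):
    """Proper mass recovered from [12]: m = sqrt(-P . P)."""
    return math.sqrt(P[3] ** 2 - P[0] ** 2 - P[1] ** 2 - P[2] ** 2)

# Hypothetical decay products.
m1, V1 = 1.0, [0.6, 0.0, 0.0]
m2, V2 = 2.0, [-0.3, 0.0, 0.0]

P1 = four_momentum(m1, V1)
P2 = four_momentum(m2, V2)
P0 = [a + b for a, b in zip(P1, P2)]   # conservation of 4-momentum
m0 = mass(P0)
excess = m0 - (m1 + m2)
```

The positive `excess` is the mass of the initial particle that appears as kinetic energy of the products.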

Reversing the roles of $\mathcal{A}$ and $\tilde{\mathcal{A}}$ in the last
example gives a contact interaction modelling an
inelastic collision (two free material particles with
masses $m_1$ and $m_2$ collide and coalesce to form a
third of mass $m_0$). The inequality [23] remains true,
of course, and a somewhat more detailed analysis
(Naber 1992, section 1.8) yields an approximate
formula for $m_0 - (m_1 + m_2)$ which can be compared
(favorably) with the Newtonian formula for
the loss in kinetic energy that results from the
collision (energy which, classically, is viewed as
taking the form of heat in the combined particle).
An analysis of the interaction in which both $\mathcal{A}$ and
$\tilde{\mathcal{A}}$ consist of an electron and a photon yields (Naber
1992, section 1.8) a formula for the so-called
Compton effect. Many more such examples of this
sort are treated in great detail in Synge (1972,
chapter VI, § 14).


Charged Particles and Electromagnetic Fields


A charged particle in $\mathcal{M}$ is a triple $(\alpha, m, q)$, where
$(\alpha, m)$ is a material particle and $q$ is a nonzero real
number called the charge of the particle. Charged
particles do two things of interest to us. By their
very presence they create electromagnetic fields and
they also respond to the electromagnetic fields
created by other charges.

Charged particles "respond" to an electromagnetic
field by experiencing changes in 4-momentum.
The quantitative nature of this response, that is, the
equation of motion, is generally taken to be the
so-called Lorentz 4-force law which expresses
the proper time rate of change of the particle's
4-momentum at each point of the world line as a
linear function of the 4-velocity. Thus, at each point
$\alpha(\tau)$ of the world line

$$\frac{dP(\tau)}{d\tau} = q\,F_{\alpha(\tau)}(U(\tau)) \qquad [24]$$

where $F_{\alpha(\tau)}: \mathcal{M} \to \mathcal{M}$ is a linear transformation
determined, in each admissible coordinate system,
by the classical electric $\mathbf{E}$ and magnetic $\mathbf{B}$ fields (here
we are assuming that the contribution of $q$ to the
ambient electromagnetic field is negligible, that is,
$(\alpha, m, q)$ is a "test charge"). Let us write [24] more
simply as

$$F(U) = \frac{m}{q}\,\frac{dU}{d\tau} \qquad [25]$$

Dotting both sides with $U$ and using [10] gives

$$F(U)\cdot U = \frac{m}{q}\,\frac{dU}{d\tau}\cdot U = \frac{m}{2q}\,\frac{d}{d\tau}(U\cdot U) = 0$$


Since any future-directed timelike unit vector $u$ is
the 4-velocity of some charged particle, we find
that $F(u)\cdot u = 0$ for any such vector. Linearity then
implies $F(v)\cdot v = 0$ for any timelike vector. Now,
if $u$ and $v$ are timelike and future directed, then $u + v$
is timelike so $0 = F(u + v)\cdot(u + v) = F(u)\cdot v + u\cdot F(v)$
and therefore $F(u)\cdot v = -u\cdot F(v)$. But $\mathcal{M}$
has a basis of future-directed timelike vectors so

$$F(x)\cdot y = -x\cdot F(y) \qquad [26]$$

for all $x, y \in \mathcal{M}$. Thus, at each point, the linear
transformation $F$ must be skew-symmetric with
respect to the Lorentz inner product. One could
therefore model an electromagnetic field on $\mathcal{M}$ by
an assignment to each point of a skew-symmetric
linear transformation whose job it is to assign to the
4-velocity of a charged particle whose world line
passes through that point the change in 4-momentum
that the particle should expect to experience
because of the presence of the field. However, a
slightly different perspective has proved more convenient.
Notice that a skew-symmetric linear transformation
$F: \mathcal{M} \to \mathcal{M}$ and the Lorentz inner
product together determine a bilinear form
$\tilde{F}: \mathcal{M} \times \mathcal{M} \to \mathbb{R}$ given by

$$\tilde{F}(x, y) = F(x)\cdot y$$

which is also skew-symmetric ($\tilde{F}(y, x) = F(y)\cdot x = -\tilde{F}(x, y)$)
and that, conversely, a skew-symmetric
bilinear form uniquely determines a skew-symmetric
linear transformation. Now, an assignment of a
skew-symmetric bilinear form to each point of $\mathcal{M}$ is
nothing other than a 2-form on $\mathcal{M}$ and it is in the
language of forms that we choose to phrase classical
electromagnetic theory (a concise introduction to
this language is available, for example, in Spivak
(1965, chapter 4)).

Nature imposes a certain restriction on which
2-forms can reasonably represent an electromagnetic
field on $\mathcal{M}$ ("Maxwell's equations"). To formulate
these we introduce a source 1-form $J$ as follows: If

106 Introductory Article: Minkowski Spacetime and Special Relativity 


$x^1, x^2, x^3, x^4$ is any admissible coordinate system on
$\mathcal{M}$, then

$$J = J_1\,dx^1 + J_2\,dx^2 + J_3\,dx^3 - \rho\,dx^4 \qquad [27]$$

where $\rho: \mathcal{M} \to \mathbb{R}$ is a charge density function and
$\mathbf{J} = J_1 e_1 + J_2 e_2 + J_3 e_3$ is a current density vector field
(these are to be regarded as the usual "smoothed
out," pointwise versions of "charge per unit
volume" and "charge flow per unit area per unit
time" as measured by the corresponding admissible
observer). Now, our formal definition is as follows:
The electromagnetic field on $\mathcal{M}$ determined by the
source 1-form $J$ on $\mathcal{M}$ is a 2-form $F$ on $\mathcal{M}$ that
satisfies Maxwell's equations

$$dF = 0 \qquad [28]$$

and

$${*}d\,{*}F = J \qquad [29]$$

A few comments are in order here. We have chosen
units in which not only the speed of light, but also
various other constants that one often finds in
Maxwell's equations (the dielectric constant $\epsilon_0$ and
magnetic permeability $\mu_0$) are 1 and a factor of $4\pi$ in
[29] is "normalized out." The $*$ in [29] is the Hodge
star operator determined by the Lorentz inner
product and the chosen orientation of $\mathcal{M}$. This is a
natural isomorphism

$${*}: \Omega^p(\mathcal{M}) \to \Omega^{4-p}(\mathcal{M}), \quad p = 0, 1, 2, 3, 4$$

of the $p$-forms on $\mathcal{M}$ to the $(4 - p)$-forms on $\mathcal{M}$ and is
most simply defined as follows: let $x^1, x^2, x^3, x^4$ be any
admissible coordinate system on $\mathcal{M}$. If $1 \in \Omega^0(\mathcal{M})$
is the constant function (0-form) on $\mathcal{M}$ whose value
is $1 \in \mathbb{R}$, then

$${*}1 = dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4$$
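is the volume form on $\mathcal{M}$. If $1 \leq i_1 < \cdots < i_k \leq 4$,
then ${*}(dx^{i_1} \wedge \cdots \wedge dx^{i_k})$ is uniquely determined by

$$(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) \wedge {*}(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) = \pm\,dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4$$

with the minus sign taken exactly when one of the indices is 4.
Thus, for example, ${*}dx^3 = dx^1 \wedge dx^2 \wedge dx^4$,
${*}(dx^1 \wedge dx^4) = -dx^2 \wedge dx^3$,
${*}(dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4) = -1$,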
etc. It follows that, if $\mu$ is a $p$-form on $\mathcal{M}$, then

$${*}{*}\mu = (-1)^{p+1}\mu \qquad [30]$$

(a more thorough discussion is available in Choquet-Bruhat
et al. (1977, chapter V A3)). In particular,
[29] is equivalent to

$$d\,{*}F = {*}J \qquad [31]$$

On regions in which there are no charges, so that
$J = 0$, [28] and [31] become the source free Maxwell
equations

$$dF = 0 \qquad [32]$$

and

$$d\,{*}F = 0 \qquad [33]$$

that is, both $F$ and ${*}F$ are closed 2-forms.

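Any 2-form $F$ on $\mathcal{M}$ can be written in any admissible
coordinate system as $F = (1/2)F_{ab}\,dx^a \wedge dx^b$ (summation
convention!), where $(F_{ab})$ is the skew-symmetric
matrix of components of $F$. In order to make contact
with the notation generally employed in physics, we
introduce the following names for these components:

$$(F_{ab}) = \begin{pmatrix} 0 & B^3 & -B^2 & E^1 \\ -B^3 & 0 & B^1 & E^2 \\ B^2 & -B^1 & 0 & E^3 \\ -E^1 & -E^2 & -E^3 & 0 \end{pmatrix} \qquad [34]$$

Thus,

$$\begin{aligned}
F = {} & E^1\,dx^1 \wedge dx^4 + E^2\,dx^2 \wedge dx^4 + E^3\,dx^3 \wedge dx^4 \\
& + B^3\,dx^1 \wedge dx^2 + B^2\,dx^3 \wedge dx^1 + B^1\,dx^2 \wedge dx^3
\end{aligned} \qquad [35]$$

Computing ${*}F$, $dF$, $d\,{*}F$ and ${*}d\,{*}F$ and writing
$\mathbf{E} = E^1 e_1 + E^2 e_2 + E^3 e_3$ and $\mathbf{B} = B^1 e_1 + B^2 e_2 + B^3 e_3$
one finds that $dF = 0$ is equivalent to

$$\operatorname{div}\mathbf{B} = 0 \qquad [36]$$

and

$$\operatorname{curl}\mathbf{E} + \frac{\partial\mathbf{B}}{\partial x^4} = 0 \qquad [37]$$

while ${*}d\,{*}F = J$ is equivalent to

$$\operatorname{div}\mathbf{E} = \rho \qquad [38]$$

and

$$\operatorname{curl}\mathbf{B} - \frac{\partial\mathbf{E}}{\partial x^4} = \mathbf{J} \qquad [39]$$

Equations [36]-[39] are the more traditional renderings
of Maxwell's equations.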

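In another admissible coordinate system
$\hat{x}^1, \hat{x}^2, \hat{x}^3, \hat{x}^4$ on $\mathcal{M}$ (related to the first by [2]) the
2-form $F$ would be written $F = (1/2)\hat{F}_{ab}\,d\hat{x}^a \wedge d\hat{x}^b$.
Setting $\hat{x}^a = \Lambda^a{}_\alpha x^\alpha$ and $\hat{x}^b = \Lambda^b{}_\beta x^\beta$ gives
$F = (1/2)(\Lambda^a{}_\alpha \Lambda^b{}_\beta \hat{F}_{ab})\,dx^\alpha \wedge dx^\beta$, so

$$F_{\alpha\beta} = \Lambda^a{}_\alpha \Lambda^b{}_\beta\,\hat{F}_{ab}, \quad \alpha, \beta = 1, 2, 3, 4 \qquad [40]$$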


Now, suppose that we wish to describe the electro- 
magnetic field of a uniformly moving charge. 
According to the relativity principle, it does not 
matter at all whether we view the charge as moving 
relative to a "fixed" admissible observer, or the
observer as moving relative to a "stationary" charge.
Thus, we shall write out the field due to a charge
fixed at the origin of the hatted coordinate system
("Coulomb's law") and transform, by [40], to an
unhatted coordinate system moving relative to it.
Relative to $\hat x^1, \hat x^2, \hat x^3, \hat x^4$, the familiar inverse square
law for a fixed point charge q located at the spatial
origin gives $\hat{\mathbf{B}} = 0$ and $\hat{\mathbf{E}} = (q/\hat r^3)\hat{\mathbf{r}}$, where
$\hat{\mathbf{r}} = \hat x^1\hat e_1 + \hat x^2\hat e_2 + \hat x^3\hat e_3$ and
$\hat r = ((\hat x^1)^2 + (\hat x^2)^2 + (\hat x^3)^2)^{1/2}$ (note
that $\hat{\mathbf{E}}$ is defined only on $M - \mathrm{Span}\{\hat e_4\}$). Thus,


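$$(\hat F_{ab}) = \frac{q}{\hat r^3}\begin{pmatrix} 0 & 0 & 0 & \hat x^1 \\ 0 & 0 & 0 & \hat x^2 \\ 0 & 0 & 0 & \hat x^3 \\ -\hat x^1 & -\hat x^2 & -\hat x^3 & 0 \end{pmatrix}$$ [41]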


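It is a simple matter to verify that, on its domain, $(\hat F_{ab})$
satisfies the source free Maxwell equations. Taking $\Lambda$ to
be the special Lorentz transformation corresponding to
[5] and writing out [40] with $(\hat F_{ab})$ given by [41] yields

$$E^1 = \frac{q}{\hat r^3}\hat x^1, \quad E^2 = \gamma\frac{q}{\hat r^3}\hat x^2, \quad E^3 = \gamma\frac{q}{\hat r^3}\hat x^3,$$
$$B^1 = 0, \quad B^2 = -\beta\gamma\frac{q}{\hat r^3}\hat x^3, \quad B^3 = \beta\gamma\frac{q}{\hat r^3}\hat x^2$$ [42]

where $\gamma = (1 - \beta^2)^{-1/2}$. We wish to express these in terms of measurements
made by the unhatted observer at the instant the
charge passes through his spatial origin. Setting
$x^4 = 0$ in [5] gives

$\hat x^1 = \gamma x^1, \quad \hat x^2 = x^2, \quad \hat x^3 = x^3$

and so

$\hat r^2 = \gamma^2 (x^1)^2 + (x^2)^2 + (x^3)^2$

which, for convenience, we write $\tilde r^2$. Making these
substitutions in [42] gives

$$\mathbf{E} = q\gamma\left(\frac{1}{\tilde r^3}\right)(x^1e_1 + x^2e_2 + x^3e_3) = q\gamma\left(\frac{1}{\tilde r^3}\right)\mathbf{r}$$ [43]

and

$$\mathbf{B} = q\gamma\left(\frac{1}{\tilde r^3}\right)(0e_1 - \beta x^3e_2 + \beta x^2e_3) = q\gamma\left(\frac{1}{\tilde r^3}\right)((\beta e_1) \times \mathbf{r})$$ [44]

for the field of a charge moving uniformly with
velocity $\beta e_1$ at the instant the charge passes through
the origin. Observe that when $\beta \ll 1$, $\tilde r \approx r$, so [43]
says that the electric field of a slowly moving charge
is approximately the Coulomb field. When $\beta \ll 1$,
[44] reduces to the Biot-Savart law.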
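
The transformation law [40] can be checked numerically. The following is a minimal sketch, not the article's own computation: it assumes that the special Lorentz transformation [5] is a boost with velocity β along e_1 (the function `boost` and all other names are hypothetical), builds the Coulomb matrix [41], applies [40], and reads off E and B according to [34] at an event with x^4 = 0.

```python
import math

def coulomb_matrix(q, xh):
    """Components of [41] at hatted coordinates xh = (x1, x2, x3, x4)."""
    r3 = math.sqrt(xh[0]**2 + xh[1]**2 + xh[2]**2) ** 3
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[i][3] = q * xh[i] / r3       # F_{i4} = E^i in the hatted frame
        F[3][i] = -q * xh[i] / r3      # skew-symmetry
    return F

def boost(beta):
    """Assumed form of [5]: L[a][alpha] = Lambda^a_alpha for a boost along e_1."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    L = [[0.0] * 4 for _ in range(4)]
    L[0][0], L[0][3] = g, -beta * g
    L[1][1] = L[2][2] = 1.0
    L[3][0], L[3][3] = -beta * g, g
    return L

def moving_charge_fields(q, beta, x):
    """E and B seen by the unhatted observer at (x1, x2, x3) when x4 = 0."""
    L = boost(beta)
    X = [x[0], x[1], x[2], 0.0]
    xh = [sum(L[a][al] * X[al] for al in range(4)) for a in range(4)]
    Fh = coulomb_matrix(q, xh)
    # [40]: F_{alpha beta} = Lambda^a_alpha Lambda^b_beta Fhat_{ab}
    F = [[sum(L[a][al] * L[b][be] * Fh[a][b]
              for a in range(4) for b in range(4))
          for be in range(4)] for al in range(4)]
    E = (F[0][3], F[1][3], F[2][3])    # naming convention of [34]
    B = (F[1][2], F[2][0], F[0][1])
    return E, B
```

Under these assumptions the output can be compared with the closed forms [43] and [44]; for small β the electric part approaches the Coulomb field, as stated above.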

Let us consider one other simple application, that
is, the response of a charged particle $(\alpha, m, q)$ to an
electromagnetic field which, for some admissible
observer, is constant and purely magnetic. For
simplicity, we assume that, for this observer, $\mathbf{E} = 0$
and $\mathbf{B} = be_3$, where b is a nonzero constant. The
corresponding 2-form F has components

$$(F_{ab}) = \begin{pmatrix} 0 & b & 0 & 0 \\ -b & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

(from [34]). The corresponding linear transformation
F has the same matrix relative to this basis so,
with $\alpha(\tau) = x^a(\tau)e_a$ and $U(\tau) = U^a(\tau)e_a$, the Lorentz
4-force law [25] reduces to the system of linear
differential equations

$$\frac{dU^1}{d\tau} = \frac{bq}{m}U^2, \quad \frac{dU^2}{d\tau} = -\frac{bq}{m}U^1, \quad \frac{dU^3}{d\tau} = 0, \quad \frac{dU^4}{d\tau} = 0$$


The system is easily solved and the results easily
integrated to give

$$\alpha(\tau) = x_0 + a\sin\left(\frac{bq\tau}{m} + \phi\right)e_1 + a\cos\left(\frac{bq\tau}{m} + \phi\right)e_2 + c\tau e_3 + \left(1 + \frac{a^2b^2q^2}{m^2} + c^2\right)^{1/2}\tau e_4$$ [45]

where $x_0 = x_0^a e_a \in M$ is constant and a, $\phi$, and c are
real constants with a > 0 (we have used $U \cdot U = -1$
to eliminate one other arbitrary real constant). Note
that, at each point on $\alpha$, $(x^1 - x_0^1)^2 + (x^2 - x_0^2)^2 = a^2$.
Thus, if $c \neq 0$ the spatial trajectory in this coordinate
system is a helix along the $e_3$-direction
(i.e., along the magnetic field lines). If c = 0, the
trajectory is a circle in the $x^1$-$x^2$ plane. This case
is of some practical significance since one can 
introduce constant magnetic fields in a bubble
chamber so as to induce a particle of interest to
follow a circular path. We show now how to
measure the charge-to-mass ratio for such a particle.
Taking c = 0 in [45] and computing $U(\tau)$, then using
[11] to solve for the coordinate velocity vector $\mathbf{V}$ of
the particle gives

$$\mathbf{V} = \frac{abq}{m}\sqrt{1 - \|\mathbf{V}\|^2}\left(\cos\left(\frac{bq\tau}{m} + \phi\right)e_1 - \sin\left(\frac{bq\tau}{m} + \phi\right)e_2\right)$$

From this one computes

$$\|\mathbf{V}\|^2 = \left(1 + \frac{m^2}{a^2b^2q^2}\right)^{-1}$$

(note that this is a constant). Solving this last equation
for q/m (and assuming q > 0 for convenience) one
arrives at

$$\frac{q}{m} = \frac{1}{a|b|}\,\frac{\|\mathbf{V}\|}{\sqrt{1 - \|\mathbf{V}\|^2}}$$

Since a, b, and $\|\mathbf{V}\|$ are measurable, one obtains the
desired charge-to-mass ratio.
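
As a small illustration of the last formula, here is a sketch in units with c = 1; the function names are hypothetical and the numerical inputs in any use of it are arbitrary, not measured data. The second function inverts the first, which gives a simple consistency check against the expression for the squared speed.

```python
import math

def charge_to_mass(a, b, speed):
    """q/m = speed / (a |b| sqrt(1 - speed^2)) for a circular orbit of radius a."""
    if not 0.0 <= speed < 1.0:
        raise ValueError("speed must satisfy 0 <= speed < 1 in units with c = 1")
    return speed / (a * abs(b) * math.sqrt(1.0 - speed**2))

def speed_on_circle(a, b, q_over_m):
    """Speed from ||V||^2 = (1 + m^2 / (a^2 b^2 q^2))^(-1)."""
    w = a * abs(b) * q_over_m
    return math.sqrt(1.0 / (1.0 + 1.0 / w**2))
```

For example, `charge_to_mass(2.0, 0.5, speed_on_circle(2.0, 0.5, 3.0))` recovers the value 3.0 up to rounding.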

To conclude we wish to briefly consider the
existence and use of "potentials" for electromagnetic
fields. Suppose F is an electromagnetic field defined
on some connected, open region X in M. Then F is
a 2-form on X which, by [28], is closed. Suppose
also that the second de Rham cohomology $H^2(X; \mathbb{R})$
of X is trivial (since M is topologically $\mathbb{R}^4$ this will
be the case, for example, when X is all of M, or an
open ball in M, or, more generally, an open "star-shaped"
region in M). Then, by definition, every
closed 2-form on X is exact so, in particular, there
exists a 1-form A on X satisfying


$F = dA$ [46]


In particular, such a 1-form A always exists locally
on a neighborhood of any point in X for any F. Such
an A is not uniquely determined, however, because,
if A satisfies [46], then so does $A + df$ for any
smooth real-valued function (0-form) f on X ($d^2 = 0$
implies $d(A + df) = dA + d^2f = dA = F$). Any 1-form
A satisfying [46] is called a (gauge) potential for F.
The replacement $A \to A + df$ for some f is called a
gauge transformation of the potential and the
freedom to make such a replacement without
altering [46] is called gauge freedom.
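
Gauge freedom can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the potential is given as four component functions of the coordinates, the components of dA are taken to be $F_{ab} = \partial_a A_b - \partial_b A_a$, and derivatives are approximated by central differences with a hypothetical step size `H`. All names are introduced here for the example.

```python
import math

H = 1e-4  # finite-difference step (an arbitrary choice for this sketch)

def partial(fn, x, a):
    """Central-difference approximation of the a-th partial derivative of fn at x."""
    xp, xm = list(x), list(x)
    xp[a] += H
    xm[a] -= H
    return (fn(xp) - fn(xm)) / (2.0 * H)

def field(A, x):
    """Components F_ab = d_a A_b - d_b A_a of dA at the point x."""
    return [[partial(A[b], x, a) - partial(A[a], x, b) for b in range(4)]
            for a in range(4)]

def gauge_transform(A, f):
    """Components of A + df, with df approximated by central differences."""
    return [lambda x, b=b: A[b](x) + partial(f, x, b) for b in range(4)]
```

Evaluating `field(A, x)` and `field(gauge_transform(A, f), x)` at the same point should give the same matrix up to discretization and rounding error, which is the numerical counterpart of $d^2 = 0$.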

One can show that, given F, it is always possible
to locally solve $dA = F$ for A subject to an arbitrary
specification of the 0-form ${*}d{*}A$. More precisely, if F
is any 2-form satisfying $dF = 0$ and g is an arbitrary
0-form, then locally, on a neighborhood of any
point, there exists a 1-form A satisfying


$dA = F$ and ${*}d{*}A = g$ [47]


(a more general result is proved in Parrott (1987,
appendix 2) and a still more general one in section
2.9 of this same source). The usefulness of the
second condition in [47] can be illustrated as
follows. Suppose we are given some (physical)
configuration of charges and currents (i.e., some
source 1-form J) and we wish to find the corresponding
electromagnetic field F. We must solve
Maxwell's equations $dF = 0$ and ${*}d{*}F = J$ (subject to
whatever boundary conditions are appropriate).
Locally, at least, we may seek instead a corresponding
potential A (so that $F = dA$). Then the first of
Maxwell's equations is automatically satisfied
($dF = d(dA) = 0$) and we need only solve
${*}d{*}(dA) = J$. To simplify the notation let us temporarily
write $\delta = {*}d{*}$ and consider the operator
$\Delta = d \circ \delta + \delta \circ d$ on forms (variously called the Laplace-Beltrami
operator, Laplace-de Rham operator, or
Hodge Laplacian on Minkowski spacetime). Then


$\Delta A = d(\delta A) + \delta(dA) = d({*}d{*}A) + {*}d{*}(dA)$ [48]


According to the result quoted above, we may
narrow down our search by imposing the condition
${*}d{*}A = 0$, that is


$\delta A = 0$ [49]


(this is generally referred to as imposing the Lorentz
gauge). With this, [48] becomes $\Delta A = {*}d{*}(dA)$ and
to satisfy the second Maxwell equation we must
solve


$\Delta A = J$ [50]


Thus, we see that the problem of (locally) solving
Maxwell's equations for a given source J reduces
to that of solving [49] and [50] for the potential A.
To understand how this simplifies the problem, we
note that a calculation in admissible coordinates
shows that the operator $\Delta$ reduces to the componentwise
d'Alembertian $\Box$, defined on real-valued
functions by


$$\Box = \frac{\partial^2}{\partial(x^1)^2} + \frac{\partial^2}{\partial(x^2)^2} + \frac{\partial^2}{\partial(x^3)^2} - \frac{\partial^2}{\partial(x^4)^2}$$

Thus, eqn [50] decouples into four scalar equations

$\Box A_a = J_a$, $a = 1, 2, 3, 4$ [51]


each of which is the well-studied inhomogeneous
wave equation.
Historical Background


In this section we shall briefly recall the basic 
empirical facts and the first theoretical attempts 
from which the theory and the formalism of present-day
quantum mechanics (QM) have grown. In the
next sections we shall give the mathematical and 
computational structure of QM, mention the physi- 
cal problems that QM has solved with much 
success, and describe the serious conceptual consis- 
tency problems which are posed by QM (and which 
remain unsolved up to now). 

Empirical rules of discretization were observed 
already, starting from the 1850s, in the absorption 
and in the emission of light. Fraunhofer noticed 
that the dark lines in the absorption spectrum of 
the light of the sun coincide with the bright lines in
the emission spectra of the elements. G Kirchhoff and
R Bunsen reached the conclusion that the relative 
intensities of the emission and absorption of light 
implied that the ratio between energy emitted and 
absorbed is independent of the atom considered. 
This was the starting point of the analysis by 
Planck. 

On the other hand, by the early twentieth
century, the spatial structure of the atom had been
investigated; the most successful model was that of 
Rutherford, in which the atom appeared as a small 
nucleus of charge Z surrounded by Z electrons 
attracted by the nucleus according to Coulomb’s 
law. This model represents, for distances of the 
order of the size of an atom, a complete departure 
from Newton’s laws combined with the laws of 
classical electrodynamics; indeed, according to these 
laws, the atom would be unstable against collapse, 
and would certainly not exhibit a discrete energy 
spectrum. We must conclude that the classical laws 


are inadequate for the description of emission and 
absorption of light, in which the internal structure of 
the atom plays a major role. 

The birth of the old quantum theory is placed 
traditionally at the date of M Planck’s discussion of 
the blackbody radiation in 1900. 

Planck put forward the postulate that light is 
emitted and absorbed by matter in discrete energy 
quanta through “resonators” that have an energy 
proportional to their frequency. This assumption 
led, through the use of Gibbs’s rules of Statistical
Mechanics applied to a gas of resonators, to a law 
(Planck’s law) which reproduces the empirical 
findings on the radiation from a blackbody. It led 
Einstein to ascribe to light (which had, since the 
times of Maxwell, a successful description in terms 
of waves) a discrete, particle-like nature. Nine years 
later A Einstein gave further support to Planck’s 
postulate by showing that it can reproduce correctly 
the energy fluctuations in blackbody radiation and 
even clarifies the properties of specific heat. Soon 
afterwards, Einstein (1924, 1925) proved that the 
putative particle of light satisfied the relativistic laws 
(relation between energy and momentum) of a 
particle with zero mass. 

This dual nature of light received further support
from the experiments on the Compton effect and
from the description, by Einstein, of the photoelectric
effect (Einstein 1905). It should be emphasized
that while Planck considered light of frequency $\nu$ in interaction
with matter as composed of bits of energy $h\nu$
($h \approx 6.6 \times 10^{-27}$ erg s), Einstein's analysis went much
further in assigning to the quantum of light properties
of a particle-like (localized) object. This marks a
complete departure from the laws of classical electro- 
magnetism. Therefore, quoting Einstein, 


It is conceivable that the wave theory of light, which 
retains its effectiveness for the representation of purely 
optical phenomena and is based on continuous functions 
over space, will lead to contradiction with the experiments 
when applied to phenomena in which there is creation or 
conversion of light; indeed these phenomena can be better 
described on the assumption that light is distributed
discontinuously in space and described by a finite number 
of quanta which move without being divided and which 
must be absorbed or emitted as a whole. 


Notice that, for a wavelength of $8 \times 10^3$ Å, a 30 W
lamp emits roughly $10^{20}$ photons s$^{-1}$; for macroscopic
objects the discrete nature of light has no
appreciable consequence.
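
The order of magnitude quoted above can be checked with a short calculation. This is a rough sketch which assumes, for simplicity, that the whole power of the lamp is radiated at a single wavelength; the function name is hypothetical and the constants are the SI defining values.

```python
PLANCK_H = 6.62607015e-34   # Planck's constant in J s
LIGHT_C = 299792458.0       # speed of light in m/s

def photon_rate(power_watts, wavelength_m):
    """Photons per second, assuming all power is emitted at one wavelength."""
    energy_per_photon = PLANCK_H * LIGHT_C / wavelength_m   # E = h c / lambda
    return power_watts / energy_per_photon
```

With these assumptions, `photon_rate(30.0, 8e-7)` (8 × 10³ Å is 8 × 10⁻⁷ m) is of the order of 10²⁰ photons per second.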

Planck's postulate and energy conservation imply 
that in emitting and absorbing light the atoms of the 
various elements can lose or gain energy only by 
discrete amounts. Therefore, atoms as producers or 
absorbers of radiation are better described by a 
theory that assigns to each atom a (possibly infinite)
discrete set of states which have a definite energy. 

The old quantum theory of matter addresses 
precisely this question. Its main proponent is 
N Bohr (Bohr 1913, 1918). The new theory is 
entirely phenomenological (as is Planck's theory) 
and based on Rutherford's model and on three 
more postulates (Born 1924): 


(i) The states of the atom are stable periodic
orbits, as given by Newton's laws, of energy
$E_n$, $n \in \mathbb{Z}^+$, given by $E_n = h\nu_n f(n)$, where h is
Planck's constant, $\nu_n$ is the frequency of the
electron on that orbit, and f(n) is for each atom
a function approximately linear in n at least for
small values of n.

(ii) When radiation is emitted or absorbed, the
atom makes a transition to a different state.
The frequency of the radiation emitted or
absorbed when making a transition is
$\nu_{n,m} = h^{-1}|E_n - E_m|$.

(iii) For large values of n and m and small values of
(n − m)/(n + m) the predictions of the theory
should agree with those of the classical theory
of the interaction of matter with radiation.


Later, A Sommerfeld gave a different version of the 
first postulate, by requiring that the allowed orbits 
be those for which the classical action is an integer 
multiple of Planck's constant. 

The old quantum theory met success when
applied to simple systems (atoms with Z < 5) but
it soon became evident that a radically
different point of view and a fresh
start were needed; the new theory was to contain few free
parameters, and the role of postulate (iii) was now
to fix the value of these parameters.

There were two (successful) attempts to construct 
a consistent theory; both required a more sharply 
defined mathematical formalism. The first one was 
sparked by W Heisenberg, and further important 
ideas and mathematical support came from M Born, 


P Jordan, W Pauli, P Dirac and, on the mathematical
side, also from J von Neumann and H Weyl. This
formulation maintains that one should only consider 
relations between observable quantities, described 
by elements that depend only on the initial and final 
states of the system; each state has an internal 
energy. By energy conservation, the difference 
between the energies must be proportional (with a 
universal constant) to the frequency of the radiation 
absorbed or emitted. This is enough to define the 
energy of the state of a single atom modulo an 
additive constant. The theory must also take into 
account the probability of transitions under the
influence of an external electromagnetic field. 

We shall give some details later on, which will 
help to follow the basis of this approach. 

The other attempt was originated by L de Broglie 
following early remarks by HW Bragg and 
M Brillouin. Instead of emphasizing the discrete 
nature of light, he stressed the possible wave nature 
of particles, using as a guide the Hamilton—Jacobi 
formulation of classical mechanics. This attempt 
was soon supported by the experiments of Davisson
and Germer (1927) on the scattering of a beam of electrons
from a crystal. These experiments showed that,
while electrons are recorded as “point particles,”
their distribution follows the law of the intensity for 
the diffraction of a (dispersive) wave. Moreover, the 
relation between momentum and frequency was, 
within experimental errors, the same as that 
obtained by Einstein for photons. 

The theory started by de Broglie was soon placed 
in almost definitive form by E Schrödinger. In this
approach one is naturally led to formulate and solve 
partial differential equations and the full develop- 
ment of the theory requires regularity results from 
the theory of functions. 

Schrödinger soon realized that the relations which
were found in the approach of Heisenberg could be 
easily (modulo technical details which we shall 
discuss later) obtained within the formalism he was 
advocating and indeed he gave a proof that the two 
formalisms were equivalent. This proof was later 
refined, from the mathematical point of view, by 
J von Neumann and G Mackey. 

In fact, Schrödinger’s approach has proved much
more useful in the solution of most physical 
problems in the nonrelativistic domain, because it 
can rely on the developments and practical use of 
the theory of functions and of partial differential 
equations. Heisenberg’s “algebraic” approach has 
therefore a lesser role in solving concrete problems 
in (nonrelativistic) QM. 

If one considers processes in which the number of 
particles may change in time, one is forced to 
introduce a Hilbert space that accommodates states
with an arbitrarily large number of particles, as is
the case in the theory of relativistic quantized fields
or in quantum statistical mechanics; it is then more
difficult to follow the line of Schrödinger, due to
difficulties in handling spaces of functions of
infinitely many variables. The approach of Heisenberg,
based on the algebra of matrices, has a rather
natural extension to suitable algebras of operators;
the approach of Schrödinger, based on the description
of a state as a (wave) function, encounters more
difficulties since one must introduce functionals over
spaces of functions and the description of dynamics
does not have a simple form.

From this point of view, the generalization of 
Heisenberg's approach has led to much progress in 
the understanding of the structure of the resulting 
theory. Still some relevant results have been
obtained in a Schrödinger representation. We shall
not elaborate further on this point.

We shall end this introductory section with a
short description of the emergence of the structure
of QM in Heisenberg’s and Schrödinger's
approaches; this will provide a motivation for the
axioms of QM which we shall introduce in the
following section. For an extended analysis, see, for
example, Jammer (1979). 

The specific form that was postulated by 
de Broglie (1923) for the wave nature of a particle 
relies on the relation of geometrical optics with 
wave propagation and on the formulation of 
Hamiltonian mechanics as a sort of “wave front 
propagation” through the solution of the Hamilton-Jacobi
equation and the introduction of group
velocity.

By analogy with electromagnetic waves, it is
natural to associate with a free nonrelativistic
particle of momentum p and mass m the plane wave


$\phi_p(x, t) = e^{i(p \cdot x - Et)/\hbar}$, $E = p^2/2m$


Schrödinger obtained the equation for a quantum
particle in a field of conservative forces with
potential V(x) by considering an analogy with the
propagation of an electromagnetic wave in a
medium with refraction index $n(x, \omega)$ that varies
slowly on the scale of the wavelength. Indeed, in this
case the “wave” follows the laws of geometrical
optics, and has therefore a “particle-like” behavior.
If one denotes by $\hat u(x, \omega)$ the Fourier transform (with
respect to time) of a generic component of the
electric field and one assumes that the field is
essentially monochromatic (so that the support of
$\hat u(x, \omega)$ as a function of $\omega$ is in a very small
neighborhood of $\omega_0$), one finds that $\hat u(x, \omega)$ is an
approximate solution of the equation


$$-\Delta\hat u(x, \omega) = \frac{\omega^2}{c^2}\,n^2(x, \omega)\,\hat u(x, \omega)$$ [1]


Writing $\hat u(x, \omega) = A(x, \omega)e^{i(\omega/c)W(x, \omega)}$, the phase
$W(x, \omega)$ satisfies, in the high-frequency limit, the
eikonal equation $|\nabla W(x, \omega)|^2 = n^2(x, \omega)$. One can
define for the solution a phase velocity $v_f$ and it
turns out that $v_f = c/|\nabla W(x, \omega)|$.

On the other hand, classical mechanics can also be
described by propagation of surfaces of constant value
for the solution W(x, t) of the Hamilton-Jacobi
equation $H(x, \nabla W) = E$, with $H = p^2/2m + V(x)$.
Recall that high frequency (the realm of geometric
optics) corresponds to small distances. This analogy
led Schrödinger (1926) to postulate that the dynamics
satisfied by the waves associated with the particles was
given by the (Schrödinger) equation


$$i\hbar\frac{\partial\psi}{\partial t}(x, t) = -\frac{\hbar^2}{2m}\Delta\psi(x, t) + V(x)\psi(x, t)$$ [2]
This wave was to describe the particle and its motion,
but, being complex valued, it could not represent any
measurable property. It is a mathematical property of
the solutions of [2] that the quantity $\int |\psi(x, t)|^2\,d^3x$ is
preserved in time. Furthermore, if one sets


$$\rho(x, t) = |\psi(x, t)|^2, \qquad j(x, t) = -\frac{i\hbar}{2m}\left(\bar\psi(x, t)\nabla\psi(x, t) - \psi(x, t)\nabla\bar\psi(x, t)\right)$$ [3]

one easily verifies the local conservation law


$$\frac{\partial\rho}{\partial t} + \operatorname{div} j(x, t) = 0$$ [4]

These mathematical properties led to the statistical
interpretation given by Max Born: in those
experiments in which the position of the particles is
measured, the integral of $|\psi(x, t)|^2$ over a region $\Omega$ of
space gives the probability that at time t the particle
is localized in the region $\Omega$. Moreover, the current
associated with a charged particle is given locally by
j(x, t) defined above.

Let us now briefly review Heisenberg's approach.
At the heart of this approach are: empirical formulas
for the intensities of emission and absorption of
radiation (dispersion relations), Sommerfeld's quantum
condition for the action and the vague
statement “the analogue of the derivative for the
discrete action variable is the corresponding finite
difference quotient.” And, most important, the
remark that the correct description of atomic
physics was through quantities associated with
pairs of states, that is, (infinite) matrices and the
empirical fact that the frequency (or rather the wave
number) $\omega_{k,j}$ of the radiation (emitted or absorbed)
in the transition between the atomic levels k and
j ($k \neq j$) satisfies the Ritz combination principle
$\omega_{m,j} + \omega_{j,k} = \omega_{m,k}$. It is easy to see that any doubly
indexed family satisfying this relation must have the
form $\omega_{m,k} = E_m - E_k$ for suitable constants $E_j$.
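
The last remark can be made concrete. In the sketch below (hypothetical function names, arbitrary illustrative numbers in any use of it) a doubly indexed family is stored as a square table `omega[m][k]`, defined for all pairs of indices; if it satisfies the combination principle, fixing a reference level and setting $E_m = \omega_{m,\mathrm{ref}}$ recovers the terms up to an additive constant.

```python
def satisfies_ritz(omega, tol=1e-12):
    """Check omega[m][j] + omega[j][k] == omega[m][k] for all index triples."""
    idx = range(len(omega))
    return all(abs(omega[m][j] + omega[j][k] - omega[m][k]) <= tol
               for m in idx for j in idx for k in idx)

def levels(omega, ref=0):
    """Terms E_m = omega[m][ref]; they are fixed only up to an additive constant."""
    return [row[ref] for row in omega]
```

Choosing a different reference level `ref` shifts every $E_m$ by the same constant and leaves all differences $E_m - E_k$ unchanged.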

It was empirically verified by Kramers that the 
dipole moment of an atom in an external monochromatic
field with frequency $\nu$ was proportional
to the field with a coefficient (of polarization) 


e fi F; 
PNE PC C a a 
mei È -v v-r 5 


1 I 


where e, m are the charge and the mass of the
electron and $f_i$, $F_i$ are the probabilities that the
frequency $\nu$ is emitted or absorbed.

A detailed analysis of the phenomenon of polarization
in classical mechanics, with the clearly stated aim “of
presenting the results in a way that may give hints for the
construction of a New Mechanics” was made by Max
Born (1924). He makes use of action-angle variables
$\{J_i, \theta_i\}$ assuming that the atom can be considered as a
collection of harmonic oscillators with frequency $\nu_i$
coupled linearly to the electric field of frequency $\nu$.

In the dipole approximation one obtains the 
following result for the polarization P (linear 
response in energy to the electric field): 


AMT Qv - m) 
where $\nu_k = \partial H/\partial J_k$ (H is the interaction Hamiltonian),
and A(J) is a suitable matrix. In order to derive the
new dynamics, having as a guide the correspondence 
principle, one has to compare this result with the 
Kramers dispersion relation, which we write (to make 
the comparison easier) in the form 
Bohr's rule implies that $\nu(n + \tau, n) = (E(n + \tau) - E(n))/h$.
Born and Heisenberg noticed that, for n sufficiently
large and k small, one can approximate the
differential operator in [6] with the corresponding 


difference operator, with an error of the order of k/n. 
Therefore, [6] could be substituted by 
The conclusion Born and Heisenberg drew is that
the matrix A that takes the place of the momentum 
in the classical theory must be such that 
Atm “一 ehm” f(n +m,n). In the same vein, 
considering the polarization in a static electric 
field, it is possible to find an expression for the 
matrix that takes the place of the coordinate x in 
classical Hamiltonian theory. 

In general, the new approach (matrix mechanics)
associates matrices with some relevant classical
observables (such as functions of position or
momentum) with a time dependence that is derived
from the empirical dispersion relations of Kramers,
the correspondence principle, Bohr’s rule, Sommerfeld's
action principle and first- (and second-) order
perturbation theory for the interaction of an atom
with an external electromagnetic field. It was soon
clear to Born and Jordan (1925) that this dynamics
took the form $i\hbar\dot A = AH - HA$ for a matrix H that
for the case of the hydrogen atom is obtained from the
classical Hamiltonian with the prescription given for
the coordinates x and p. The relation
$[x_k, p_k] = i\hbar I$ among the matrices $x_k$ and
$p_k$ corresponding to position and momentum was also
seen as plausible. One
year later P Dirac (1926) pointed out the structural
identity of this relation with the Poisson bracket of
Hamiltonian dynamics, developed a “quantum algebra”
and a “quantum differentiation” and proved
that any *-derivation $\delta$ (derivation which preserves
the adjoint) of the algebra $B_N$ of N × N matrices is
inner, that is, is given by $\delta(a) = i[a, h]$ for a
Hermitian matrix h. Much later this theorem was
extended (with some assumptions) to the algebra of
all bounded operators on a separable Hilbert space.
Since the derivations are generators of a one-parameter
continuous group of automorphisms,
that is, of a dynamics, this result lent further strength
to the ideas of Born and Heisenberg.
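
The easy direction of this statement, namely that every map of the form a ↦ i[a, h] with h Hermitian is a derivation that preserves the adjoint, can be checked directly on small matrices. The following sketch uses plain nested lists of complex numbers; all function names are hypothetical and the matrices in any use of it are arbitrary test data.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def add(a, b):
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

def derivation(a, h):
    """The inner derivation a -> i [a, h] = i (a h - h a)."""
    ah, ha = matmul(a, h), matmul(h, a)
    n = len(a)
    return [[1j * (ah[i][j] - ha[i][j]) for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(n) for j in range(n))
```

The Leibniz rule reads `derivation(matmul(a, b), h) == add(matmul(derivation(a, h), b), matmul(a, derivation(b, h)))`, and preservation of the adjoint reads `derivation(adjoint(a), h) == adjoint(derivation(a, h))`, both up to rounding. The converse (every *-derivation is of this form) is the content of the theorem and is not demonstrated by such a check.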

The algebraic structure introduced by Born, 
Jordan, and Heisenberg (1926) was used by Pauli 
(1927) to give a purely group-theoretical derivation 
of the spectrum of the hydrogen atom, following the 
lines of the derivation in symplectic mechanics of the 
SO(4) symmetry of the Coulomb system. This 
remarkable success gave much strength to the 
Heisenberg formulation of QM, which was soon 
recognized as an efficient instrument in the study of 
the atomic world. 

The algebraic formulation was also instrumental
in the description given by Pauli (1928) of the
“spin” (a property of electrons empirically postulated
by Goudsmit and Uhlenbeck to account for a
hyperfine splitting of some emission lines) as an
“internal” degree of freedom without reference to
spatial coordinates and still connected with the
properties of the system under the group of
spatial rotations. This description through matrices
has a major role also in the formulation by Pauli of
the exclusion principle (and its relation with Fermi-Dirac
statistics), which gave further credit to
Heisenberg's theory by helping in reproducing
correctly the classification of the atoms.

These features may explain why the “standard”
formulation of the axioms of QM given in the next
section shows the influence of Heisenberg's
approach. On the other hand, comparison with
experiments is usually set in the framework of
Schrödinger's approach. Posing the problems in
terms of properties of the solution of the Schrödinger
equation, one is led to a pragmatic use of the
formalism, leaving aside difficulties of interpretation.
This separation of “the axioms” from the
“practical use” may be one of the reasons why a
serious analysis of the axioms and of the problems
that arise from them is apparently not a concern for
most of the research in QM, even from the point of
view of mathematical physics.

One should stress that both the approach of Born 
and Heisenberg and that of de Broglie and Schrödinger
are rooted in a mixture of attention to the
experimental data, deep understanding of the pre- 
vious theory, bold analogies and approximations, 
and deep concern for the consistency of the “new 
mechanics." 

There is an essential difference between the 
starting points of the two approaches. In Heisen- 
berg's approach, the atom has a priori no spatial 
structure; the description is entirely in terms of its 
properties under emission and absorption of light, 
and therefore its observable quantities are repre- 
sented by matrices. Dynamics enters through the 
study of the interaction with the electromagnetic 
field, and some analogies with the classical theory of 
electrodynamics in an asymptotic regime (correspondence
principle). In this way, as we have briefly
indicated, one arrives at the special role of some matrices, which
have a mutual relation similar to the relation of
position and momentum in Hamiltonian theory.
Following this analogy, it is possible to extend the 
theory beyond its original scope and consider 
phenomena in which the electrons are not bound 
to an atom. 

In the approach of Schrödinger, on the other 
hand, particles and collections of particles are 
represented by spatial structures (waves). Spatial 
coordinates are therefore introduced a priori, and 
the position of a particle is related to the intensity of 
the corresponding wave (this was stressed by Born). 
Position and momentum are both basic measurable 
quantities as in classical mechanics. Physical 
interpretation forces the particle wave to be square
integrable, and mathematics provides a limitation on 
the simultaneous localization in momentum and 
position leading to Heisenberg’s uncertainty princi- 
ple. Dynamics is obtained from a particle-wave 
duality and an analogy with the relativistic wave 
equation in the low-energy regime. The presence of 
bound states with quantized energies is seen as a 
consequence of the well-known fact that waves 
confined to a bounded spatial region have their 
wave number (and therefore energy) quantized. 


Formal Structure 


In this section we describe the formal mathematical 
structure that is commonly associated with QM. It 
constitutes a coherent mathematical theory, but the 
interpretation axiom it contains leads to conceptual 
difficulties. 

We state the axioms in the form in which they 
were codified by J von Neumann (1966); they 
constitute a mathematically precise rendering of the 
formalism of Born, Heisenberg, and Jordan. The 
formalism of Schrödinger per se does not require
general statements about the category of 
observables. 


Axiom I 


(i) Observables are represented by self-adjoint operators
in a complex separable Hilbert space H.
(ii) Every such operator represents an observable. 


Remark Axiom I (ii) is introduced only for mathematical
simplicity. There is no physical justification
for part (ii). In principle, an observable must be
connected to a procedure of measurement (observation)
and for most of the self-adjoint operators on H
(e.g., in the Schrödinger representation, for
$i x_k (\partial/\partial x_j) x_k$) such a procedure has not yet been given.


Axiom II


(i) Pure states of the systems are represented by 
normalized vectors in H. 

(ii) If a measurement of the observable A is made on
a system in the state represented by the element
$\phi \in H$, the average of the numerical values one
obtains is $\langle\phi, A\phi\rangle$, a real number because A is
self-adjoint (we have denoted by $\langle\phi, \psi\rangle$ the
scalar product in H).


Remark Notice that Axiom II makes no statement 
about the outcome of a single measurement. 


Using the natural complex structure of B(H), pure 
states can be extended as linear real functionals on 


B(H). 
One defines a state as any linear real positive
functional on B(H) (all bounded operators on the 
separable Hilbert space H) and says that a state is 
normal if it is continuous in the strong topology. 
It can be proved that a normal state can be 
decomposed into a convex combination of at most 
a denumerable set of pure states. With these 
definitions a state is pure iff it has no nontrivial 
decomposition. It is worth stressing that this state- 
ment is true only if the operators that correspond to 
observable quantities generate all of B(H); one refers 
to this condition by stating that there are no 
superselection rules. 

By general results in the theory of the algebra
B(H), a normal state $\rho$ is represented by a positive
operator of trace class $\sigma$ through the formula
$\rho(A) = \mathrm{Tr}(\sigma A)$. Since a positive trace-class operator
(usually referred to as density matrix in analogy
with its classical counterpart) has eigenvalues $\lambda_k$
that are positive and sum up to 1, the decomposition
of the normal state $\rho$ takes the form $\sigma = \sum_k \lambda_k \Pi_k$,
where $\Pi_k$ is the projection operator onto the kth
eigenstate (counting multiplicity).
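
The decomposition of a density matrix into eigenprojections can be checked numerically in finite dimension. The sketch below is only an illustration: the 2x2 matrix sigma and the observable A are hypothetical values chosen for the example, not taken from the text.

```python
import numpy as np

# Hypothetical density matrix: self-adjoint, positive, trace one.
sigma = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
# Hypothetical bounded observable.
A = np.array([[1.0, 0.0], [0.0, -1.0]], dtype=complex)

# Spectral decomposition sigma = sum_k lambda_k Pi_k.
lam, vecs = np.linalg.eigh(sigma)
projectors = [np.outer(vecs[:, k], vecs[:, k].conj()) for k in range(len(lam))]
rebuilt = sum(l * P for l, P in zip(lam, projectors))

# rho(A) = Tr(sigma A) coincides with the convex combination of pure-state averages.
rho_A = np.trace(sigma @ A).real
mixture_avg = sum(
    l * (vecs[:, k].conj() @ A @ vecs[:, k]).real for k, l in enumerate(lam)
)
```

The eigenvalues play the role of the weights of the convex combination of pure states.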

It is also convenient to know that if a sequence of 
normal states $\sigma_k$ on B(H) converges weakly (i.e., for
each $A \in B(H)$ the sequence $\sigma_k(A)$ converges) then
the limit state is normal. This useful result is false in 
general for closed subalgebras of B(H), for example, 
for algebras that contain no minimal projections. 

Note that no pure state is dispersion free with 
respect to all the observables (contrary to what 
happens in classical mechanics). Recall that the 
dispersion of the state $\rho$ with respect to the
observable A is defined as $\Delta_\rho(A) \equiv \rho(A^2) - (\rho(A))^2$.
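
As a small illustration of the dispersion just defined, the following sketch evaluates it for a pure state of a two-level system. The choice of the Pauli matrices as observables and of the state is an assumption made for the example only.

```python
import numpy as np

def dispersion(sigma, A):
    """Dispersion rho(A^2) - rho(A)^2, with rho(A) = Tr(sigma A)."""
    mean = np.trace(sigma @ A).real
    return np.trace(sigma @ A @ A).real - mean ** 2

sz = np.array([[1, 0], [0, -1]], dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)

up = np.array([1, 0], dtype=complex)        # eigenvector of sz
sigma_up = np.outer(up, up.conj())          # pure state as a rank-one projection

disp_z = dispersion(sigma_up, sz)           # dispersion free for sz
disp_x = dispersion(sigma_up, sx)           # but not for sx
```

The same pure state is dispersion free for one observable and not for another that does not commute with it.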

The connection of the state with the outcome of a 
single measurement of an observable associated with 
an operator A is given by the following axiom, which 
we shall formulate only for the case when the self- 
adjoint operator A has only discrete spectrum. The 
generalization to the other case is straightforward but 
requires the use of the spectral projections of A. 


Axiom III 


(i) If A has only discrete spectrum, the possible
outcomes of a measurement of A are its
eigenvalues $\{a_k\}$.

(ii) If the state of the system immediately before the
measurement is represented by the vector $\phi \in H$,
the probability that the outcome be $a_k$ is
$\sum_h |\langle \phi, \psi_{k;h} \rangle|^2$, where $\psi_{k;h}$ are a complete orthonormal
set in the Hilbert space spanned by the eigenvectors
of A to the eigenvalue $a_k$.

(iii) If a system is in the pure state $\phi$ and one
performs a measurement of the observable
A with outcome $a_k \in (b - \delta, b + \delta)$ for some
$b, \delta \in R$ then immediately after the measurement
the system can be in any (not necessarily
pure) state which lies in the convex hull of the
pure states which are in the spectral subspace of
the operator A in the interval $\Delta_{b;\delta} \equiv
(b - \delta, b + \delta)$.


Note Statements (ii) and (iii) can be extended 
without modification to the case in which the initial 
state is not a pure state, and is represented by a 
density matrix $\sigma$.
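
The probabilities of Axiom III (ii) can be sketched for a finite-dimensional observable with a degenerate eigenvalue. The matrix A and the vector phi below are hypothetical. The function state_after picks one admissible post-measurement state (the normalized projection onto the eigenspace); this is an assumption of the sketch, since the axiom allows any state in the convex hull.

```python
import numpy as np

# Hypothetical observable on C^3: eigenvalue 1 is doubly degenerate, eigenvalue 2 is simple.
A = np.diag([1.0, 1.0, 2.0]).astype(complex)
phi = np.array([1, 1j, 1], dtype=complex) / np.sqrt(3)

eigvals, eigvecs = np.linalg.eigh(A)

def outcome_probability(a):
    # Sum of |<phi, psi_{k;h}>|^2 over an orthonormal basis of the eigenspace of a.
    idx = np.isclose(eigvals, a)
    return float(np.sum(np.abs(eigvecs[:, idx].conj().T @ phi) ** 2))

def state_after(a):
    # One admissible choice (assumed here): normalized projection onto the eigenspace.
    idx = np.isclose(eigvals, a)
    P = eigvecs[:, idx] @ eigvecs[:, idx].conj().T
    out = P @ phi
    return out / np.linalg.norm(out)

p1 = outcome_probability(1.0)
p2 = outcome_probability(2.0)
post = state_after(1.0)
```

A second measurement of A on post returns the outcome 1 with certainty, in the spirit of Remark 1 below.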


Remark 1 Axiom III makes sure that if one
performs, immediately after the first, a further
measurement of the same observable A the outcome
will still lie in the interval $\Delta_{b;\delta}$. This is needed to
give some objectivity to the statement made about
the outcome; notice that one must place the
condition “immediately after” because the evolution
may not leave invariant the spectral subspaces of A.
If the operator A has, in the interval $\Delta_{b;\delta}$, only
discrete (pure point) spectrum, one can express
Axiom III in the following way: the outcome can
be any state that can be represented by a convex
affine superposition of the eigenstates of A with
eigenvalues contained in $\Delta_{b;\delta}$.


In the very special case when A has only one
eigenvalue in $\Delta_{b;\delta}$ and this eigenvalue is not
degenerate, one can state Axiom III in the following
form (commonly referred to as “reduction of the
wave packet”): the system after the measurement is
pure and is represented by an eigenstate of the
operator A.


Remark 2 Notice that the third axiom makes a 
statement about the state of the system after the 
measurement is completed. 


It follows from Axiom III that one can measure
“simultaneously” only observables which are represented
by self-adjoint operators that commute with
each other (i.e., their spectral projections mutually
commute). It follows from the spectral representation
of the self-adjoint operators that a family $\{A_k\}$
of commuting operators can be considered (i.e.,
there is a representation in which they are) functions
over a common measure space.

Axioms I-III give a mathematically consistent 
formulation of QM and allow a statistical descrip- 
tion (and statistical prediction) of the outcome of 
the measurement of any observable. It is worth 
remarking that while the predictions will have only 
a statistical nature, the dynamical evolution of the 
observables (and by duality of the states) will be 
described by deterministic laws. The intrinsically 
statistical aspect of the predictions comes only from 


the third postulate, which connects the mathemati- 
cal content of the theory with the measurement 
process. 

The third axiom, while crucial for the connection 
of the mathematical formalism with the experimen- 
tal data, contains the seed of the conceptual 
difficulties which plague QM and have not been 
cured so far. 

Indeed, the third axiom indicates that the process
of measurement is described by laws that are
intrinsically different from the laws that rule the
evolution without measurement. This privileged role
of the change of state caused by a measurement leads to
serious conceptual difficulties, since the change is
independent of whether or not the result is recorded
by some observer; one should therefore have a way
to distinguish between measurements and generic
interactions with the environment.

A related problem originating from Axiom III
is that the formulation of this axiom refers implicitly
to the presence of a classical observer that certifies
the outcomes of measurements and is allowed to
make use of classical probability theory. This
observer is therefore not subject to the laws
of QM.

These two aspects of the conceptual difficulties 
have their common origin in the separation of the 
measuring device and of the measured systems into 
disjoint entities satisfying different laws. The difficulties
in the theory of measurement have not yet
received a satisfactory answer, but various attempts
have been made, with varying degrees of success, and
some of them are described briefly in the section
“Interpretation problems.” It appears therefore that
QM in its present formulation is a refined and 
successful instrument for the description of the 
nonrelativistic phenomena at the Planck scale, but 
its internal consistency is still standing on shaky 
ground. 

Returning to the axioms, it is worth remarking
explicitly that according to Axiom II a state is a
linear functional over the observables, but it is
represented by a sesquilinear function on the
complex Hilbert space H. Since Axiom II states
that any normalized element of H represents a state
(and elements that differ only by a phase represent
the same state), together with $\phi, \psi$ also $\xi \equiv a\phi +
b\psi$, $|a|^2 + |b|^2 = 1$, represents a state, the superposition of
$\phi$ and $\psi$ (superposition principle).

But for an observable A, one has in general
$\rho_\xi(A) \neq |a|^2 \rho_\phi(A) + |b|^2 \rho_\psi(A)$, due to the cross-terms
in the scalar product. The superposition principle is
one of the characteristic features of QM. The
superposition of the two pure states $\phi$ and $\psi$ has
properties completely different from those of a
statistical mixture of the same two states, defined
by the density matrix $\sigma = |a|^2 \Pi_\phi + |b|^2 \Pi_\psi$, where we
have denoted by $\Pi_\phi$ the orthogonal projection onto
the normalized vector $\phi$. Therefore, the search for
these interference terms is one of the means to verify
the predictions of QM, and their smallness under
given conditions is a sign of quasiclassical behavior
of the system under study.
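
The difference between a superposition and a statistical mixture can be made concrete in a two-dimensional example. The vectors, coefficients and observable below are hypothetical choices for illustration; the cross term is the interference term mentioned above.

```python
import numpy as np

phi = np.array([1, 0], dtype=complex)
psi = np.array([0, 1], dtype=complex)
a, b = np.sqrt(0.5), np.sqrt(0.5)                 # |a|^2 + |b|^2 = 1
A = np.array([[0, 1], [1, 0]], dtype=complex)     # hypothetical observable

def average(v, op):
    # Average <v, op v> in the pure state represented by v.
    return (v.conj() @ op @ v).real

# Superposition xi = a phi + b psi.
xi = a * phi + b * psi
superposition_avg = average(xi, A)

# Statistical mixture with the same weights.
mixture = abs(a) ** 2 * np.outer(phi, phi.conj()) + abs(b) ** 2 * np.outer(psi, psi.conj())
mixture_avg = np.trace(mixture @ A).real

# Interference (cross) term: 2 Re( conj(a) b <phi, A psi> ).
cross_term = 2 * (np.conj(a) * b * (phi.conj() @ A @ psi)).real
```

The two averages differ exactly by the cross term, which vanishes only for observables with no matrix element between the two states.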

Strictly connected to superposition are entanglement
and the partial trace operation. Suppose that
one has two systems which when considered
separately are described by vectors in two Hilbert
spaces $H_i$, $i = 1, 2$, and which have observables $A_i \in
B(H_i)$. When we want to study their mutual
interaction, it is natural to describe both of them in
the Hilbert space $H_1 \otimes H_2$ and to consider the
observables $A_1 \otimes I$ and $I \otimes A_2$.

When the systems interact, the interaction will not
in general commute with the projection operator $\Pi_1$
onto $H_1$. Therefore, even if the initial state is of the
form $\phi_1 \otimes \phi_2$, $\phi_i \in H_i$, the final state (after the
interaction) is a vector $\xi \in H_1 \otimes H_2$ which cannot
be written as $\xi = \zeta_1 \otimes \zeta_2$ with $\zeta_i \in H_i$. It can be
shown, however, that there always exist two
orthonormal families of vectors $\phi_n \in H_1$ and $\psi_n \in H_2$
such that $\xi = \sum_n c_n \phi_n \otimes \psi_n$ for suitable $c_n \in C$,
$\sum_n |c_n|^2 = 1$ (this decomposition is not unique in
general).

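Recalling that $\rho_{\phi \otimes \psi}(A_1 \otimes I) = \rho_\phi(A_1)$, one can write

$\rho_\xi(A_1 \otimes I) = \sum_n |c_n|^2 \rho_{\phi_n}(A_1) = \rho_\sigma(A_1)$,

$\sigma \equiv \sum_n |c_n|^2 \Pi_{\phi_n}$.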


The map $\Gamma_2 : \rho_\xi \mapsto \rho_\sigma$ is called reduction (or also
conditioning) with respect to $H_2$; it is also called
“partial trace” with respect to $H_2$. The first notation
reflects the analogy with conditioning in classical
probability theory.

The map $\Gamma_2$ can be extended by linearity to a map
from normal states (density matrices) on $B(H_1 \otimes H_2)$
to normal states on $B(H_1)$ and gives rise to a
positivity-preserving and trace-preserving map.
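
In finite dimension the decomposition into orthonormal families and the reduction map can be sketched with the singular value decomposition. The dimensions, the random vector and the observable A1 below are assumptions of the example; the vector in the tensor product is stored as a d1 x d2 array of coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 2, 3

# Normalized vector xi in H1 (x) H2, stored as a d1 x d2 coefficient array.
xi = rng.normal(size=(d1, d2)) + 1j * rng.normal(size=(d1, d2))
xi /= np.linalg.norm(xi)

# SVD gives xi = sum_n c_n phi_n (x) psi_n with orthonormal families:
# phi_n = U[:, n] in H1 and psi_n = Vh[n, :] in H2.
U, c, Vh = np.linalg.svd(xi, full_matrices=False)

# Reduced density matrix sigma = sum_n |c_n|^2 Pi_{phi_n}.
sigma = sum(c[n] ** 2 * np.outer(U[:, n], U[:, n].conj()) for n in range(len(c)))

# The same operator obtained as a partial trace over H2.
partial_trace = xi @ xi.conj().T

# Averages of A1 (x) I in the full state and of A1 in the reduced state.
A1 = np.array([[1, 2], [2, -1]], dtype=complex)
flat = xi.reshape(-1)
full_avg = np.vdot(flat, np.kron(A1, np.eye(d2)) @ flat).real
reduced_avg = np.trace(sigma @ A1).real
```

The reduced state reproduces all averages of observables of the first system alone.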

One can in fact prove (Takesaki 1971) that any 
conditioning for normal states of a von Neumann 
algebra M is completely positive in the sense that it 
remains positive after tensorization of M with B(K), 
where K is an arbitrary Hilbert space.

It can also be proved that a partial converse is
true, that is, that every completely positive trace-preserving
map $\Phi$ on normal states of a von
Neumann algebra $A \subset B(H)$ can be written, for a
suitable choice of a larger Hilbert space K and
partial isometries $V_k$, in the form (Kraus form)
$\Phi(a) = \sum_k V_k^* a V_k$.

But it must be remarked that, if U(t) is a one-parameter
group of unitary operators on $H_1 \otimes H_2$
and $\sigma$ is a density matrix, the one-parameter family
of maps $\Gamma(t) : \sigma \mapsto \Gamma_2(U(t) \sigma U^*(t))$ does not, in
general, have the semigroup property $\Gamma(t+s) =
\Gamma(t) \cdot \Gamma(s)$, $s, t > 0$, and therefore there is in general
no generator (of a reduced dynamics) associated
with it. Only in special cases and under very strong
hypotheses and approximations is there a reduced
dynamics given by a semigroup (Markov property).

Since entanglement and (nontrivial) conditioning are
marks of QM, and on the other side the Markov
property described above is typical of conditioning in
classical mechanics, it is natural to search for conditions
and approximations under which the Markov
property is recovered, and more generally under which
the coherence properties characteristic of QM are
suppressed (decoherence). We shall discuss briefly this
problem in the section “Interpretation problems,”
devoted to the attempts to overcome the serious
conceptual difficulties that descend from Axiom III.

It is seen from the remarks and definitions above 
that normal states (density matrices) play the role 
that in classical mechanics is attributed to measures 
over phase space, with the exception that pure states 
in QM do not correspond to Dirac measures (later 
on we shall discuss the possibility of describing a
quantum-mechanical state with a function (Wigner
function) on phase space).

In this correspondence, evaluation of an observable
(a measurable function over phase space) over a
state (a normalized, positive measure) is related to
finding the (Hilbert space) trace of the product of an
operator in B(H) with a density matrix. Notice that
the trace operation shares some of the properties of
the integral, in particular $\mathrm{tr}\, AB = \mathrm{tr}\, BA$ if A is in
trace class and $B \in B(H)$ (cf. $g \in L^1$ and $f \in L^\infty$)
and $\mathrm{tr}\, AB \geq 0$ if A is a density matrix and B is a
positive operator. This suggests defining functions
over the density matrices that correspond to quantities
which are important in the theory of dynamical
systems, in particular the entropy.

This is readily done if the Hilbert space is finite
dimensional, and in the infinite-dimensional case if
one takes as observables all Hermitian bounded
operators. In quantum statistical mechanics one is
led to consider an infinite collection of subsystems,
each one described with a Hilbert space (finite or
infinite dimensional) $H_i$, $i = 1, 2, \ldots$; the space of
representation is a subspace K of $H_1 \otimes H_2 \otimes \cdots$,
and the observables are a (weakly closed) subalgebra
A of B(K) (typically constructed as an inductive
limit of elements of the form $I \otimes I \otimes \cdots \otimes A_k \otimes I \otimes \cdots$).
In this context one also considers normal states on A
and defines a trace operation, with the properties
described above for a trace. Most of the definitions
(e.g., of entropy) can be given in this enlarged 
context, but differences may occur, since in general 
A does not contain finite-dimensional projections, 
and therefore the trace function is not the trace 
commonly defined in a Hilbert space. We shall not 
describe further this very interesting and much 
developed theory, of major relevance in quantum 
statistical mechanics. For a thorough presentation 
see Ohya and Petz (1993). 

The simplest and most-studied example is the
case when each Hilbert space $H_i$ is a complex
two-dimensional space. The resulting system is
constructed in analogy with the Ising model of
classical statistical mechanics, but in contrast to that
system it possesses, for each value of the index i,
infinitely many pure states. The corresponding
algebra of observables is a closed subalgebra of
$(C^2 \times C^2)^{\otimes \infty}$ and generically does not contain any
finite-dimensional projection.

This model, restricted to the case $(C^2 \times C^2)^{\otimes K}$, K a
finite integer, has become popular in the study of
quantum information and quantum computation, in
which case a normalized element of $H_i$ is called a q-bit
(in analogy with the bits of information in classical
information theory). It is clear that the unit sphere in
$C^2 \times C^2$ contains many more than four points, and
this gives much more freedom for operations on the
system. This is the basis of quantum computation and
quantum information, a very interesting field which
has received much attention in recent years.


Quantization and Dynamics 


The evolution in nonrelativistic QM is described by
the Schrödinger equation in the representation in
which for an N-particle system the Hilbert space is
$L^2(R^{3N}) \otimes C^k$, where $C^k$ is a finite-dimensional space
which accounts for the fact that some of the
particles may have a spin content.

Apart from (often) inessential parameters, the
Schrödinger equation for spin-0 particles can be
written typically as

$i\hbar \frac{\partial \phi}{\partial t} = H\phi$,
$H \equiv \sum_k m_k (i\hbar \nabla_k + A_k)^2 + \sum_k V_k(x_k) + \sum_{i \neq k} V_{i,k}(x_i - x_k)$

where $\hbar$ is Planck's constant, $A_k$ are vector-valued
functions (vector potentials), and $V_k$ and $V_{i,k}$ are
scalar-valued functions (scalar potentials) on $R^3$.


If some particles have spin 1/2, the corresponding
kinetic energy term should read $(i\hbar\, \sigma \cdot \nabla)^2$,
where $\sigma_k$, $k = 1, 2, 3$, are the Pauli matrices, and one
must add a term W(x) which is a matrix field with
values in $C^k \otimes C^k$ and takes into account the
coupling between the spin degrees of freedom.
Notice that the local operator $i \sigma \cdot \nabla$ is a “square
root” of the Laplacian.
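
The square-root property of the Pauli operator can be checked on its Fourier symbol. In this sketch it is assumed that the gradient acts in Fourier space as multiplication by i k, so that $i \sigma \cdot \nabla$ becomes the matrix $-\sigma \cdot k$ and the operator $-\Delta$ becomes $|k|^2$; the vector k is an arbitrary illustrative value.

```python
import numpy as np

# Pauli matrices.
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

# Arbitrary momentum vector (illustrative value).
k = np.array([0.3, -1.2, 2.0])

# Fourier symbol of i sigma.grad, assuming grad -> i k.
symbol = -(k[0] * s1 + k[1] * s2 + k[2] * s3)

# Its square is |k|^2 times the identity, the symbol of minus the Laplacian.
square = symbol @ symbol
```

With this convention the square reproduces the Laplacian up to sign, which is the sense in which the word “square root” is used here.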

A relativistic extension of the Schrödinger equation
for a free particle of mass $m \geq 0$ in dimension
3 was obtained by Dirac in a space of spinor-valued
functions $\psi_k(x, t)$, $k = 0, 1, 2, 3$, which carries
an irreducible representation of the Lorentz group.
In analogy with the electromagnetic field, for which
a linear partial differential equation (PDE) can be
written using a four-dimensional representation of
the Lorentz group, the relativistic Dirac equation is
the linear PDE


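$i \sum_{k=0}^{3} \gamma_k \frac{\partial}{\partial x_k} \psi = m \psi$, $x_0 \equiv ct$

where the $\gamma_k$ generate the algebra of a representation
of the Lorentz group. The operator $\sum_k \gamma_k (\partial/\partial x_k)$ is a
local square root of the relativistically invariant
d'Alembert operator $-\partial^2/\partial x_0^2 + \Delta - m \cdot I$.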

When one tries to introduce (relativistically
invariant) local interactions, one faces the same
problem as in classical mechanics, namely one
must introduce relativistically covariant fields (e.g.,
the electromagnetic field), that is, systems with an
infinite number of degrees of freedom. If this field is
considered as external, one faces technical problems,
which can be overcome in favorable cases. But if one
tries to obtain a fully quantized theory (by also
quantizing the field) the obstacles become insurmountable,
due also to the nonuniqueness of the
representation of the canonical commutation relations
if these are taken as the basis of quantization,
as in the finite-dimensional case.

In a favorable case (e.g., the interaction of a
quantum particle with the quantized electromagnetic
field) one can set up a perturbation scheme in a
parameter $\alpha$ (the physical value of $\alpha$ in natural units
is roughly 1/137). We shall come back later to
perturbation schemes in the context of the Schrödinger
operator; in the present case one has been
able to find procedures (renormalization) by which
the series in $\alpha$ that describe relevant physical
quantities are well defined term by term. But even
in this favorable case, where the sum of the first few
terms of the series is in excellent agreement with the
experimental data, one has reasons to believe that
the series is not convergent, and one does not even
know whether the series is asymptotic.

One is led to wonder whether the structure of
fields (operator-valued elements in the dual of
compactly supported smooth functions on classical
spacetime), taken over in a simple way from the
field structure of classical electromagnetism, is a
valid instrument in the description of phenomena
that take place at a scale incomparably smaller than
the scale (atomic scale) at which we have reasons to
believe that the formalisms of Schrödinger and
Heisenberg provide a suitable model for the description
of natural phenomena.

The phenomena which are related to the interaction
of a nonrelativistic quantum particle
with the quantized electromagnetic field take place
at the atomic scale. These phenomena have been the
subject of very intense research in theoretical
physics, mostly within perturbation theory, and the
analysis to the first few orders has led to very
spectacular results (although there is at present no
proof that the perturbation series are at least
asymptotic).

In this field rigorous results are scarce, but 
recently some progress has been made, establishing, 
among other things, the existence of the ground 
state (a nontrivial result, because there is no gap 
separating the ground-state energy from the con- 
tinuous part of the spectrum) and paving the way 
for the description of scattering phenomena; the 
latter result is again nontrivial because the photon 
field may lead to an anomalous infrared (long- 
range) behavior, much in the same way that the 
long-range Coulomb interaction requires a special 
treatment in nonrelativistic scattering theory. 

This contribution to the Encyclopedia is meant to 
be an introduction to QM and therefore we shall 
limit ourselves to the basic structure of nonrelativis- 
tic theory, which deals with systems of a finite 
number of particles interacting among themselves 
and with external (classical) potential fields, leaving 
for more specialized contributions a discussion of 
more advanced items in QM and of the successes 
and failures of a relativistically invariant theory of 
interaction between quantum particles and quan- 
tized fields. 

We shall return therefore to basics. 

One may begin a section on dynamics in QM by
discussing some properties of the solutions of the
Schrödinger equation, in particular dispersive effects
and the related scattering theory, the problem of
bound states and resonances, the case of time-dependent
perturbation and the ionization effect,
the binding of atoms and molecules, the Rayleigh
scattering, the Hall effect and other effects in
nanophysics, the various multiscale and adiabatic
limits, and in general all the physical problems that
have been successfully solved by Schrödinger's QM
(as well as the very many interesting and unsolved
problems).

We will consider briefly these issues and the 
approximation schemes that have been developed in 
order to derive explicit estimates for quantities of 
physical interest. Since there are very many excellent 
reviews of present-day research in QM (e.g., Araki 
and Ezawa (2004), Blanchard and Dell'Antonio 
(2004), Cycon et al. (1986), Hislop and Sigal (1996),
Lieb (1990), Le Bris (2005), Simon (2002), and 
Schlag (2004)) we refer the reader to the more 
specialized contributions to this Encyclopedia for a 
detailed analysis and precise statements about the 
results. 

We prefer to come back first to the foundations of
the theory; we shall take the point of view of
Heisenberg and start discussing the mapping properties
of the algebra of observables and of the states.
Since transition probabilities play an important role,
we consider only transformations $\alpha$ which are such
that, for any pair of pure states $\phi_1$ and $\phi_2$, one has
$|\langle \alpha(\phi_1), \alpha(\phi_2) \rangle| = |\langle \phi_1, \phi_2 \rangle|$. We call these maps
Wigner automorphisms.

A result of Wigner (see Weyl (1931)) states that if
$\alpha$ is a Wigner automorphism then there exists a
unique operator $U_\alpha$, either unitary or antiunitary,
such that $\alpha(P) = U_\alpha^* P U_\alpha$ for all projection operators.
If there is a one-parameter group of such automorphisms,
the corresponding operators are all
unitary (but they need not form a group).

A generalization of this result is due to Kadison.
Denoting by $I_{1,+}$ the set of density matrices, a
Kadison automorphism $\beta$ is, by definition, such that
for all $\sigma_1, \sigma_2 \in I_{1,+}$ and all $0 < s < 1$ one has $\beta(s\sigma_1 +
(1 - s)\sigma_2) = s\beta(\sigma_1) + (1 - s)\beta(\sigma_2)$. For Kadison automorphisms
the same result holds as for Wigner's.

A similar result holds for automorphisms of the 
observables. Notice that the product of two Hermi- 
tian operators is not Hermitian in general, but 
Hermiticity is preserved under Jordan's product 
defined as A x B = (1/2)[AB + BA]. 

A Segal automorphism is, by definition, an
automorphism of the Hermitian operators that
preserves the Jordan product structure. A theorem
of Segal states that $\gamma$ is a Segal automorphism if and
only if there exist an orthogonal projector E, a
unitary operator U in EH, and an antiunitary
operator V in $(I - E)H$ such that $\gamma(A) = W A W^*$,
where $W \equiv U \oplus V$.

We can study now in more detail the description
of the dynamics in terms of automorphisms of
Wigner or Kadison type when it refers to states
and of Segal type when it refers to observables. We
require that the evolution be continuous in suitable
topologies. The strongest result refers to Wigner's
case. One can prove that if a one-parameter group
of Wigner automorphisms $\alpha_t$ is measurable in the
weak topology (i.e., $\alpha_t \sigma(A)$ is measurable in t for
every choice of A and $\sigma$) then it is possible to choose
the U(t) provided by Wigner's theorem in such a
way that they form a group which is continuous in
the strong topology. Similar results are obtained for
the cases of Kadison and Segal automorphisms, but
in both cases one has to assume continuity of $\alpha_t$ in a
stronger topology (the strong operator topology in
the Segal case, the norm topology in Kadison's).
Weak continuity is sufficient if the operator product
is preserved (in this case one speaks of automorphisms
of the algebra of bounded operators). The
existence of the continuous group U(t) defines a
Hamiltonian evolution. One has indeed:


Theorem 1 (Stone). The map $t \mapsto U(t)$, $t \in R$, is a
weakly continuous representation of R in the set of
unitary operators in a Hilbert space H if and only if
there exists a self-adjoint operator H on (a dense set
of) H such that $U(t) = e^{-itH}$ and therefore

$\phi \in D(H) \Rightarrow i \frac{dU(t)}{dt} \phi = H U(t) \phi$ [10]

The operator H is called generator of the dynamics
described by U(t).
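
In finite dimension the content of Stone's theorem reduces to elementary matrix facts and can be checked directly. The sketch below assumes the sign convention $U(t) = e^{-itH}$, under which $i\, dU/dt = H U(t)$; the 2x2 matrix H is a hypothetical self-adjoint generator chosen for the example.

```python
import numpy as np

# Hypothetical self-adjoint generator.
H = np.array([[1.0, 0.5 - 0.2j], [0.5 + 0.2j, -0.3]])
evals, evecs = np.linalg.eigh(H)

def U(t):
    # U(t) = exp(-i t H), built from the spectral decomposition of H.
    return evecs @ np.diag(np.exp(-1j * t * evals)) @ evecs.conj().T

t, s, h = 0.7, 1.9, 1e-6

# Group property and unitarity.
group_ok = np.allclose(U(t + s), U(t) @ U(s))
unitary_ok = np.allclose(U(t).conj().T @ U(t), np.eye(2))

# Central finite difference for dU/dt, compared with the generator equation.
dU = (U(t + h) - U(t - h)) / (2 * h)
generator_ok = np.allclose(1j * dU, H @ U(t), atol=1e-6)
```

In infinite dimension the nontrivial part of the theorem is the existence of the self-adjoint generator and the role of its domain, which this finite-dimensional check does not capture.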


Note In Schrödinger’s approach the operator
described in Stone’s theorem is called Hamiltonian,
in analogy with the classical case. In the case of one
particle of mass m in $R^3$ subject to a conservative
force with potential energy V(x) it has the following
form, in units in which $\hbar = 1$:

$H = -\frac{1}{2m} \Delta + V(x)$, $\Delta = \sum_k \frac{\partial^2}{\partial x_k^2}$ [11]

If the potential V depends on time, Stone's theorem
is not directly applicable but still the spectral
properties of the self-adjoint operators $H_t$ and of
the kernel of the group $\tau \mapsto e^{-i\tau H_t}$ are essential to
solve the (time-dependent) Schrödinger equation.

The semigroup $t \mapsto e^{t\Delta}$ is usually a positivity-preserving
semigroup of contractions and defines a
Markov process; in favorable cases, the same is true
of $t \mapsto e^{-tH}$ (Feynman-Kac formula).

There is an analogous situation in the general
theory of dynamical systems on a von Neumann
algebra; in analogy with the case of elliptic
operators, one defines as “dissipation” a map $\Delta$ on
a von Neumann algebra M which satisfies $\Delta(a^* a) \geq
a^* \Delta(a) + \Delta(a^*) a$ for all $a \in M$. The positive dissipation
$\Delta$ is called completely positive if it remains
positive after tensorization with B(K) for any
Hilbert space K. Notice that according to this
definition every *-derivation is a completely positive
dissipation. For dissipations there is an analog of the
theorem of Stinespring, and often a bounded dissipation
can be written as

$\Delta(a) = i[h, a] + \sum_k V_k^* a V_k - \frac{1}{2} \sum_k \{V_k^* V_k, a\}$
for $a \in M$

(the symbols {.,.} denote the anticommutator).
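
The dissipation inequality for a map of the form displayed above can be checked on matrices. In this sketch h and the operators V_k are random matrices (assumptions of the example), and the defect of the inequality is compared with the sum of the positive operators $[a, V_k]^* [a, V_k]$, an identity obtained by expanding the three terms.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

def rand():
    # Random complex n x n matrix (illustrative data only).
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def dag(x):
    return x.conj().T

h = rand()
h = (h + dag(h)) / 2                 # self-adjoint h
Vs = [rand(), rand()]                # arbitrary bounded V_k

def dissipation(a):
    # i[h, a] + sum_k V_k^* a V_k - (1/2) sum_k {V_k^* V_k, a}
    out = 1j * (h @ a - a @ h)
    for V in Vs:
        VV = dag(V) @ V
        out = out + dag(V) @ a @ V - 0.5 * (VV @ a + a @ VV)
    return out

a = rand()
defect = dissipation(dag(a) @ a) - dag(a) @ dissipation(a) - dissipation(dag(a)) @ a
expected = sum(dag(a @ V - V @ a) @ (a @ V - V @ a) for V in Vs)
min_eig = np.linalg.eigvalsh((defect + dag(defect)) / 2).min()
```

The commutator term drops out of the defect, in agreement with the remark that every *-derivation is a dissipation with equality.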

In general terms, by quantization is meant the
construction of a theory by deforming a commutative
algebra of functions on a classical phase space X in such a
way that the dynamics of the quantum system can be
derived from the prescription of deformation, usually
by deforming the Poisson brackets if X is a cotangent
bundle T*M (Halbut 2002, Landsman 2002). We
shall discuss only the Weyl quantization (Weyl 1931)
that has its roots in Heisenberg's formulation of QM
and refers to the case in which the configuration space
is $R^N$, or, with some variant (Floquet-Zak) the
N-dimensional torus. We shall add a few remarks
on the Wick (anti-Weyl) quantization. More general
formulations are needed when one tries to quantize a
classical system defined on the cotangent bundle of
a generic variety and even more so if it is defined on a
generic symplectic manifold.

The Weyl quantization is a mathematically accu- 
rate rendering of the essential content of the 
procedure adopted by Born and Heisenberg to 
construct dynamics by finding operators which 
play the role of symplectic coordinates. 

Consider a system with one degree of freedom.
The first naive attempt would be to find operators
q, p that satisfy the relation

$[q, p] \subseteq iI$ [12]

and to construct the Hamiltonian in analogy with
the classical case. To play a similar role, the
operators q and p must be self-adjoint and satisfy
[12] at least in a weak sense. If both are bounded,
[12] implies $e^{ibp} q e^{-ibp} = q + bI$ (the exponential is
defined through a convergent series) and therefore
the spectrum of q is the entire real line, a contradiction.
Therefore, the inclusion sign in [12] is strict
and we face domain problems, and as a consequence
[12] has many inequivalent solutions (“equivalence”
here means “unitary equivalence”).
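
For matrices the impossibility of [12] as an identity follows from a simpler argument than the one given above, namely that the trace of a commutator vanishes. The sketch below illustrates this with truncated harmonic-oscillator matrices; the truncation size n and this choice of q and p are assumptions of the example, not a construction from the text.

```python
import numpy as np

n = 6
# Truncated annihilation matrix: a|j> = sqrt(j)|j-1> on the first n levels.
a = np.diag(np.sqrt(np.arange(1, n)), k=1).astype(complex)

q = (a + a.conj().T) / np.sqrt(2)
p = (a - a.conj().T) / (1j * np.sqrt(2))

comm = q @ p - p @ q
trace_comm = np.trace(comm)          # vanishes for any pair of matrices

# The commutator equals i*I except for the last diagonal entry,
# which compensates so that the trace is zero.
expected = 1j * np.diag([1.0] * (n - 1) + [-(n - 1.0)])
```

The defect sits entirely in the highest retained level, which is one way to see why the relation can hold only on a domain and not for bounded operators.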

Apart from “pathological” ones, defined on 
L?-spaces over multiple coverings of R, there are 
inequivalent solutions of [12] which are effectively 
used in QM. 

The most common solution is on the Hilbert space
$L^2(R)$ (with Lebesgue measure), with $\hat q$ defined as
the essentially self-adjoint operator that acts on the
smooth functions with compact support as multiplication
by the coordinate x and $\hat p$ is defined
similarly in Fourier space. This representation can
be trivially generalized to construct operators $\hat q_k$ and
$\hat p_k$ in $L^2(R^N)$.

Another frequently used representation of [12] is
on $L^2(S^1)$ (and when generalized to N degrees of
freedom, on $T^N$). In this representation, the operator
p is defined by $c_k \to k c_k$ on functions $f(\theta) =
\sum_{k=-M}^{N} c_k e^{ik\theta}$, $0 \leq M, N < \infty$. In this case the
operator q is defined as multiplication by the angle
coordinate $\theta$. It is easy to check that this representation
is inequivalent to the previous one and that [12]
is satisfied (as an identity) on the (dense) set of
vectors which are in the domain both of pq and
of qp. But notice that the domain of essential self-adjointness
of p is not left invariant by the action of
q ($\theta f(\theta)$ is a function on $S^1$ only if $f(2\pi) = 0$).

We shall denote p in this representation by the
symbol $\partial/\partial\theta$, and refer to it as the Bloch
representation. It can be modified by setting the
action of p as $c_k \to k c_k + \alpha$, $0 < \alpha < 2\pi$, and this
gives rise to the various Bloch-Zak and magnetic
representations.

The Bloch representation can be extended to
periodic functions on $R^1$ noticing that $L^2(R) =
L^2(S^1) \otimes l^2(N)$; similarly, the Bloch-Zak and the
magnetic representation can be extended to $L^2(R^N)$.

The difference between the representations can be
seen more clearly if one considers the one-parameter
groups of unitary operators generated by the
“canonical operators” $\hat q$ and $\hat p$. In the Schrödinger
representation on $L^2(R)$, these groups satisfy

$U(a) V(b) = e^{-iab} V(b) U(a)$,
$U(a) = e^{ia\hat q}$, $V(b) = e^{ib\hat p}$

and therefore, setting $z = a + ib$ and $W(z) \equiv
e^{-iab/2} V(b) U(a)$ one has

$W(z) W(z') = e^{-i\omega(z,z')/2} W(z + z')$ [13]

$z, z' \in C$, $\omega(z, z') = \mathrm{Im}(\bar z z')$

The unitary operators W(z) are therefore projective
representations of the additive group C. This
generalizes immediately to the case of N degrees
of freedom; the representation is now of the
additive group $C^N$ and $\omega$ is the standard symplectic
form on $C^N$.
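
The Weyl relations can be verified by letting the operators act on an explicit function. The sketch assumes the conventions $(U(a)f)(x) = e^{iax} f(x)$ and $(V(b)f)(x) = f(x + b)$, that is, q acting as multiplication by x and $p = -i\, d/dx$, together with $W(z) = e^{-iab/2} V(b) U(a)$ and $\omega(z, z') = \mathrm{Im}(\bar z z')$; the test function and the numerical parameters are arbitrary.

```python
import numpy as np

def U(a):
    # (U(a) f)(x) = exp(i a x) f(x): the group generated by q.
    return lambda f: (lambda x: np.exp(1j * a * x) * f(x))

def V(b):
    # (V(b) f)(x) = f(x + b): the group generated by p = -i d/dx.
    return lambda f: (lambda x: f(x + b))

def W(a, b):
    # W(z) = exp(-i a b / 2) V(b) U(a), with z = a + i b.
    return lambda f: (lambda x: np.exp(-0.5j * a * b) * V(b)(U(a)(f))(x))

def omega(z, w):
    # Symplectic form omega(z, w) = Im(conj(z) w).
    return (np.conj(z) * w).imag

f = lambda x: np.exp(-x ** 2 / 2)        # arbitrary test function
x = np.linspace(-3.0, 3.0, 13)
a, b, a2, b2 = 0.4, -1.1, 0.9, 0.25
z, z2 = a + 1j * b, a2 + 1j * b2

# U(a) V(b) = exp(-i a b) V(b) U(a)
lhs_ccr = U(a)(V(b)(f))(x)
rhs_ccr = np.exp(-1j * a * b) * V(b)(U(a)(f))(x)

# W(z) W(z') = exp(-i omega(z, z') / 2) W(z + z')
lhs_weyl = W(a, b)(W(a2, b2)(f))(x)
rhs_weyl = np.exp(-0.5j * omega(z, z2)) * W(a + a2, b + b2)(f)(x)
```

Since the operators act on functions rather than on a grid, the comparison involves no discretization error, only floating-point rounding.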

In the Bloch representation, the unitaries
$U(a) V(b) U^*(a) V^*(b)$ are not multiples of the identity,
and have no particularly simple form. The map
$C^N \ni z \mapsto W(z)$ with the structure [13] is called Weyl
system; it plays a major role in QM. The following
theorem has therefore a major importance in the
mathematical theory of QM.


Theorem 2 (von Neumann 1965). There exists
only one, modulo unitary equivalence, irreducible
representation of the Weyl system.


The proof of this theorem follows a general
pattern in the theory of group representations. One
introduces an algebra $W^{(N)}$ of operators

$W_f \equiv \int f(z) W(z)\, dz$, $f \in L^1(C^N)$

called Weyl algebra.

It is easy to see that $|W_f| = |f|_1$ and that $f \mapsto W_f$ is a
linear isomorphism of algebras if one considers $W^{(N)}$
with its natural product structure and $L^1$ as a
noncommutative algebra with product structure

$f \star g \equiv \int dz'\, f(z - z') g(z') \exp\{-\frac{i}{2} \omega(z, z')\}$ [14]

So far the algebra $W^{(N)}$ is a concrete algebra of
bounded operators on $L^2(R^N)$. But it can also be
considered an abstract C*-algebra which we still
denote by $W^{(N)}$.

It is easy to see that, according to [14], if $f_0$ is
chosen to be a suitable Gaussian, then $W_{f_0}$ is a
projection operator which commutes with all the
W;’s. Moreover, W;W, =; ,Wy,, for a suitable 
phase factor $\phi$. Considering the Gelfand-Neumark-Segal
construction for the C*-algebra $W^{(N)}$, one
finds that these properties lead to a decomposition
of any representation in cyclic irreducible equivalent 
ones, completing the proof of the theorem. 

The Weyl system has a representation (equivalent
to the Schrödinger one) in the space $L^2(R^N, g)$,
where g is Gauss's measure. This allows an extension
in which $C^N$ is replaced by an infinite-dimensional
Banach space equipped with a Gauss
measure (weak distribution (Segal 1965, Gross
1972, Wiener 1938)). Uniqueness fails in this more
general setting (uniqueness is strictly connected with
the compactness of the unit ball in $C^N$). Notice that
in the Schrödinger representation (and, therefore, in
any other representation) the Hamiltonian for the
harmonic oscillator defines a positive self-adjoint
operator

$N \equiv \sum_{k=1}^{N} N_k$, $N_k = \frac{1}{2} \left( -\frac{\partial^2}{\partial x_k^2} + x_k^2 - 1 \right)$

The spectrum of each of the commuting operators
$N_k$ consists of the positive integers (including 0), and
$N_k$ is therefore called the number operator for the $k$th
degree of freedom. The operator $N_k$ can be written
as $N_k = a_k^* a_k$, where $a_k = (1/\sqrt{2})(x_k + \partial/\partial x_k)$ and $a_k^*$
is the formal adjoint of $a_k$ in $L^2(\mathbb{R}^N)$. One has
$\|a_k (N_k + 1)^{-1/2}\| \le 1$. In the domain of $N$ these
operators satisfy the following relations (canonical
commutation relations)

\[ [a_h, a_k^*] = \delta_{hk}, \quad [a_h, a_k] = 0, \quad [N_k, a_h] = -a_h \delta_{hk}, \quad [N_k, a_h^*] = a_h^* \delta_{hk} \qquad [15] \]

In view of the last two relations, the operator $a_k$ is
called the annihilation operator (relative to the $k$th
degree of freedom) and its formal adjoint is called
the creation operator. The operators $a_k$ have as
point spectrum the entire complex plane, while the
operators $a_k^*$ have empty point spectrum (they have no
eigenvectors); the eigenvectors of $N_k$ are the
Hermite polynomials in the variable $x_k$ (multiplied by
the Gaussian $e^{-x_k^2/2}$ when the reference measure is
Lebesgue measure). The eigenvectors of $a_k$ (i.e., the
solutions in $L^2(\mathbb{R}^N)$ of the equation
$a_k \phi_\lambda = \lambda \phi_\lambda$, $\lambda \in \mathbb{C}$) are called coherent
states; they have a major role in the Bargmann-Fock-Segal
quantization and in general in the semiclassical limit.
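
As an illustration, the canonical commutation relations can be checked numerically on truncated matrices in the number basis. The sketch below assumes Python with NumPy; the helper name `ladder_operators` and the truncation dimension are hypothetical choices made for this example, and the truncation itself is an approximation, since the true operators are unbounded and act on an infinite-dimensional space.

```python
import numpy as np

def ladder_operators(dim):
    """Truncated annihilation/creation matrices in the basis |0>, ..., |dim-1>."""
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1)  # a|n> = sqrt(n)|n-1>
    return a, a.conj().T

dim = 8  # truncation dimension (an assumption of this sketch)
a, a_dag = ladder_operators(dim)
number = a_dag @ a                    # N = a* a
commutator = a @ a_dag - a_dag @ a    # [a, a*]

# The spectrum of the truncated N is {0, 1, ..., dim-1}.
spectrum = np.linalg.eigvalsh(number)

# [N, a] = -a and [N, a*] = a* survive the truncation exactly.
lowering_ok = np.allclose(number @ a - a @ number, -a)
raising_ok = np.allclose(number @ a_dag - a_dag @ number, a_dag)

# [a, a*] = 1 holds except in the last diagonal entry, a truncation artifact.
ccr_ok = np.allclose(commutator[:-1, :-1], np.eye(dim - 1))
```

The failure of $[a, a^*] = 1$ in the last diagonal entry reflects the fact that the relation has no realization by finite matrices (the trace of a commutator vanishes).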

The operators $\{N_k\}$ generate a maximal abelian
system and therefore the space $L^2(\mathbb{R}^N)$ has a natural
representation as the symmetrized subspace of
$\bigoplus_n (\mathbb{C}^N)^{\otimes n}$ (Fock representation). In this
representation, a natural basis is given by the common
eigenvectors $\phi_{\{n_k\}}$, $k = 1, \ldots, N$, of the operators $N_k$.
A generic vector can be written as

\[ \psi = \sum_{\{n_k\}} c_{\{n_k\}}\, \phi_{\{n_k\}}, \qquad \sum_{\{n_k\}} |c_{\{n_k\}}|^2 < \infty \]

and therefore can be represented by the sequence $c_{\{n_k\}}$.

Notice that the creation operators do not create
particles in $\mathbb{R}^N$ but rather act as a shift in the basis
of the Hermite polynomials.

It is traditional to denote by $\Gamma(L^2(\mathbb{R}^N))$ the Fock
representation (also called second quantization
because for each degree of freedom the wave
function is written in the quantized basis of the
harmonic oscillator) and to denote by $\Gamma(A)$ the lift
of a matrix $A \in B(\mathbb{C}^N)$. These notations are used
especially when $\mathbb{C}^N$ is replaced by a Banach space
$X$. This terminology was introduced by Segal in his
work on quantization of the wave equation; it has been
used ever since, mostly in a perturbative context.

In the theory of quantized fields, the space $\mathbb{C}^N$ is
replaced by a Banach space, $X$, of functions.
In this setting, “second quantization” (Segal 1965,
Nelson 1974) considers the state $\phi_{\{n_k\}}$ as representing
a configuration of the system in which there are
precisely $n_k$ particles in the $k$th physical state (this
presupposes having chosen a basis in the space of
distributions on $\mathbb{R}^3$). There is no problem in doing
this (Gross 1972), and one can choose for $X$ a
suitable Sobolev space (which one depends on the
Gaussian measure given in $X$) if one wants the
generalization of the commutation relations [15] to be
of the form $[a(f), a^*(g)] = \langle f, g \rangle$ with a suitable
scalar product $\langle \cdot, \cdot \rangle$ in $X$. The problem with
quantization of relativistic fields is that, in order to
ensure locality, one is forced to use a Sobolev space
of negative index (depending on the dimension of
physical space), and this gives rise to difficulties in
the definition of the dynamics for nonlinear vector
fields.

One should notice that in the work of Segal
(1965), and then in constructive field theory
(Nelson 1974), the Fock representation is placed in
a Schrödinger context, exhibiting the relevant operators
as acting on a space $L^2(X, g)$, where $X$ is a
subspace of the space of Schwartz distributions on
the physical space of the particles one wants to
describe and $g$ is a suitably defined Gauss measure
on $X$.

The Fock representation is related to the Bargmann-Fock-Segal
representation (Bargmann 1967), a representation
in a space of holomorphic functions on $\mathbb{C}^N$
square integrable with respect to a Gaussian measure.
For its development, this representation relies on the
properties of Toeplitz operators and on Tauberian
estimates. It is much used in the study of the
semiclassical limit and in the formulation of QM in
systems for which the classical version has, for phase
space, a manifold which is not a cotangent bundle
(e.g., the 2-sphere).


Remark The Fock representation associated with 
the Weyl system in the infinite-dimensional context 
can describe only particles obeying Bose-Einstein 
statistics; indeed, the states are qualified by their 
particle content for each element of the basis chosen 
and there is no possibility of identifying each 
particle in an N-particle state. This is obvious in 
the finite-dimensional case: the Hermite polynomial 
of order 2 cannot be seen as “composed” of two 
polynomials of order 1. 


In the infinite-dimensional context, if one wants
to treat particles which obey Fermi-Dirac statistics,
one must rely on the Pauli exclusion principle (Pauli
1928), which states that two such particles cannot
be in the same configuration; to ensure this, the
wave function must be antisymmetric under permutation
of the particle symbols. It is a matter of fact
(and a theorem in relativistic quantum field theory,
which follows in that theory from covariance,
locality, and positivity of the energy (Streater and
Wightman 1964)) that particles with half-integer spin
obey the Fermi-Dirac statistics. Therefore, to quantize
such systems, one must introduce relations different
from the (commutation) relations of Weyl. Since it
must now be that $(a^*)^2 = 0$, due to antisymmetry, it
is reasonable to introduce the following relations
(canonical anticommutation relations), where
$\{A, B\} \equiv AB + BA$:

\[ \{a_h, a_k\} = 0, \quad \{a_h, a_k^*\} = \delta_{hk}, \quad [N_k, a_h] = -a_h \delta_{hk} \qquad [16] \]



The Hilbert space is now $\bigotimes^N \mathcal{H}_2$, where $\mathcal{H}_2$ is a
two-dimensional complex Hilbert space. Notice that
$\mathcal{H}_2$ carries an irreducible two-dimensional representation
of $su(2) = o(3)$ (spin representation) so that
this quantization associates spin 1/2 and
antisymmetry.
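
The canonical anticommutation relations can be realized concretely on a tensor product of $N$ copies of $\mathbb{C}^2$. The sketch below uses the Jordan-Wigner construction, which is one standard realization and is chosen here only for illustration; it is not claimed to be the construction used in the references cited. It assumes Python with NumPy, and the names `annihilation`, `anticommutator`, and `n_modes` are hypothetical.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
LOWER = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps |1> to |0> on a single site

def kron_all(factors):
    """Tensor (Kronecker) product of a list of matrices."""
    return reduce(np.kron, factors)

def annihilation(k, n_modes):
    """a_k on the n_modes-fold tensor product of C^2 (Jordan-Wigner sign string)."""
    return kron_all([Z] * k + [LOWER] + [I2] * (n_modes - k - 1))

def anticommutator(A, B):
    return A @ B + B @ A

n_modes = 3
ops = [annihilation(k, n_modes) for k in range(n_modes)]
dim = 2 ** n_modes

# {a_h, a_k} = 0 and {a_h, a_k^*} = delta_hk (matrices are real, so .T is the adjoint).
car_ok = all(
    np.allclose(anticommutator(ops[h], ops[k]), 0)
    and np.allclose(anticommutator(ops[h], ops[k].T), (h == k) * np.eye(dim))
    for h in range(n_modes) for k in range(n_modes)
)
```

In this realization each $a_k$ squares to zero and has operator norm 1, and the occupation numbers of each mode can only be 0 or 1.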

The operators in [16] are all bounded (in fact 
bounded by 1 in norm). The Fock representation is 
constructed as in the case of Weyl (see Araki
(1988)), with $n_k$ equal to 0 or 1 for each index $k$.
The infinite-dimensional case is defined in the same 
way, and leads to inequivalent irreducible represen- 
tations (Araki 1988); only in one of them is the 
number operator defined and bounded below. Some 
of these representations can be given a Schrödinger- 
like form, with the introduction of a gauge and an 
integration formalism based on a trace (Gross 
1972). This system is much used in quantum 
statistical mechanics because it deals with bounded 
operators and can take advantage of strong results 
in the theory of C*-algebras. In the finite-dimensional 
case (and occasionally also in the general case) it is 
used in quantum information (the space H2 is the 
space of a quantum bit). 

Returning to the Weyl system, we now introduce
the strictly related Wigner function, which plays an
important role in the analysis of the semiclassical
limit and in the discussion of some scaling limits, in
particular the hydrodynamical limit and the Bose-Einstein
condensation when $N \to \infty$.

The Wigner function $W_\phi$ for a pure state $\phi$ is a
real-valued function on the phase space of the
classical system which represents the state faithfully.
It is defined as

\[ W_\phi(x, \xi) = (2\pi)^{-N} \int_{\mathbb{R}^N} e^{-i(\xi, y)}\, \phi\left(x + \frac{y}{2}\right) \bar\phi\left(x - \frac{y}{2}\right) dy \]


The Wigner function is not positive in general (the
only exceptions are those Gaussian states that satisfy
$\Delta(x) \cdot \Delta(p) \ge \hbar$). But it has the interesting property
that its marginals correctly reproduce the Born rule.
In fact, one has $\int W_\phi(x, \xi)\, dx = |\hat\phi(\xi)|^2$. If the function
$\phi(t, x)$, $x \in \mathbb{R}^N$, is a solution of the free Schrödinger
equation $i\hbar\, \partial\phi/\partial t = -\hbar^2 \Delta\phi$, then its Wigner function
satisfies the Liouville (transport) equation
$\partial W_\phi/\partial t + \xi \cdot \nabla W_\phi = 0$.
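
The lack of positivity can be seen in a simple numerical experiment. The sketch below evaluates the Wigner function at the origin of phase space for the ground state and the first excited state of the one-dimensional harmonic oscillator. It assumes $N = 1$, units with $\hbar = 1$, and the normalization $(2\pi)^{-1}$ with the phase $e^{-i\xi y}$; these conventions, the function names, and the quadrature parameters are assumptions of this example.

```python
import numpy as np

def wigner(psi, x, xi, half_width=12.0, n_points=4001):
    """W(x, xi) = (2 pi)^-1 * int exp(-i xi y) psi(x + y/2) conj(psi(x - y/2)) dy.

    Evaluated by a plain Riemann sum on [-half_width, half_width].
    """
    y = np.linspace(-half_width, half_width, n_points)
    dy = y[1] - y[0]
    integrand = np.exp(-1j * xi * y) * psi(x + y / 2) * np.conj(psi(x - y / 2))
    return float(np.real(np.sum(integrand) * dy)) / (2 * np.pi)

def ground(x):
    """Normalized Gaussian ground state of the harmonic oscillator."""
    return np.pi ** -0.25 * np.exp(-x ** 2 / 2)

def first_excited(x):
    """Normalized first excited state (odd in x)."""
    return np.sqrt(2.0) * np.pi ** -0.25 * x * np.exp(-x ** 2 / 2)

w_ground = wigner(ground, 0.0, 0.0)          # equals 1/pi: positive
w_excited = wigner(first_excited, 0.0, 0.0)  # equals -1/pi: negative
```

With these conventions the Gaussian state gives $W(0,0) = 1/\pi$, while the odd first excited state gives $W(0,0) = -1/\pi$, so the Wigner function cannot be read as a probability density in general.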

The Wigner function is strictly linked with the
Weyl quantization. This quantization associates
with every function $\sigma(\xi, x)$ in a given regularity
class an operator $\sigma(D, x)$ (the function $\sigma$ is called
the Weyl symbol of the operator) defined by

\[ (\sigma(D, x) f, g) = \int \sigma(\xi, x)\, W(f, g)(\xi, x)\, d\xi\, dx \]

\[ W(f, g)(\xi, x) = \int e^{-i(p, \xi)}\, f\left(x + \frac{p}{2}\right) \bar g\left(x - \frac{p}{2}\right) dp \]

It can be verified that the action of $W$ preserves the
Schwartz classes $\mathcal{S}$ and $\mathcal{S}'$ and is unitary in $L^2(\mathbb{R}^{2N})$.
Moreover, one has $\sigma(D, x)^* = \bar\sigma(D, x)$.

The relation between Weyl’s quantization and
Wigner functions can be readily seen from the
natural duality between bounded operators and
pure states:

\[ \mathrm{tr}(\hat A \rho) = \int a(p, q)\, \rho(p, q)\, dp\, dq, \qquad \rho(p, q) = \int e^{i(p, q')}\, \rho(q', q)\, dq' \]


We now give a brief discussion of the general
structure of a quantization, and apply it to the
Weyl quantization. By quantization of a Hamiltonian
system we mean a correspondence, parametrized
by a small parameter $\hbar$, between classical
observables (real functions on a phase space $\mathcal{F}$) and
quantum observables (self-adjoint operators on a
Hilbert space $\mathcal{H}$) with the property that the
corresponding structures coincide in the limit $\hbar \to 0$
and the difference for $\hbar \ne 0$ can be estimated in a
suitable topology.

This last requirement is important for the applica- 
tions and, from this point of view, Weyl's quantiza- 
tion gives stronger results than the other formalisms 
of quantization. 

We limit our analysis to the case $\mathcal{F} = T^* X$, with
$X = \mathbb{R}^N$, and we make use of the realization of $\mathcal{H}$ as
$L^2(\mathbb{R}^N)$.

Let $\{x_k\}$ be Cartesian coordinates in $\mathbb{R}^N$ and
consider a correspondence $A \mapsto \hat A$ that satisfies the
following requirements:


1. $A \mapsto \hat A$ is linear;

2. $x_k \mapsto \hat x_k$, where $\hat x_k$ is multiplication by $x_k$;

3. $p_k \mapsto -i\hbar\, \partial/\partial x_k$;

4. if $f$ is a continuous function in $\mathbb{R}^N$, one has
$f(x) \mapsto f(\hat x)$ and $f(p) \mapsto (\mathcal{F}f)(\hat x)$, where $\mathcal{F}$ denotes
Fourier transform;

5. $L_\zeta \mapsto \hat L_\zeta$, $\zeta = (a, b)$, $a, b \in \mathbb{R}^N$, where $L_\zeta$ is the
generator of the translations in phase space in
the direction $\zeta$ and $\hat L_\zeta$ is the generator of the
one-parameter group $t \mapsto W(t\zeta)$ associated with $\zeta$ by
the Weyl system.


Note that (1) and (4) imply (2) and (3) through a 
limit procedure. 


Under the correspondence $A \mapsto \hat A$, linear symplectic
maps correspond to unitary transformations.
This is not in general the case for nonlinear maps.

One can prove that conditions (1)-(5) give
a complete characterization of the map $A \mapsto \hat A$.
Moreover, the correspondence cannot be extended
to other functions in phase space. Indeed, one has:


Theorem 3 (van Hove). Let $G$ be the class of
$C^\infty$ functions on $\mathbb{R}^{2N}$ which are generators of global
symplectic flows. For $g \in G$ let $\Phi_g(t)$ be the
corresponding group. There cannot exist for every
$g$ a correspondence $g \mapsto \hat g$, with $\hat g$ self-adjoint, such
that $\hat g(x, p) = g(\hat x, \hat p)$.


We described the Weyl quantization as a correspondence
between functions in the Schwartz class $\mathcal{S}$
and a class of bounded operators. Weyl's quantization
can be extended to a much wider class of
functions. Operators that can be so constructed are
called Fourier integral operators. One uses the
notation $\hat\sigma = \sigma(D, x)$.

We have the following useful theorems (Robert
1987):


Theorem 4 Let $l_1, \ldots, l_K$ be linear functions on $\mathbb{R}^{2N}$
such that $\{l_i, l_j\} = 0$. Let $P$ be a polynomial and let
$\sigma(\xi, x) = P[l_1(\xi, x), \ldots, l_K(\xi, x)]$. Then


(i) $\sigma(D, x)$ maps $\mathcal{S}$ into $L^2(\mathbb{R}^N)$ and is self-adjoint;
(ii) if $g$ is continuous, then $(g(\sigma))(D, x) = g(\sigma(D, x))$.


One proves that $\sigma(D, x)$ extends to a continuous
map $\mathcal{S}'(X) \to \mathcal{S}'(X)$ and, moreover,


Theorem 5 (Calderon-Vaillancourt). If
$\sigma_0 \equiv \sum_{|\alpha| + |\beta| \le 2N+1} \|D_\xi^\alpha D_x^\beta \sigma\|_\infty < \infty$, the norm of the
operator $\sigma(D, x)$ is bounded by $\sigma_0$.


Any operator obtained from a suitable class of
functions through Weyl's quantization is called a
pseudodifferential operator. If $\sigma(q, p) = P(p)$, where
$P$ is a polynomial, $\hat\sigma$ is a differential operator.

Moreover, if $\sigma(p, x) \in L^2$ then $\sigma(D, x)$ is a
Hilbert-Schmidt operator and

\[ \|\sigma(D, x)\|_{HS} = (2\pi\hbar)^{-N/2} \left[ \int |\sigma(z)|^2\, dz \right]^{1/2} \]


Pseudodifferential operators turn out to be very 
important in particular in the quantum theory of 
molecules (Le Bris 2003), where adiabatic analysis 
and Peierls substitution rules force the use of 
pseudodifferential operators. 

The next important problem in the theory of 
quantization is related to dynamics. 

Let $\beta$ be a quantization procedure and let $H(p, q)$
be a classical Hamiltonian on phase space. Let $A_t$ be
the evolution of a classical observable $A$ under the
flow defined by $H$ and assume that $\beta(A_t)$ is well
defined for all $t$.

Is there a self-adjoint operator $\hat H$ such that
$\beta(A_t) = e^{it\hat H} \beta(A) e^{-it\hat H}$? If so, can one estimate
$|\hat H - \beta(H)|$? Conversely, if the generator of the
quantized flow is, by definition, $\hat H \equiv \beta(H)$ (as is usually
assumed), is it possible to give an estimate of the
difference $|(\beta(A_t) - (\beta(A))_t)\phi|$ for a dense set of
$\phi \in \mathcal{H}$, where $\hat A_t \equiv e^{it\hat H} \hat A e^{-it\hat H}$, or to estimate
$|A_t - \tilde A_t|_\infty$, where $\tilde A_t$ is defined by
$\beta(\tilde A_t) = (\beta(A))_t$? Is it possible
to write an asymptotic series in $\hbar$ for the differences?

For the Weyl quantization, some quantitative
results have been obtained if one makes use of the
semiclassical observables (Robert 1987). We shall
not elaborate further on this point.

For completeness, we briefly mention another 
quantization procedure which is often used in 
mathematical physics. 


Wick Quantization 


This quantization assigns positive operators to 
positive functions, but does not preserve polynomial 
relations. It is strictly related to the Bargmann- 
Fock-Segal representation. 

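We call coherent state centered at the point $(y, \eta)$ of
phase space the normalized solution of
$(i\hat p + \hat x - i\eta - y)\, \phi_{y,\eta}(x) = 0$.

Wick's quantization of the classical observable $A$
is by definition the map $A \mapsto \mathrm{Op}^W_\hbar(A)$, where

\[ \mathrm{Op}^W_\hbar(A)\, \psi \equiv (2\pi\hbar)^{-N} \int A(y, \eta)\, (\psi, \phi_{y,\eta})\, \phi_{y,\eta}\, dy\, d\eta \]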


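One can prove, either directly or going through
Weyl’s representation, that


1. if $A \ge 0$ then $\mathrm{Op}^W_\hbar(A) \ge 0$;
2. the Weyl symbol of the operator $\mathrm{Op}^W_\hbar(A)$ is

\[ (\pi\hbar)^{-N} \int A(y, \eta)\, e^{-\frac{1}{\hbar}\left[(x - y)^2 + (\xi - \eta)^2\right]}\, dy\, d\eta \]

3. for every $A \in O(0)$ one has
$\|\mathrm{Op}^W_\hbar(A) - \hat A\| = O(\hbar)$.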


Wick's quantization associates with every vector
$\phi \in \mathcal{H}$ a positive Radon measure $\mu_\phi$ in phase space,
called the Husimi measure. It is defined by
$\int A\, d\mu_\phi = (\mathrm{Op}^W_\hbar(A)\phi, \phi)$, $A \in \mathcal{S}$. Wick's quantization is less
suited to the treatment of nonrelativistic particles;
in particular, Ehrenfest's rule does not apply, and
the semiclassical propagation theorem has a more
complicated formulation. It is very much used for
the analysis in Fock space in the theory of quantized
relativistic fields, where a special role is assigned to
Wick ordering, according to which the polynomials
in $\hat x_h$ and $\hat p_k$ are reordered in terms of creation and
annihilation operators by placing all creation operators
to the left.

We now come back to Schrödinger's equation and
notice that it can be derived within Heisenberg's
formalism and Weyl’s quantization scheme from the
Hamiltonian of an N-particle system in Hamiltonian
mechanics (at least if one neglects spin, which has
no classical analog).

Apart from (often) inessential parameters, the
Schrödinger equation for $N$ scalar particles in $\mathbb{R}^3$
can be written as


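\[ i\hbar \frac{\partial\phi}{\partial t} = \sum_{k=1}^{N} (i\hbar\nabla_k + A_k)^2 \phi + V\phi \equiv H\phi, \qquad \phi \in L^2(\mathbb{R}^{3N}) \qquad [17] \]

where $A_k$ are vector-valued functions (vector potentials)
and $V = \sum_k V_k(x_k) + \sum_{i,k} V_{i,k}(x_i - x_k)$ are scalar-valued
functions (scalar potentials) on $\mathbb{R}^3$.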

Typical problems in Schrödinger’s quantum
mechanics are:


1. Self-adjointness of $H$, existence of bound states
(discrete spectrum of the operator), their number
and distribution, and, in general, the properties
of the spectrum.

2. Existence, completeness, and continuity properties
of the wave operators

\[ W_\pm \equiv s\text{-}\lim_{t \to \pm\infty} e^{itH} e^{-itH_0} \qquad [18] \]

and the ensuing existence and properties of the
S-matrix and of the scattering cross sections. In
[18] $H_0$ is a suitable reference operator, usually
$-\Delta$ (with periodic boundary conditions if the
potentials are periodic in space), for which
Schrödinger’s equation can be, to some extent,
controlled analytically.

3. Existence and properties of a semiclassical limit.


In [17] and [18] we have implicitly assumed that H 
is time independent; very interesting problems arise 
when H depends on time, in particular if it is 
periodic or quasiperiodic in time, giving rise to 
ionization phenomena. In the periodic case, one is 
helped by Floquet's theory, but even in this case 
many interesting problems are still unsolved. 

If the potentials are sufficiently regular, the 
spectrum of H consists of an absolutely continuous 
part (made up of several bands in the space-periodic 
case) and a discrete part, with few accumulation 
points. 

On the contrary, if $V(x, \omega)$ is a measurable
function on some probability space $\Omega$, with a
suitable distribution (e.g., Gaussian), the spectrum
may have totally different properties almost surely.
For example, in the case $N = 1$ (so that the terms $V_{i,j}$
are absent) in one and two spatial dimensions the
spectrum is pure point and dense, with eigenfunctions
which decrease at infinity exponentially fast (although
not uniformly); as a consequence, the evolution group
does not give rise to a dispersive motion. The same is
true in three dimensions if the potential is sufficiently
strong and the kinetic energy content of the initial state
is sufficiently limited. This very interesting behavior is
due roughly to the randomness of the “barriers”
generated by the potential and is also present, to a
large extent, for potentials quasiperiodic in space
(Pastur and Figotin 1992).

In these, as well as in most problems related
to Schrödinger's equation, a crucial role is taken
by the resolvent operator $(H - \lambda I)^{-1}$, where $\lambda$ is
any complex number outside the spectrum of $H$;
many of the results are obtained when the difference
$(H - \lambda I)^{-1} - (H_0 - \lambda I)^{-1}$ is a compact operator.

Problems of type (1) and (2) are of great physical 
interest, and are of course common with theoretical 
physics and quantum chemistry (Le Bris 2003), 
although the instruments of investigation are some- 
what different in mathematical physics. The semi- 
classical limit is often more of theoretical interest, 
but its analysis has relevance in quantum chemistry 
and its methods are very useful whenever it is 
convenient to use multiscale methods, as in the 
study of molecular spectra. 

We start with a brief description of point (3); it
provides a valid instrument in the description of
quantum-mechanical systems at a scale where it is
convenient to use units in which the physical
constant $\hbar$ has a very small value ($\hbar \simeq 10^{-27}$ in
CGS units). From Heisenberg's commutation relations,
$[\hat x, \hat p] \subset i\hbar I$, it follows that the product of the
dispersion (uncertainty) of the position and momentum
variables is proportional to $\hbar$ and therefore at
least one of these two quantities must have very
large values (compared to $\hbar$). One usually considers
the case in which these dispersions have comparable
values, which are therefore very small, of the order of
magnitude $\hbar^{1/2}$ (but very large as compared with $\hbar$).
In order to make connection with the Hamilton-Jacobi
formalism of classical mechanics one can also
consider the case in which the dispersion in
momentum is of the order $\hbar$ (the WKB method).

The semiclassical limit takes advantage mathematically
of the fact that the parameter $\hbar$ is very
small in natural units, and performs an asymptotic
analysis in which the terms of “lowest order” are
exactly described and the difference is estimated.
The problem one faces is that the Schrödinger
equation becomes, in the “mathematical limit”
$\hbar \to 0$, a very singular PDE (the coefficients of the
differential terms go to zero in this limit).

Dividing each term of the equation by $\hbar$ (because
we do not want to change the scale of time) leads, in
the case of one quantum particle in $\mathbb{R}^3$ in a potential
field $V(x)$ (we treat, for simplicity, only this case), to
the equation

\[ i \frac{\partial\phi}{\partial t} = -\hbar\, \Delta\phi(x, t) + \hbar^{-1} V(x)\, \phi(x, t) \qquad [19] \]

It is convenient therefore to “rescale” the spatial
variables by a factor $\hbar^{1/2}$ (i.e., choose different
units), setting $x = \sqrt{\hbar}\, X$, and look for solutions of [19]
which remain regular in the limit $\hbar \to 0$ as functions
of the rescaled variable $X$. One searches therefore
for solutions that on the “physical scale” have
support that becomes “vanishingly small” in the
limit. It is therefore not surprising that, in the limit,
these solutions may describe point particles; the
main result of semiclassical analysis is that the
coordinates of these particles obey Hamilton's laws
of classical mechanics.

This can be roughly seen as follows (accurate
estimates are needed to make this empirical analysis
precise). Using multiscale analysis, one may write the
solution in the form $\phi(X, x, t)$ and seek solutions
which are smooth in $X$ and $x$. Both terms on the
right-hand side of [19] contain contributions of order $-2$
and $-1$ in $\sqrt{\hbar}$, and in order to have regular solutions
one must have cancellations between equally singular
contributions. For this, one must perform an expansion
to the second order of the potential (assumed at
least twice differentiable) around a suitable trajectory
$q(t)$, $q \in \mathbb{R}^3$, and choose this trajectory in such a way
that the cancellations take place.

A formal analysis shows that this is achieved only
if the trajectory chosen is precisely a solution of the
classical Lagrange equations. Of course, a more
refined analysis and good estimates are needed to
make this argument precise, and to estimate the
error that is made when one neglects in the resulting
equation terms of order $\sqrt{\hbar}$; in favorable cases, for
each chosen $T$ the error in the solution for most
initial conditions of the type described is of order
$\sqrt{\hbar}$ for $|t| < T$.

This semiclassical result is most easily visualized
using the formalism of Wigner functions (the
technical details needed to turn the formal arguments
into a proof take advantage of regularity
estimates in the theory of functions).

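In natural units, one defines

\[ W^\hbar_\phi(x, \xi; t) = \left( \frac{1}{\hbar} \right)^{N} W_\phi\left( x, \frac{\xi}{\hbar}; t \right), \qquad f^\hbar \equiv W^\hbar_\phi \]

In terms of the Wigner function $W^\hbar_\phi$, the Schrödinger
equation [19] takes the form

\[ \frac{\partial f^\hbar}{\partial t} + \xi \cdot \nabla_x f^\hbar + K_\hbar * f^\hbar = 0 \qquad [20] \]

where

\[ K_\hbar = \frac{i}{(2\pi)^N}\, e^{-i(\xi, y)}\, \hbar^{-1} \left[ V\left( x + \frac{\hbar y}{2} \right) - V\left( x - \frac{\hbar y}{2} \right) \right] \]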

It can be proved (Robert 1987) that if the potential
is sufficiently regular and if the initial datum
converges in a suitable topology to a positive
measure $f_0$, then, for all times, $W^\hbar_\phi(x, \xi; t)$ converges
to a (weak) solution of the Liouville equation

\[ \frac{\partial f}{\partial t} + \xi \cdot \nabla_x f - \nabla V(x) \cdot \nabla_\xi f = 0 \]

This leads to the semiclassical limit if, for example,
one considers a sequence of initial data $\phi_\hbar$, where $\phi_\hbar$
is a sequence of functions centered at $x_0$ with
Fourier transform centered at $p_0$ and dispersion of
order $\hbar^{1/2}$ both in position and in momentum. In
this case, the limit measure is a Dirac measure
centered on the classical paths.

In the course of the proof of the semiclassical limit 
theorem, one becomes aware of the special status of 
the Hamiltonians that are at most quadratic in x and 
p. Indeed, it is easy to verify that for these 
Hamiltonians the expectation values of x and p 
obey the classical equation of motion (P Ehrenfest 
rule). 
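
For the reader's convenience, the computation behind this statement can be sketched for a single particle in one dimension. The Hamiltonian $\hat H = \hat p^2/2m + V(\hat x)$ and the coefficients $\alpha, \beta, \gamma$ below are assumptions introduced only for this illustration.

```latex
\[
\frac{d}{dt}\langle \hat x \rangle
  = \frac{i}{\hbar}\,\langle [\hat H, \hat x] \rangle
  = \frac{\langle \hat p \rangle}{m},
\qquad
\frac{d}{dt}\langle \hat p \rangle
  = \frac{i}{\hbar}\,\langle [\hat H, \hat p] \rangle
  = -\,\langle V'(\hat x) \rangle .
\]
% If V is at most quadratic, V(x) = \alpha + \beta x + \tfrac{1}{2}\gamma x^2,
% then V'(\hat x) = \beta + \gamma \hat x is affine in \hat x, so that
\[
\langle V'(\hat x) \rangle = \beta + \gamma\,\langle \hat x \rangle = V'(\langle \hat x \rangle),
\]
% and the pair (\langle \hat x \rangle, \langle \hat p \rangle) solves Hamilton's equations exactly.
```

For a potential of higher order, $\langle V'(\hat x) \rangle$ differs from $V'(\langle \hat x \rangle)$ by terms involving the dispersion of the state, which is why the quadratic case is special.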

From the point of view of Heisenberg, this can be 
understood as a consequence of the fact that 
operators at most bilinear in a and a* form an 
algebra D under commutation and, moreover, the 
homogeneous part of order 2 is a closed subalgebra 
such that its action on D (by commutation) has the 
same structure as the algebra of generators of the 
Hamiltonian flow and its tangent flow. Apart from 
(important) technicalities, the proof of the semiclas- 
sical limit theorem reduces to the proof that one can 
estimate the contribution of the terms of order
higher than 2 in the expansion of the quantum
Hamiltonian at the classical trajectory as being of
order $\hbar^{1/2}$ in a suitable topology (Hepp 1974).

We end this overview by giving a brief analysis of
problems (1) and (2), which refer to the description
of phenomena that are directly accessible to comparison
with experimental data, and therefore have
been extensively studied in theoretical physics and
quantum chemistry (McWeeny 1992); some of
them have been analyzed with the instruments of
mathematical physics, often with considerable
success. We give here a very naive introduction to
these problems and refer the reader to the more
specialized contributions to this Encyclopedia for a
rigorous analysis and exact statements.

Of course, most of the problems of physical 
interest are not “exactly solvable,” in the sense that 
rarely the final result is given explicitly in terms of 
simple functions. As a consequence, exact numerical 
results, to be compared with experimental data, are 
rarely obtained in physically relevant problems, and 
most often one has to rely on approximation 
schemes with (in favorable cases) precise estimates 
on the error. 

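Formal perturbation theory is the easiest of such
schemes, but it seldom gives reliable results for
physically interesting problems. One writes

\[ H_\epsilon \equiv H_0 + \epsilon V \qquad [21] \]

where $\epsilon$ is a small real parameter, and sets a formal
scheme in case (1) by writing

\[ H_\epsilon \phi_\epsilon = E_\epsilon \phi_\epsilon, \qquad \phi_\epsilon \equiv \sum_{k=0}^{\infty} \epsilon^k \phi_k, \qquad E_\epsilon \equiv \sum_{k=0}^{\infty} \epsilon^k E_k \]

and, in case (2), iterating Duhamel’s formula

\[ e^{itH_\epsilon} = e^{itH_0} + i\epsilon \int_0^t e^{i(t - s)H_\epsilon}\, V\, e^{isH_0}\, ds \qquad [22] \]
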
0 


The perturbation series very seldom converges, and
one has to resort to more refined procedures.
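
To make the formal scheme of case (1) concrete, the sketch below compares the perturbation expansion, truncated at second order, with the exact lowest eigenvalue of a small symmetric matrix. This is only a finite-dimensional toy example, in which the series does converge for small $\epsilon$, unlike the typical situation for Schrödinger operators. The matrices, the function names, and the values of $\epsilon$ are hypothetical choices made for this illustration, assuming Python with NumPy.

```python
import numpy as np

def second_order_energy(h0_diag, v, eps, level=0):
    """E_0 + eps*E_1 + eps^2*E_2 for a nondegenerate level of diag(h0_diag) + eps*v."""
    e0 = h0_diag[level]
    others = [k for k in range(len(h0_diag)) if k != level]
    e1 = v[level, level]                                        # first-order term
    e2 = sum(abs(v[k, level]) ** 2 / (e0 - h0_diag[k]) for k in others)
    return e0 + eps * e1 + eps ** 2 * e2

# Unperturbed levels and a symmetric perturbation (illustrative values).
h0_diag = np.array([0.0, 1.0, 2.0, 3.0])
v = np.array([[0.5, 1.0, 0.0, 0.2],
              [1.0, -0.3, 0.7, 0.0],
              [0.0, 0.7, 0.1, 0.4],
              [0.2, 0.0, 0.4, 0.6]])

def exact_ground_energy(eps):
    """Lowest eigenvalue of H_0 + eps*V by direct diagonalization."""
    return np.linalg.eigvalsh(np.diag(h0_diag) + eps * v)[0]

# Error of the truncated series; it should scale roughly like eps^3.
errors = {eps: abs(exact_ground_energy(eps) - second_order_energy(h0_diag, v, eps))
          for eps in (0.1, 0.05)}
```

Halving $\epsilon$ should reduce the error by a factor close to 8, consistent with a remainder of third order.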

In some cases, it turns out to be convenient to
consider the formal primitive $\tilde E_\epsilon$ of $E_\epsilon$ (as a
differentiable function of $\epsilon$) and prove that it is
differentiable in $\epsilon$ for $0 < \epsilon < \epsilon_0$ (but not for $\epsilon = 0$).


In favorable cases, this procedure may lead to

\[ E_\epsilon = \sum_{k=0}^{N} \epsilon^k E_k + R_N(\epsilon), \qquad \lim_{N \to \infty} |R_N(\epsilon)| = +\infty \]

with explicit estimates of $|R_N(\epsilon)|$ for $0 < \epsilon < \epsilon_0$.

Re-summation techniques of the formal power 
series may be of help in some cases. 

The estimate of the lowest eigenvalues of an
operator bounded below is often done by variational
analysis, making use of min-max techniques applied
to the quadratic form $Q(\phi) \equiv (\phi, H\phi)$.
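
A minimal sketch of the variational idea, assuming Python with NumPy: a finite symmetric matrix stands in for $H$, and the quadratic form is restricted to the span of the first $m$ basis vectors. The matrix below is built from truncated harmonic-oscillator ladder matrices with a quartic term; the function name, the coupling `g`, and the dimensions are hypothetical, and the matrix is not claimed to be an exact representation of any Schrödinger operator.

```python
import numpy as np

def anharmonic_matrix(dim, g):
    """Symmetric matrix modeled on p^2/2 + x^2/2 + g x^4 in a truncated oscillator basis."""
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1)   # truncated annihilation matrix
    x = (a + a.T) / np.sqrt(2.0)
    p2 = -((a - a.T) @ (a - a.T)) / 2.0            # p = (a - a*) / (i sqrt 2)
    return p2 / 2.0 + x @ x / 2.0 + g * np.linalg.matrix_power(x, 4)

H = anharmonic_matrix(dim=40, g=0.1)
reference = np.linalg.eigvalsh(H)[0]               # lowest eigenvalue of the full matrix

# Minimizing Q(phi) = (phi, H phi) over unit vectors in the span of the first m
# basis vectors gives the lowest eigenvalue of the leading m x m block.
ritz = {m: np.linalg.eigvalsh(H[:m, :m])[0] for m in (1, 4, 12)}
```

Each restricted minimum is an upper bound for the lowest eigenvalue of the full matrix, and enlarging the trial subspace can only lower the bound.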

Semiclassical analysis can be useful to search for
the distribution of eigenvalues and in the study of
the dynamics of states whose dispersions both in
position and in momentum are very large in units in
which $\hbar = 1$.

A case of particular interest in molecular and
atomic physics occurs when the physical parameters
which appear in $H$ (typically the masses of the
particles involved in the process) are such that one
can a priori guess the presence of coordinates which
have a rapid dependence on time (fast variables) and
a complementary set of coordinates whose dependence
on time is slow. This suggests that one can try
an asymptotic analysis, often in connection with
adiabatic techniques. One seldom deals with cases in
which the hypotheses of elementary adiabatic
theorems are satisfied, and one has to refine the
analysis, mostly through subtle estimates which
ensure the existence of quasi-invariant subspaces.

Asymptotic techniques and refined estimates are
also needed to study the effective description of a
system of $N$ interacting identical particles when $N$
becomes very large; for example, in statistical
mechanics, one searches for results which are valid
when $N \to \infty$.

The most spectacular results in this direction are
the proof of stability of matter by E Lieb and
collaborators, and the study of the phenomenon of
Bose-Einstein condensation and the related Gross-Pitaevskii
(nonlinear Schrödinger) equation. The
experimental discovery of the state of matter
corresponding to a Bose-Einstein condensate is
clear evidence of the nonclassical behavior of matter
even at a comparatively macroscopic size. From the
point of view of mathematical physics, the ongoing
research in this direction is very challenging.

One should also recognize the increasing role that
research in QM is taking in applications, also in
connection with the increasing success of nanotechnology.
In this respect, from the point of view of
mathematical physics, the study of nanostructures
(quantum-mechanical systems constrained to very
small regions of space or to lower-dimensional
manifolds, such as sheets or graphs) is still in its
infancy and will require refined mathematical
techniques and most likely entirely new ideas.

Finally, one should stress the important role
played by numerical analysis (Le Bris 2003) and
especially computer simulations. In problems involving
very many particles, present-day analytical
techniques provide at most qualitative estimates
and in favorable cases bounds on the value of the
quantities of interest. Approximation schemes are
not always applicable and often are not reliable.

Hints for progress in the mathematical treatment
of some relevant physical phenomena of interest in
QM (mostly in condensed matter physics) may come
from the ab initio analysis made by simulations on
large computers; this may provide a qualitative and,
to a certain extent, quantitative behavior of the
solutions of Schrödinger's equation corresponding to
“typical” initial conditions. In recent times the
availability of more efficient computing tools has
made computer simulation more reliable and more
apt to concur with mathematical investigation to a
fuller comprehension of QM.


Interpretation Problems 


In this section we describe some of the conceptual 
problems that plague present-day QM and some of 
the attempts that have been made to cure these 
problems, either within its formalism or with an 
altogether different approach. 


Approaches within the QM Formalism 


We begin with the approaches "from within." We 
have pointed out that the main obstacle in the 
measurement problem is the description of what 
occurs during an act of measurement. Axiom III 
claims that it must be seen as a "destruction" act, 
and the outcome is to some extent random. The 
final state of the system is one of the eigenstates of 
the observable, and the dependence on the initial 
state is only through an a priori probability assign- 
ment; the act of measurement is therefore not a 
causal one, contrary to the (continuous) causal 
reversible description of the interaction with the 
environment. One should be able to distinguish 
a priori the acts of measurement from a generic 
interaction. 

There is a further difficulty. Due to the superposition
principle, if a system $S$ on which we want
to make a measurement of the property associated
with the operator $A$ “interacts” with an instrument
$I$ described by the operator $T$, the final state $\xi$ of the
combined system will be a coherent superposition of
tensor products of (normalized) eigenstates of the
two systems

\[ \xi = \sum_{n,m} c_{n,m}\, \phi^A_n \otimes \psi^T_m, \qquad \sum_{n,m} |c_{n,m}|^2 = 1 \qquad [23] \]

Measurement as described by Axiom III of QM
claims that once the measurement is over, the
measured system is, with probability $\sum_m |c_{n,m}|^2$, in
the state $\phi^A_n$ and the instrument is in a state which
carries the information about the final state of the
system (after all, what one reads at the end is an
indicator of the final state of the instrument).
It is therefore convenient to write $\xi$ in the form

\[ \xi = \sum_n d_n\, \phi^A_n \otimes \zeta_n, \qquad \sum_n |d_n|^2 = 1 \qquad [24] \]

(this defines $\zeta_n$ if the spectrum of $A$ is pure point and
nondegenerate). It is seen from [24] that, due to the
reduction postulate, we know that the measured
system is in the state $\phi^A_{n_0}$ if a measurement of an
observable $T$ with nondegenerate spectrum,
eigenvectors $\{\zeta_n\}$, and eigenvalues $\{z_n\}$ gives the
result $z_{n_0}$.

Along these lines, one does not solve the measure- 
ment problem (the outcome is still probabilistic) but 
at least one can find the reason why the measuring 
apparatus may be considered “classical.” 

It is more convenient to go back to [23] and to
assume that one is able to construct the measuring
apparatus in such a way that one divides (roughly)
its pure (microscopic) states in sets $\Phi_n$ (each
corresponding to a “macroscopic” state) which are
(roughly) in one-to-one correspondence with the
eigenstates of $A$. The sets $\Phi_n$ contain a very large
number, $N_{\Phi_n}$, of elements, so that the sets $\Phi_n$ need
not be given with extreme precision. And the sets $\Phi_n$
must be in a sense “stable” under small external
perturbations.

It is clear from this rough description that the
apparatus should contain a large number of small
components and still its interaction with the “small”
system $A$ should lead to a more or less sudden
change of the sets $\Phi_n$.

A concrete model of this mechanism has been 
proposed by K Hepp (1972) for the case when A is a 
$2 \times 2$ matrix, and the measuring apparatus is made
of a chain of N spins, $N \to \infty$; the analysis was
recently completed by Sewell (2005) with an
estimate on the error which is made if N is finite 
but large. This is a dynamical model, in which the 
observable A (a spin) interacts with a chain of spins 
("moves over the spins") leaving the trace of its 
passage. It is this trace (final macroscopic state of 
the apparatus) which is measured and associated 
with the final state of A. The interaction is not 
"instantaneous" but may require a very short time, 
depending on the parameters used to describe the 
apparatus and the interaction. 

We call “decoherence” the weakening of the
superposition principle due to the interaction with
the environment.

Two different models of decoherence have been 
analyzed in some detail; we shall denote them 
thermal-bath model and scattering model; both are 
dynamical models and both point to a solution, to 
various extents, of the problem of the reduction to a 
final density matrix which commutes with the 
operator A (and therefore to the suppression of the 
interference terms). 

The thermal-bath model makes use of the 
Heisenberg representation and relies on results of 
the theory of C*-algebras. This approach is closely 
linked with (quantum) statistical mechanics; its aim 
is to prove, after conditioning with respect to the 
degrees of freedom of the bath, that a special role 
emerges for a commuting set of operators of the
measured system, and these are the observables that
specify the outcome of the measurement in prob- 
abilistic terms. 

The scattering approach relies on the Schrödinger
representation of QM and on results from the theory of
scattering. This approach describes the interaction of
the system S (typically a heavy particle) with an
environment made of a large number of light particles
and seeks to describe the state of S after the
interaction when one does not have any information
on the final state of the light particles. One seeks to
prove that the reduced density matrix is (almost) 
diagonal in a given representation (typically the one 
given by the spatial coordinates). This defines the 
observable (typically, position) that can be measured 
and the probability of each outcome. 

Both approaches rely on the loss of information in 
the process to cancel the effect of the superposition 
principle and to bring the measurement problem 
within the realm of classical probability theory. 
Neither of them provides a causal dependence of the
result of the measurement on the initial state of the
system.

We describe these attempts only very briefly.

In its most basic form, the “scattering approach” takes as
its starting point the Schrödinger equation for a system of
two particles, one of which has mass very much smaller than
the other one. The heavy particle may be seen as representing
the system on which a measurement is being made. The outline
of the method of analysis (which in favorable cases can be
made rigorous) (Joos and Zeh 1985, Tegmark 1993) is the
following. One chooses units in which the mass of the heavy
particle is 1, and one denotes by $\epsilon$ the mass of the
light particle. If x is the coordinate of the heavy particle
and y that of the light one, and if the initial state of the
system is denoted by $\phi_0(x, y)$, the solution of the
equation for the system is (apart from inessential factors)


$$\phi_t = \exp\{ -i(-\Delta_x - \epsilon^{-1}\Delta_y + W(x) + V(x - y))\,t \}\, \phi_0$$


Making use of center-of-mass and relative coordinates, one
sees that when $\epsilon$ is very small one should be able to
describe the system on two timescales, one fast (for the
light particle) and one slow (for the heavy one) and,
therefore, place oneself in a setting which may allow the use
of adiabatic techniques. In this setting, for a measurement
on the heavy particle (e.g., of its position) one may be
allowed to consider the light particle in a scattering
regime, and use the wave operator corresponding to a
potential $V_x(y) = V(y - x)$.

Taking the partial trace with respect to the degrees of
freedom of the light particle (this corresponds to no
information on its final state) one finds, at least
heuristically, that the state of the heavy particle is now
described (due to the trace operation) by a density matrix
$\sigma$ for which, in the coordinate representation, the
off-diagonal terms $\sigma_{x,x'}$ are slightly suppressed by a
factor $\zeta_{x,x'} = (W_+^{x'}\psi, W_+^{x}\psi)$, where $\psi$
represents the initial state of the light particle and
$W_+^{x}$ is the wave operator for the motion of the light
particle in the potential $V_x$. One must assume that the
function $\phi$ which represents the initial state of the heavy
particle is sufficiently localized so that
$|\zeta_{x,x'}| < 1$ for every $x' \neq x$ in its support.
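The role of the partial trace can be illustrated with a minimal numerical sketch. It is a toy model under stated assumptions, not the scattering computation itself: the heavy particle sits in a superposition of two positions, the light particle ends up in one of two normalized states depending on that position, and the light particle is then traced out. The off-diagonal element of the reduced density matrix is multiplied by the overlap of the two environment states, and N independent scatterers multiply it by the N-th power of that overlap. The names `inner`, `reduced_density`, `chi_x`, `chi_xp` and the angle `theta` are illustrative choices, not taken from the text.

```python
import math

def inner(u, v):
    """Hermitian inner product (u, v), conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def reduced_density(amps, env_states):
    """Reduced density matrix of the heavy particle.

    The joint state is sum_k amps[k] |x_k> (x) |env_states[k]>.
    Tracing out the environment gives
        sigma[j][k] = amps[j] * conj(amps[k]) * (env_states[k], env_states[j]).
    """
    n = len(amps)
    return [[amps[j] * amps[k].conjugate() * inner(env_states[k], env_states[j])
             for k in range(n)] for j in range(n)]

# Toy environment states: two unit vectors at a small angle theta (assumption).
theta = 0.1
chi_x = [1 + 0j, 0j]
chi_xp = [complex(math.cos(theta)), complex(math.sin(theta))]
amps = [complex(1 / math.sqrt(2))] * 2

sigma = reduced_density(amps, [chi_x, chi_xp])
zeta = inner(chi_xp, chi_x).real  # overlap factor for one scattering event

# With N independent scatterers the off-diagonal term carries zeta**N.
for n_scatter in (1, 100, 10000):
    print(n_scatter, 0.5 * zeta ** n_scatter)
```

A single scattering event barely changes the off-diagonal term, while many independent events drive it towards zero; the diagonal terms are untouched.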

If the environment is made of very many particles (their
number $N(\epsilon)$ must be such that
$\lim_{\epsilon \to 0} \epsilon N(\epsilon) = \infty$) and the
heavy particle can be supposed to have separate interactions
with all of them, the off-diagonal elements of the density
matrix tend to 0 as $\epsilon \to 0$ and the resulting density
matrix tends to have the form
$\sigma(x, x') = \delta(x - x')\rho(x)$, $\rho(x) \geq 0$,
$\int \rho(x)\,dx = 1$. If it can be assumed that all
interactions take place within a time
$T(\epsilon) < \epsilon^a$, $a > 0$, one has
$\rho(x) = |\phi(x)|^2$.

If the interactions are not independent, the analysis becomes
much more involved since it has to be treated by many-body
scattering theory; this suggests that the scattering approach
can hardly be used in the context of the “thermal-bath
model.” In any case, the selection of a “preferred basis”
(the coordinate representation) depends on the fact that one
is dealing with a scattering phenomenon. A few steps have
been made for a rigorous analysis (Teta 2004) but we are very
far from a mathematically satisfactory answer.

The thermal-bath approach has been studied within the
algebraic formulation of QM and stands on good mathematical
ground (Alicki 2002, Blanchard et al. 2003, Sewell 2005). Its
drawback is that it is difficult to associate the formal
scheme with actual physical situations and it is difficult to
give a realistic estimate of the decoherence time.

The thermal-bath approach attributes the decoherence effect
to the practical impossibility of distinguishing between a
vast majority of the pure states of the systems and the
corresponding statistical mixtures. In this approach, the
observables are represented by self-adjoint elements of a
weakly closed subalgebra $\mathcal{M}$ of the algebra B(H) of
all bounded operators on a Hilbert space H. This subalgebra
may depend on the measuring apparatus (i.e., not all the
apparatuses are fit to measure a set of observables).

A “classical” observable by definition commutes with all
other observables and therefore must belong to the center of
the algebra $\mathcal{M}$, which is isomorphic to a collection
of functions on a probability space M.


So the appearance of classical properties of a 
quantum system corresponds to the “emergence” of 
an algebra with nontrivial center. Since automorphic 
evolutions of an algebra preserve its center, this 
program can be achieved only if we admit the loss of 
quantum coherence, and this requires that the 
quantum systems we describe are open and interact 
with the environment, and moreover that the 
commutative algebra which emerges be stable for 
time evolution. 

It may be shown that one must consider a quantum
environment in the thermodynamic limit, that is,
consider the interaction of the system to be
measured with a thermal bath. A discussion of the
possible emergence of classical observables and of 
the corresponding dynamics is given by Gell-Mann 
(1993). In all these approaches, the commutative 
subalgebra is selected by the specific form of the 
interaction; therefore, the measuring apparatus 
determines the algebra of classical observables. 

On the experimental side, a number of very interesting
results have been obtained, using very refined techniques;
these experiments usually also determine the “decoherence
time.” The experiments, both for the collision model
(Hornberger et al. 2003) and for the thermal-bath model
(Hackermueller et al. 2004), are done mostly with fullerene
(a molecule which is heavy enough and is not deflected too
much after a collision with a particle of the gas). They show
reasonable agreement with the (rough) theoretical conclusions.

The most refined experiments about decoherence 
are those connected with quantum optics (circularly 
polarized atoms in superconducting cavities). These 
are not related to the wave nature of the particles
but in a sense to the “wave nature” of a photon as a
single unit. The electromagnetic field is now
regarded as an incoherent superposition of states 
with an arbitrarily large number of photons. 
Polarized photons can be produced one by one, 
and they retain their individuality and their polar- 
ization until each of them interacts with “the 
environment” (e.g., the boundary of the cavity or a 
particle of the gas). In a sense, these experimental 
results refer to a “decoherence by collision” theory. 

The experiments by Haroche (2003) prove that 
coherence may persist for a measurable interval of 
time and are the most controlled experiments on 
coherence so far. 


Other Approaches 


We end this section with a brief discussion of the problem of
“hidden variables” and a presentation of an entirely
different approach to QM, originated by D Bohm (1952) and put
recently on firm mathematical grounds by Duerr et al. (1999).
The approach is radically different from the traditional one
and it is not clear at present whether it can give a solution
to the measurement problem and a description of all the
phenomena which traditional QM accounts for. But it is very
interesting from the point of view of the mathematics
involved.

We have remarked that the formulation of QM that is
summarized in the three axioms given earlier has many
unsatisfactory aspects, mainly connected with the
superposition principle (described in its extreme form by the
Schrödinger's cat “paradox”) and with the problem of
measurement, which reveals, for example through the
Einstein-Podolsky-Rosen “paradox,” an intrinsic nonlocality
if one maintains that “objective” properties can be
attributed to systems which are far apart. From the very
beginning of QM, attempts have been made to attribute these
features to the presence of “hidden variables”; the
statistical nature of the predictions of QM is, from this
point of view, due to the incompleteness of the parameters
used to describe the systems. The impossibility of matching
the statistical predictions of QM (confirmed by experimental
findings) with a local theory based on hidden variables and
classical probability theory has been known for some time
(Kochen and Specker 1967), also through the use of “Bell
inequalities” (Bell 1964) among correlations of outcomes of
separate measurements performed on entangled systems (mainly
two photons or two spin-1/2 particles created in a suitable
entangled state).

A proof of the intrinsic nonlocality of QM (in the 
above sense) was given by L Hardy (see Haroche 
(2003)). 

While experimental results prove that one cannot substitute
QM with a “naive” theory of hidden variables, more refined
attempts may have success. We shall only discuss the approach
of Bohm (following a previous attempt by de Broglie) as
presented in Duerr et al. (1999). It is a dynamical theory in
which representative points follow “classical paths” and
their motion is governed by a time-dependent vector
“velocity” field (in this sense, it is not Newtonian). In a
sense, Bohmian mechanics is a minimal completion of QM if one
wants to keep the position as primitive observable. To these
primitive objects, Bohm's theory adds a complex-valued
function $\phi$ (the “guiding wave” in Bohm's terminology)
defined on the configuration space Q of the particles. In the
case of particles with spin, the function $\phi$ is
spinor-valued. Dynamics is given by two equations: one for
the coordinates of the particles and one for the guiding
wave. If $x \equiv x_1, \ldots, x_N$ describes the
configuration of the points, the dynamics in a potential
field V(x) is described in the following way: for the wave
$\phi$ by a nonrelativistic Schrödinger equation with
potential V and for the coordinates by the ordinary
differential equation (ODE)


$$\dot{x}_m = \frac{\hbar}{m_m}\, \mathrm{Im}\, \frac{\phi^* \nabla_m \phi}{\phi^* \phi}(x), \qquad x_m \in \mathbb{R}^3$$


where $m_m$ is the mass of the mth particle.

Notice that the vector field is singular at the zeros of the
wave function; therefore global existence and uniqueness must
be proved. To see why Bohmian mechanics is empirically
equivalent to QM, at least for measurement of position,
notice that the velocity field in the equation for the points
coincides with the one appearing in the continuity equation
of QM. It follows that if one has at time zero a collection
of points distributed with density $|\phi_0|^2$, the density
at time t will be $|\phi(t)|^2$, where $\phi(t)$ is the
solution of the Schrödinger equation with initial datum
$\phi_0$.
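A small numerical sketch may make the guiding equation concrete. It assumes a free one-dimensional Gaussian packet, units with $\hbar = m = 1$ and initial width 1 (all illustrative choices, not taken from the text). For this packet the logarithmic derivative of the wave function is known in closed form, so the velocity field can be written down directly and integrated with a classical Runge-Kutta step. The resulting trajectory should follow the spreading of the packet, $x(t) = x_0\sqrt{1 + t^2/\tau^2}$ with $\tau = 2m\sigma_0^2/\hbar$, which is what the equivariance of the density $|\phi(t)|^2$ requires.

```python
import math

HBAR = 1.0    # units with hbar = 1 (assumption)
MASS = 1.0    # particle mass (assumption)
SIGMA0 = 1.0  # initial width of the Gaussian packet (assumption)
TAU = 2.0 * MASS * SIGMA0 ** 2 / HBAR  # spreading timescale

def log_derivative(x, t):
    """(d psi/dx) / psi for the free Gaussian packet
    psi(x, t) ~ exp(-x^2 / (4 sigma0^2 (1 + i t / tau)))."""
    return -x / (2.0 * SIGMA0 ** 2 * complex(1.0, t / TAU))

def velocity(x, t):
    """Guiding field v = (hbar / m) Im(psi' / psi)."""
    return (HBAR / MASS) * log_derivative(x, t).imag

def trajectory(x0, t_end, steps=1000):
    """Integrate dx/dt = velocity(x, t) from t = 0 with classical RK4."""
    h = t_end / steps
    x, t = x0, 0.0
    for _ in range(steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = velocity(x + h * k3, t + h)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return x

x_numeric = trajectory(1.0, 3.0)
x_exact = 1.0 * math.sqrt(1.0 + (3.0 / TAU) ** 2)
print(x_numeric, x_exact)
```

A point starting at the origin stays there, and points starting elsewhere move outward in proportion to the width of the packet, so an initial $|\phi_0|^2$ distribution of points remains $|\phi(t)|^2$-distributed.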

Bohm (1952) formulated the theory as a modification of
Newton's laws (and in this form it has been widely used)
through the introduction of a “quantum potential” $V_Q$. This
was achieved by writing the wave function in its polar form
$\phi = R e^{iS/\hbar}$ and writing the real part of the
Schrödinger equation as a modified Hamilton-Jacobi equation.
The version of Bohm's theory discussed in Duerr et al. (1999)
introduces only the guiding wave function and the coordinates
of the points, and puts the theory on firm mathematical
grounds. Through an impressive series of mathematical
results, these authors and their collaborators deal with the
completeness of the velocity vector field, the asymptotic
behavior of the trajectories of the points (both for the
scattering regime and for the trapped trajectories, which are
shown to correspond to bound states in QM), a rigorous
analysis of the theorem on the flux across a surface (a
cornerstone in scattering theory) and the detailed analysis
of the “two-slit” experiment through a study of the
interaction with the measuring apparatus. The theory is
completely causal, both for the trajectories of the points
and for the time development of the pilot wave, and can also
accommodate points with spin. It leads to a mathematically
precise formulation of the semiclassical limit, and it may
also resolve the measurement problem by relating the pilot
wave of the entire system to its approximate decomposition
into an incoherent superposition of pilot waves associated
with the particle and with the measuring apparatus (this
would be the way to see the “collapse of the wave function”
in QM). A weak point of this approach is the relation of the
representative points with observable quantities.


Introductory Article: Topology


Introduction


This will be an elementary introduction to general 
topology. We shall not even touch upon algebraic 
topology, which will be dealt with in Cohomology 
Theories, although in some mathematics departments 
it is introduced in an advanced undergraduate course. 

We believe such an elementary article is useful for 
the encyclopaedia, purely for quick reference. Most 
of the concepts will be familiar to physicists, but 
usually in a general rather vague sense. This article 
will provide the rigorous definitions and results 
whenever they are needed when consulting other 
articles in the work. To make sure that this is the 
case, we have in fact experimentally tested the 
article on physicists for usefulness. 

Topology is very often described as “rubber-sheet
geometry,” that is, one is allowed to deform objects
without actually breaking them. This is the all- 
important concept of continuity, which underlies 
most of what we shall study here. 

We shall give full definitions, state theorems 
rigorously, but shall not give any detailed proofs. 
On the other hand, we shall cite many examples, 
with a view to applications to mathematical physics, 
taking for granted that familiar more advanced 
concepts there need not be defined. By the same 
token, the choice of topics will also be so dictated. 


Essential Concepts 


Definition 1 Let X be a set. A collection $\mathcal{T}$ of
subsets of X is called a topology if the following are
satisfied:


(i) $\emptyset, X \in \mathcal{T}$.



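(ii) Let $I$ be an index set; then
$A_\alpha \in \mathcal{T}, \alpha \in I \Rightarrow \bigcup_{\alpha \in I} A_\alpha \in \mathcal{T}$.

(iii) $A_1, \ldots, A_n \in \mathcal{T} \Rightarrow \bigcap_{k=1}^{n} A_k \in \mathcal{T}$.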


Definition 2 A member of the topology $\mathcal{T}$ is called
an open set (of X with topology $\mathcal{T}$).


Remark The last two properties are more easily put as:
arbitrary unions of open sets are open, and finite
intersections of open sets are open. One can easily see the
significance of the restriction to finite intersections: if
we take the “usual topology” (which will be defined in due
course) of the real line, then the intersection of all open
intervals $(-1/n, 1/n)$, n a positive integer, is just the
single point {0}, which is manifestly not open in the usual
sense.


Example If we postulate that Ø, and the entire set 
X, are the only open subsets, we get what is called 
the indiscrete or coarsest topology. At the other 
extreme, if we postulate that all subsets are open, 
then we get the discrete or finest topology. Both 
seem quite unnatural if we think in terms of the 
real line or plane, but in fact it would be more 
unnatural to explicitly exclude them from the 
definition. They prove to be quite useful in certain 
respects. 
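For a finite set the axioms of Definition 1 can be checked mechanically. The following is a minimal sketch, assuming sets are represented as Python `frozenset`s; the helper names `is_topology` and `powerset` are illustrative. On a finite collection, closure under pairwise unions and intersections already gives closure under arbitrary unions and finite intersections, which is what the check relies on.

```python
from itertools import combinations

def is_topology(X, T):
    """Check Definition 1 for a finite set X and a collection T of subsets."""
    X = frozenset(X)
    T = {frozenset(A) for A in T}
    # (i) the empty set and X itself must be open
    if frozenset() not in T or X not in T:
        return False
    # every member of T must be a subset of X
    if any(not A <= X for A in T):
        return False
    # (ii), (iii): pairwise closure suffices because T is finite
    return all(A | B in T and A & B in T for A, B in combinations(T, 2))

def powerset(X):
    """All subsets of X, i.e. the discrete topology."""
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

X = {1, 2, 3}
indiscrete = [set(), X]
discrete = powerset(X)
not_a_topology = [set(), {1}, {2}, X]  # the union {1, 2} is missing

print(is_topology(X, indiscrete), is_topology(X, discrete),
      is_topology(X, not_a_topology))
```

Both extreme examples pass the check, while the third collection fails because it is not closed under unions.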


Definition 3 A subset of X is closed if its 
complement in X is open. 


Remarks 


(i) One could easily build a topology using closed
sets instead of open sets, because of the simple 
relation that the complement of a union is the 
intersection of the complements. 

(ii) From the definitions, there is nothing to prevent 
a set being both open and closed, or neither 
open nor closed. It is a common mistake to 
suppose otherwise.


Definition 4 A set equipped with a topology is
called a topological space (with respect to the given 
topology). Elements of a topological space are 
sometimes called points. 


Definition 5 Let $x \in X$. A neighborhood of x is a
subset of X containing an open set which contains x.


Remark This seems a clumsy definition, but turns 
out to be more useful in the general case than 
restricting to open neighborhoods, which is often done. 


Definition 6 A subcollection of open sets
$\mathcal{B} \subset \mathcal{T}$ is called a basis for the
topology $\mathcal{T}$ if every open set is a union of sets of
$\mathcal{B}$.


Definition 7 A subcollection of open sets
$\mathcal{S} \subset \mathcal{T}$ is called a sub-basis for the
topology $\mathcal{T}$ if every open set is a union of finite
intersections of sets of $\mathcal{S}$.


Definition 8 The closure $\bar{A}$ of a subset A of X is
the smallest closed set containing A.


Definition 9 The interior $A^\circ$ of a subset A of X is
the largest open set contained in A.


Remark It is sometimes useful to define the boundary of A as
the set $\bar{A} \setminus A^\circ = \{x \in \bar{A}, x \notin A^\circ\}$.
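On a finite topological space, Definitions 8 and 9 and the boundary of the Remark can be computed directly. The sketch below assumes the topology is given as an explicit list of open sets; the function names and the example space are illustrative. The interior is obtained as the union of all open sets contained in A, and the closure as the complement of the interior of the complement.

```python
def interior(A, T):
    """Largest open set contained in A: the union of all open subsets of A."""
    A = frozenset(A)
    result = frozenset()
    for U in T:
        U = frozenset(U)
        if U <= A:
            result |= U
    return result

def closure(A, X, T):
    """Smallest closed set containing A, computed as the complement
    of the interior of the complement of A."""
    X = frozenset(X)
    return X - interior(X - frozenset(A), T)

def boundary(A, X, T):
    """Closure minus interior."""
    return closure(A, X, T) - interior(A, T)

# Example space (an assumption for illustration): nested open sets on {1, 2, 3}.
X = {1, 2, 3}
T = [set(), {1}, {1, 2}, {1, 2, 3}]

print(interior({2}, T), closure({2}, X, T), boundary({2}, X, T))
print(closure({1}, X, T))
```

In this example the closure of {1} is the whole space, so {1} is dense in the sense of Definition 11.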


Definition 10 Let A be a subset of a topological
space X. A point $x \in X$ is called a limit point of A if
every open set containing x contains some point of
A other than x.


Definition 11 A subset A of X is said to be dense in
X if $\bar{A} = X$.


Definition 12 A topological space X is called a
Hausdorff space if for any two distinct points $x, y \in X$,
there exist an open neighborhood A of x and an
open neighborhood B of y such that A and B are
disjoint (that is, $A \cap B = \emptyset$).


Remark and Examples 


(i) This is looking more like what we expect. 
However, certain mildly non-Hausdorff spaces 
turn out to be quite useful, for example, in twistor 
theory. A “pocket” furnishes such an example. 
Explicitly, consider X to be the subset of the real
plane consisting of the interval [-1, 1] on the x-axis,
together with the interval [0, 1] on the line
y = 1, where the following pairs of points are
identified: $(x, 0) \sim (x, 1)$, $0 < x \leq 1$. Then the two
points (0, 0) and (0, 1) do not have any disjoint
neighborhoods. Strictly speaking, one needs the
notion of a quotient topology, introduced below.

(ii) For a more “truly” non-Hausdorff topology,
consider the space of positive integers
$\mathbb{N} = \{1, 2, 3, \ldots\}$, and take as open sets the
following: $\emptyset$, $\mathbb{N}$, and the sets
$\{1, 2, \ldots, n\}$ for each $n \in \mathbb{N}$. This space is
neither Hausdorff nor compact (see
later for definition of compactness).
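The failure of the Hausdorff property in example (ii) can be seen on a finite truncation. The sketch below is an illustration under an explicit assumption: the space is cut down to {1, ..., 5} with the initial segments as open sets (the truncated space is of course compact, unlike the infinite one, so only the Hausdorff property is tested). The name `is_hausdorff` is illustrative.

```python
def is_hausdorff(X, T):
    """Definition 12 on a finite space: any two distinct points have
    disjoint open neighborhoods."""
    opens = [frozenset(U) for U in T]
    points = list(X)
    for i, x in enumerate(points):
        for y in points[i + 1:]:
            separated = any(x in A and y in B and not (A & B)
                            for A in opens for B in opens)
            if not separated:
                return False
    return True

# Finite truncation of example (ii): open sets are the initial segments.
n_max = 5
X = set(range(1, n_max + 1))
segments = [set()] + [set(range(1, n + 1)) for n in range(1, n_max + 1)]

# For comparison: the discrete topology on a two-point set.
two_points = {1, 2}
discrete = [set(), {1}, {2}, {1, 2}]

print(is_hausdorff(X, segments), is_hausdorff(two_points, discrete))
```

Every nonempty open set of the truncated space contains the point 1, so no two points can be separated; the discrete two-point space, by contrast, is Hausdorff.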


Definition 13 Let X and Y be two topological
spaces and let $f : X \to Y$ be a map from X to Y. We
say that f is continuous if $f^{-1}(A)$ is open (in X)
whenever A is open (in Y).
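Definition 13 is a finite check when both spaces are finite: one computes the preimage of each open set of Y and tests whether it is open in X. The sketch below assumes maps are given as Python dictionaries; the names `preimage` and `is_continuous` and the two-point example space are illustrative choices.

```python
def preimage(f, U, X):
    """Set of points of X that f maps into U."""
    return frozenset(x for x in X if f[x] in U)

def is_continuous(f, X, TX, TY):
    """Definition 13 for finite spaces; f is a dict from X to Y."""
    open_X = {frozenset(A) for A in TX}
    return all(preimage(f, frozenset(U), X) in open_X for U in TY)

# Two-point space in which {1} is open but {2} is not (illustrative example).
X = {1, 2}
T = [set(), {1}, {1, 2}]

identity = {1: 1, 2: 2}
swap = {1: 2, 2: 1}
constant = {1: 1, 2: 1}

print(is_continuous(identity, X, T, T),
      is_continuous(swap, X, T, T),
      is_continuous(constant, X, T, T))
```

The swap fails because the preimage of the open set {1} is {2}, which is not open; identity and constant maps pass, as they do for any topology.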


Remark Continuity is the single most important
concept here. In this general setting, it looks a little
different from the “$\epsilon$-$\delta$” definition, but this
latter works only for metric spaces, which we shall come to
shortly.


Definition 14 A map $f : X \to Y$ is a homeomorphism
if it is a continuous, bijective map such that its
inverse $f^{-1}$ is also continuous.


Remark Homeomorphisms are the natural maps 
for topological spaces, in the sense that two home- 
omorphic spaces are "indistinguishable" from the 
point of view of topology. Topological invariants 
are properties of topological spaces which are 
preserved under homeomorphisms. 


Definition 15 Let $B \subset A$, where A is a topological
space. Then one can define the relative topology of B by
saying that a subset $C \subset B$ is open if and only if there
exists an open set D of A such that $C = D \cap B$.


Definition 16 A subset B C A equipped with the 
relative topology is called a subspace of the 
topological space A. 


Remark Thus, if for subsets of the real line, we 
consider A = [0, 3], B = [0, 2], then C= (1,2] is open 
in B, in the relative topology induced by the usual 
topology of R. 


Definition 17 Given two topological spaces X and Y,
we can define a product topological space $Z = X \times Y$,
where the set is the Cartesian product of the two sets X
and Y, and sets of the form $A \times B$, where A is open in
X and B is open in Y, form a basis for the topology.


Remark Note that the open sets of $X \times Y$ are not
always of this product form ($A \times B$).


Definition 18 Suppose there is a partition of X into
disjoint subsets $A_\alpha$, $\alpha \in I$, for some index set
I, or equivalently, there is defined on X an equivalence
relation ~. Then one can define the quotient topology on the
set of equivalence classes $\{A_\alpha, \alpha \in I\}$,
usually denoted as the quotient space $X/\!\sim\, = Y$, as
follows. Consider the map $\pi : X \to Y$, called the
canonical projection, which maps the element $x \in X$
to its equivalence class [x]. Then a subset $U \subset Y$ is
open if and only if $\pi^{-1}(U)$ is open.


Proposition 1 Let $\mathcal{T}$ be the quotient topology on
the quotient space Y. Suppose $\mathcal{T}'$ is another
topology on Y such that the canonical projection is
continuous; then $\mathcal{T}' \subset \mathcal{T}$.


Definition 19 An (open) cover $\{U_\alpha : \alpha \in I\}$ for
X is a collection of open sets $U_\alpha \subset X$ such that
their union equals X. A subcover of this cover is then a
subset of the collection which is itself a cover for X.


Definition 20 A topological space X is said to be 
compact if every cover contains a finite subcover. 


Remark So for a compact space, however one 
chooses to cover it, it is always sufficient to use a 
finite number of open subsets. This is one of the 
essential differences between an open interval (not 
compact) and a closed interval (compact). The former 
is in fact homeomorphic to the entire real line. 


Definition 21 A topological space X is said to be 
connected if it cannot be written as the union of two 
nonempty disjoint open sets. 


Remark A useful equivalent definition is that any
continuous map from X to the two-point set {0, 1},
equipped with the discrete topology, cannot be
surjective.
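For finite spaces Definition 21 can be tested by looking for a nonempty proper open subset whose complement is also open; such a pair is exactly what a continuous surjection onto the discrete two-point set would produce. The following is a minimal sketch with an illustrative function name and illustrative example spaces.

```python
def is_connected(X, T):
    """Definition 21 on a finite space: X is not the union of two
    nonempty disjoint open sets."""
    X = frozenset(X)
    opens = [frozenset(A) for A in T]
    # A splitting exists iff some nonempty proper open set has open complement.
    return not any(A and A != X and (X - A) in opens for A in opens)

two_points = {1, 2}
nested = [set(), {1}, {1, 2}]         # {1} open, {2} not open
discrete = [set(), {1}, {2}, {1, 2}]  # every subset open

print(is_connected(two_points, nested), is_connected(two_points, discrete))
```

The discrete two-point space splits into the open sets {1} and {2}, so it is disconnected; with the nested topology no such splitting exists.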


Definition 22 Given two points x, y in a topological
space X, a path from x to y is a continuous
map $f : [0, 1] \to X$ such that $f(0) = x$, $f(1) = y$. We
also say that such a path joins x and y.


Definition 23 A topological space X is path- 
connected if every two points in X can be joined 
by a path lying entirely in X. 


Proposition 2 A path-connected space is connected. 


Proposition 3 A connected open subspace of $\mathbb{R}^n$ is
path-connected.


Definition 24 Given a topological space X, define 
an equivalence relation by saying that x ~ y if and 
only if x and y belong to the same connected 
subspace of X. Then the equivalence classes are 
called (connected) components of X. 


Examples 


(i) The Lie group O(3) of 3 x 3 orthogonal matrices 
has two connected components. The identity 
connected component is SO(3) and is a subgroup. 

(ii) The proper orthochronous Lorentz transformations 
of Minkowski space form the identity component 
of the group of Lorentz transformations. 


Metric Spaces 


A special class of topological spaces plays an 
important role: metric spaces.


Definition 25 A metric space is a set X together
with a function $d : X \times X \to \mathbb{R}$ satisfying


(i) $d(x, y) \geq 0$,
(ii) $d(x, y) = 0 \Leftrightarrow x = y$,
(iii) $d(x, z) \leq d(x, y) + d(y, z)$ (“triangle inequality”).


Remarks 


(i) The function d is called the metric, or distance 
function, between the two points. 

(ii) This concept of metric is what is generally 
known as “Euclidean” metric in mathematical 
physics. The distinguishing feature is the posi- 
tive definiteness (and the triangle inequality). 
One can, and does, introduce indefinite metrics 
(for example, the Minkowski metric) with 
various signatures. But these metrics are not 
usually used to induce topologies in the spaces 
concerned. 


Definition 26 Given a metric space X and a point
$x \in X$, we define the open ball centred at x with
radius r (a positive real number) as


$$B_r(x) = \{ y \in X : d(x, y) < r \}$$


Given a metric space X, we can immediately
define a topology on it by taking all the open balls in
X as a basis. We say that this is the topology
induced by the given metric. Then we can recover
our usual “$\epsilon$-$\delta$” definition of continuity.


Proposition 4 Let $f : X \to Y$ be a map from the metric
space X to the metric space Y. Then f is continuous
(with respect to the corresponding induced topologies)
at $x \in X$ if and only if given any $\epsilon > 0$,
$\exists\, \delta > 0$ such that $d(x, x') < \delta$ implies
$d(f(x), f(x')) < \epsilon$.


Note that we do not bother to give two different 
symbols to the two metrics, as it is clear which 
spaces are involved. The proof is easily seen by 
taking the relevant balls as neighborhoods. Equally 
easy is the following: 


Proposition 5 A metric space is Hausdorff. 


Definition 27 A map $f : X \to Y$ of metric spaces is
uniformly continuous if given any $\epsilon > 0$ there exists
$\delta > 0$ such that for any $x_1, x_2 \in X$,
$d(x_1, x_2) < \delta$ implies $d(f(x_1), f(x_2)) < \epsilon$.


Remark Note the difference between continuity
and uniform continuity: the latter is stronger and
requires the same $\delta$ for the whole space.


Definition 28 Two metrics $d_1$ and $d_2$ defined on X
are equivalent if there exist positive constants a and
b such that for any two points $x, y \in X$ we have


$$a\, d_1(x, y) \leq d_2(x, y) \leq b\, d_1(x, y)$$


Remark This is clearly an equivalence relation.
Two equivalent metrics induce the same topology.


Examples 


(i) Given a set X, we can define the discrete metric
as follows: $d_0(x, y) = 1$ whenever $x \neq y$ (and
$d_0(x, x) = 0$). This induces the discrete topology on X.
This is quite a convenient way of describing the discrete
topology.

(ii) In $\mathbb{R}$, the usual metric is $d(x, y) = |x - y|$,
and the usual topology is the one induced by this.

(iii) More generally, in $\mathbb{R}^n$, we can define a metric
for every $p \geq 1$ by


$$d_p(x, y) = \left( \sum_{k=1}^{n} |x_k - y_k|^p \right)^{1/p}$$


where $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$.
In particular, for p = 2 we have the usual Euclidean metric,
but the other cases are also useful. To continue the series,
one can define


$$d_\infty(x, y) = \max_{1 \leq k \leq n} \{ |x_k - y_k| \}$$


All these metrics induce the same topology on $\mathbb{R}^n$.
(iv) In a vector space V, say over the real or the
complex field, a function $\|\cdot\| : V \to \mathbb{R}^+$ is
called a norm if it satisfies the following axioms:

(a) $\|x\| = 0$ if and only if $x = 0$,

(b) $\|ax\| = |a|\,\|x\|$, and

(c) $\|x + y\| \leq \|x\| + \|y\|$.


Then it is easy to see that a metric can be defined
using the norm


$$d(x, y) = \|x - y\|$$


In many cases, for example, the metrics defined in 
example (iii) above, one can define the norm of a 
vector as just the distance of it from the origin. One 
obvious exception is the discrete metric. 

A slightly more general concept is found to be
useful for spaces of functions and operators: that of
seminorms. A seminorm is a function which satisfies the
last two of the conditions listed above for a norm, but
not necessarily the first.
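The statement in example (iii) that all the metrics $d_p$ induce the same topology can be made plausible numerically. In $\mathbb{R}^n$ one has $d_\infty \leq d_p \leq n^{1/p} d_\infty$, so $d_p$ and $d_\infty$ are equivalent in the sense of Definition 28 with constants a = 1 and $b = n^{1/p}$. The sketch below spot-checks this inequality on random points; the function names, the dimension, the value of p and the sampling range are illustrative assumptions, and a random test is of course not a proof.

```python
import random

def d_p(x, y, p):
    """The metric d_p on R^n for p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def d_inf(x, y):
    """The maximum metric on R^n."""
    return max(abs(a - b) for a, b in zip(x, y))

def check_equivalence(n, p, trials=1000, seed=0):
    """Spot-check d_inf <= d_p <= n**(1/p) * d_inf on random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        low, mid = d_inf(x, y), d_p(x, y, p)
        high = n ** (1.0 / p) * low
        # small tolerance for floating-point rounding
        if not (low <= mid + 1e-12 and mid <= high + 1e-12):
            return False
    return True

print(d_p([0, 0], [3, 4], 2), d_p([0, 0], [3, 4], 1), d_inf([0, 0], [3, 4]))
print(check_equivalence(n=3, p=2.5))
```

The two-sided bound means every $d_p$-ball contains a $d_\infty$-ball around the same centre and vice versa, which is why the open sets coincide.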


Definition 29 Given a metric space X, a sequence
of points $\{x_1, x_2, \ldots\}$ is called a Cauchy sequence if,
given any $\epsilon > 0$, there exists a positive integer N
such that for any $k, \ell > N$ we have
$d(x_k, x_\ell) < \epsilon$.


Definition 30 Given a sequence of points
$\{x_1, x_2, \ldots\}$ in a metric space X, a point $x \in X$ is
called a limit of the sequence if given any $\epsilon > 0$,
there exists a positive integer N such that for any
$n > N$ we have $d(x, x_n) < \epsilon$. We say that the
sequence converges to x.


Definition 31 A metric space X is complete if every 
Cauchy sequence in X converges to a limit in it. 


Examples 


(i) The closed interval [0, 1] on the real line is
complete, whereas the open interval (0, 1) is
not. For example, the Cauchy sequence
$\{1/n, n = 2, 3, \ldots\}$ has no limit in this open
interval. (Considered as a sequence on the real
line, it has of course the limit 0.)

(ii) The spaces $\mathbb{R}^n$ are complete.

(iii) The Hilbert space $\ell^2$ consisting of all
sequences of real numbers $\{x_1, x_2, \ldots\}$ such
that $\sum_{k=1}^{\infty} x_k^2$ converges is complete with
respect to the obvious metric which is a generalization
to infinite dimension of $d_2$ above. For arbitrary
$p \geq 1$, one can similarly define $\ell^p$, which
are also complete and are hence Banach
spaces.


Remarks Completeness is not a topological invariant.
For example, the open interval (-1, 1) and the
whole real line are homeomorphic (with respect to
the usual topologies) but the former is not complete
while the latter is. The homeomorphism can
conveniently be given in terms of the trigonometric
function tangent, for example $x \mapsto \tan(\pi x / 2)$.


Definition 32 A subset B of the metric space X is 
bounded if there exists a ball of radius R (R > 0) 
which contains it entirely. 


Theorem 1 (Heine-Borel) Any closed bounded
subset of $\mathbb{R}^n$ is compact.


Remark The converse is also true. We have thus a
nice characterization of compact subsets of $\mathbb{R}^n$ as
being closed and bounded.


Proposition 6 Any bounded sequence in $\mathbb{R}^n$ has a
convergent subsequence.


Definition 33 Consider a sequence $\{f_n\}$ of real-valued
functions on a subset A (usually an interval)
of $\mathbb{R}$. We say that $\{f_n\}$ converges pointwise in A if
the sequence of real numbers $\{f_n(x)\}$ converges for
every $x \in A$. We can then define a function
$f : A \to \mathbb{R}$ by $f(x) = \lim_{n \to \infty} f_n(x)$, and
write $f_n \to f$.


Definition 34 A sequence of functions
$f_n : A \to \mathbb{R}$, $A \subset \mathbb{R}$, is said to converge
uniformly to a function $f : A \to \mathbb{R}$ if given any
$\epsilon > 0$, there exists a positive integer N such that,
for all x, $|f_n(x) - f(x)| < \epsilon$ whenever $n > N$.


Theorem 2 Let $f_n : (a, b) \to \mathbb{R}$ be a sequence of
functions continuous at the point $c \in (a, b)$, and
suppose $f_n$ converges uniformly to f on (a, b). Then f
is continuous at c.


Remark and Example The pointwise limit of continuous functions need not be continuous, as can be shown by the following example: $f_n(x) = x^n$, $x \in [0,1]$. We see that the limit function $f$ is not continuous:


$$f(x) = \begin{cases} 0 & \text{if } 0 \le x < 1 \\ 1 & \text{if } x = 1 \end{cases}$$
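The failure of uniform convergence in this example can be made tangible numerically. The following is a minimal sketch, not part of the original text: the helper names `f_n`, `f_limit` and `sampled_sup_distance` are hypothetical, and the grid maximum is only a lower bound for the true supremum of $|f_n - f|$ on $[0,1]$.

```python
def f_n(n, x):
    """The n-th function of the sequence, f_n(x) = x**n on [0, 1]."""
    return x ** n


def f_limit(x):
    """Pointwise limit: 0 on [0, 1) and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


def sampled_sup_distance(n, samples=10_000):
    """Largest |f_n(x) - f(x)| over an evenly spaced grid on [0, 1].

    The grid maximum is only a lower bound for the true supremum.
    """
    grid = [k / samples for k in range(samples + 1)]
    return max(abs(f_n(n, x) - f_limit(x)) for x in grid)


for n in (1, 10, 100):
    # At a fixed point such as x = 0.5 the values tend to 0,
    # but the sampled sup-distance stays close to 1.
    print(n, f_n(n, 0.5), sampled_sup_distance(n))
```

At any fixed $x < 1$ the values shrink to zero, while points close to 1 keep the distance to the limit function large, which is exactly what rules out uniform convergence.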


Definition 35 Let $X$ be a metric space. A map $f: X \to X$ is a contraction if there exists $c < 1$ such that $d(f(x), f(y)) \le c\, d(x, y)$ for all $x, y \in X$.


Theorem 3 (Banach) If $X$ is a complete metric space and $f$ is a contraction in $X$, then $f$ has a unique fixed point $x \in X$, that is, $f(x) = x$.
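The standard proof of this theorem is constructive: iterating the map from any starting point produces a Cauchy sequence whose limit is the fixed point. A minimal sketch of that iteration follows, under the assumption (not from the text) that we take $X = [0,1]$ and $f = \cos$, which maps $[0,1]$ into itself and satisfies $|f'(x)| \le \sin 1 < 1$ there. The function name `fixed_point` and the tolerance are illustrative choices.

```python
import math


def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Iterate x -> f(x) until two successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")


# cos is a contraction on the complete metric space [0, 1]
x_star = fixed_point(math.cos, 0.5)
print(x_star, math.cos(x_star))
```

Starting from a different point of $[0,1]$ leads to the same limit, in line with the uniqueness part of the theorem.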


Some Function and Operator Spaces 


The spaces of functions and operators can be 
equipped with different topologies, given by various 
concepts of convergence and of norms (or sometimes 
seminorms), very often with different such concepts 
for the same space. As we saw earlier, a norm in a 
vector space gives rise to a metric, and hence to a 
topology. Similarly with the concept of convergence 
for sequences of functions and operators, as one 
then knows what the limit points, and hence closed 
sets, are. 

But before we do that, let us introduce, in a 
slightly different context, a topology which is in 
some sense the natural one for the space of 
continuous maps from one space to another. 


Definition 36 Consider a family $F$ of maps from a topological space $X$ to a topological space $Y$, and define $W(K,U) = \{f : f \in F,\ f(K) \subset U\}$. Then the family of all sets of the form $W(K,U)$ with $K$ compact (in $X$) and $U$ open (in $Y$) forms a sub-basis for the compact open topology for $F$.


Consider a topological space $X$ and sequences of functions $\{f_n\}$ on it. Let $D \subset X$. We can then define pointwise convergence and uniform convergence exactly as for functions on subsets of the real line.


Definition 37 Let $X$, $D$ and $\{f_n\}$ be as above.


(i) The functions $f_n$ converge pointwise on $D$ to a function $f$ if the sequence of numbers $f_n(x) \to f(x)$, $\forall x \in D$.

(ii) The functions $f_n$ converge uniformly on $D$ to a function $f$ if given $\epsilon > 0$, there exists $N$ such that for all $n > N$ we have $|f_n(x) - f(x)| < \epsilon$, $\forall x \in D$.


Next we consider the Lebesgue spaces $L^p$, that is, functions $f$ defined on subsets of $\mathbb{R}^n$, such that $|f(x)|^p$ is Lebesgue integrable, for real numbers $p \ge 1$. To define these spaces, we tacitly take equivalence classes of functions which are equal almost everywhere (that is, up to a null set), but very often we can take representatives of these classes and just deal with genuine functions instead. Note that of all $L^p$, only $L^2$ is a Hilbert space.


Definition 38 In the space $L^p$, we define its norm by


$$\|f\|_p = \left( \int |f(x)|^p\, dx \right)^{1/p}$$


Now we turn to general normed spaces, and 
operators on them. 


Definition 39 Convergence in the norm is also called strong convergence. In other words, a sequence $\{x_n\}$ in a normed space $X$ is said to converge strongly to $x$ if


$$\lim_{n \to \infty} \|x_n - x\| = 0$$


Definition 40 A sequence $\{x_n\}$ in a normed space $X$ is said to converge weakly to $x$ if


$$\lim_{n \to \infty} f(x_n) = f(x)$$

for all bounded linear functionals $f$.


Consider the space $B(X,Y)$ of bounded linear operators $T$ from $X$ to $Y$. We can make this into a normed space by defining the following norm:

$$\|T\| = \sup_{x \in X,\ \|x\| \le 1} \|Tx\|$$

Then we can define three different concepts of convergence on $B(X,Y)$. There are in fact more in current use in functional analysis.


Definition 41 Let $X$ and $Y$ be normed spaces and let $\{T_n\}$ be a sequence of operators $T_n \in B(X, Y)$.


(i) $\{T_n\}$ is uniformly convergent if it converges in the norm.
(ii) $\{T_n\}$ is strongly convergent if $\{T_n x\}$ converges strongly for every $x \in X$.
(iii) $\{T_n\}$ is weakly convergent if $\{T_n x\}$ converges weakly for every $x \in X$.


Remark Clearly we have: uniform convergence $\Rightarrow$ strong convergence $\Rightarrow$ weak convergence, and the limits are the same in all three cases. However, the converses are in general not true.


Homotopy Groups 


The most elementary and obvious property of a topological space $X$ is the number of connected components it has. The next such property, in a certain sense, is the number of holes $X$ has. There are higher analogues of these, called the homotopy groups, which are topological invariants, that is, they are invariant under homeomorphisms. They play important roles in many topological considerations in field theory and other topics of mathematical physics. The articles Topological Defects and Their Homotopy Classification and Electric-Magnetic Duality contain some examples.


Definition 42 Given a topological space $X$, the zeroth homotopy set, denoted $\pi_0(X)$, is the set of connected components of $X$. One sometimes writes $\pi_0(X) = 0$ if $X$ is connected.


To define the fundamental group of $X$, or $\pi_1(X)$,
we shall need the concept of closed loops, which we 
shall find useful in other ways too. For simplicity, 
we shall consider based loops (that is, loops passing 
through a fixed point in X). It seems that in most 
applications, these are the relevant ones. One could 
consider loops of various smoothness (when X is a 
manifold), but in view of applications to quantum 
field theory, we shall consider continuous loops, 
which are also the ones relevant for topology. 


Definition 43 Given a topological space $X$ and a point $x_0 \in X$, a (closed) (based) loop is a continuous function of the parametrized circle to $X$:


$$\xi: [0, 2\pi] \to X$$

satisfying $\xi(0) = \xi(2\pi) = x_0$.


Definition 44 Given a connected topological space $X$ and a point $x_0 \in X$, the space of all closed based loops is called the (parametrized based) loop space of $X$, denoted $\Omega X$.


Remarks


(i) The loop space $\Omega X$ inherits the relative compact-open topology from the space of continuous maps from the closed interval $[0, 2\pi]$ to $X$. It also has a natural base point: the constant function mapping all of $[0, 2\pi]$ to $x_0$. Hence it is easy to iterate the construction and define $\Omega^k X$, $k \ge 1$.

(ii) Here we have chosen to parametrize the circle by $[0, 2\pi]$, as is more natural if we think in terms of the phase angle. We could easily have chosen the unit interval $[0,1]$ instead. This would perhaps harmonize better with our previous definition of paths and the definitions of homotopies below.


Proposition 7 The fundamental group of a topological space $X$, denoted $\pi_1(X)$, consists of classes of closed loops in $X$ which cannot be continuously deformed into one another while preserving the base point.


Definition 45 A space $X$ is called simply connected if $\pi_1(X)$ is trivial.


To define the higher homotopy groups, let us go 
into a little detail about homotopy. 


Definition 46 Given two topological spaces $X$ and $Y$, and maps


$$p, q: X \to Y$$

we say that $h$ is a homotopy between the maps $p, q$ if

$$h: X \times I \to Y$$


is a continuous map such that $h(x,0) = p(x)$, $h(x,1) = q(x)$, where $I$ is the unit interval $[0,1]$. In this case, we write $p \sim q$.


Definition 47 A map $f: X \to Y$ is a homotopy equivalence if there exists a map $g: Y \to X$ such that $g \circ f \sim \mathrm{id}_X$ and $f \circ g \sim \mathrm{id}_Y$.


Remark This is an equivalence relation.


Definition 48 For a topological space $X$ with base point $x_0$, we define $\pi_n(X)$, $n \ge 0$, as the set of homotopy equivalence classes of based maps from the $n$-sphere $S^n$ to $X$.


Remark This coincides with the previous definitions for $\pi_0$ and $\pi_1$.


There is a very nice relation between homotopy classes and loop spaces.


Proposition 8 $\pi_n(X) = \pi_{n-1}(\Omega X) = \cdots = \pi_0(\Omega^n X)$.


Remarks 


(i) When we consider the gauge group $G$ in a Yang-Mills theory, its fundamental group classifies the monopoles that can occur in the theory.

(ii) For $n \ge 1$, $\pi_n(X)$ is a group, the group action coming from the joining of two loops together to form a new loop. On the other hand, $\pi_0(X)$ in general is not a group. However, when $X$ is a Lie group, then $\pi_0(X)$ inherits a group structure from $X$, because it can be identified with the quotient group of $X$ by its identity-connected component. For example, the two components of $O(3)$ can be identified with the two elements of the group $\mathbb{Z}_2$, the component where the determinant equals 1 corresponding to 0 in $\mathbb{Z}_2$ and the component where the determinant equals $-1$ corresponding to 1 in $\mathbb{Z}_2$.

(iii) For $n \ge 2$, the group $\pi_n(X)$ is always abelian.

(iv) Examples of nonabelian $\pi_1$ are the fundamental groups of some Riemann surfaces.

(v) Since $\pi_1$ is not necessarily abelian, much of the direct-sum notation we use for the homotopy groups should more correctly be written multiplicatively. However, in most literature in mathematical physics, the additive notation seems to be preferred.


Examples 


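(i) $\pi_n(X \times Y) = \pi_n(X) + \pi_n(Y)$, $n \ge 1$.
(ii) For the spheres, we have the following results:


$$\pi_i(S^n) = \begin{cases} 0 & \text{if } i < n \\ \mathbb{Z} & \text{if } i = n \end{cases}$$

$$\pi_i(S^1) = 0 \quad \text{if } i > 1$$
$$\pi_{n+1}(S^n) = \mathbb{Z}_2 \quad \text{if } n \ge 3$$
$$\pi_{n+2}(S^n) = \mathbb{Z}_2 \quad \text{if } n \ge 2$$


(iii) From the theory of sphere bundles, we can deduce:


$$\pi_i(S^2) = \pi_{i-1}(S^1) + \pi_i(S^3) \quad \text{if } i \ge 2$$
$$\pi_i(S^4) = \pi_{i-1}(S^3) + \pi_i(S^7) \quad \text{if } i \ge 2$$
$$\pi_i(S^8) = \pi_{i-1}(S^7) + \pi_i(S^{15}) \quad \text{if } i \ge 2$$


and the first of these relations gives the following more succinct result:

$$\pi_i(S^2) = \pi_i(S^3) \quad \text{if } i \ge 3$$

(iv) A result of Serre says that all the homotopy groups of spheres are in fact finite except $\pi_n(S^n)$ and $\pi_{4n-1}(S^{2n})$, $n \ge 1$.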


Definition 49 Given a connected space $X$, a map $\pi: B \to X$ is called a covering if (i) $\pi(B) = X$, and (ii) for each $x \in X$, there exists an open connected neighborhood $V$ of $x$ such that each component of $\pi^{-1}(V)$ is open in $B$, and $\pi$ restricted to each component is a homeomorphism. The space $B$ is called a covering space.


Examples 


(i) The real line $\mathbb{R}$ is a covering of the group U(1).

(ii) The group SU(2) is a double cover of the group 
SO(3). 

(iii) The group SL(2, C) is a double cover of the 
Lorentz group SO(1, 3). 

(iv) The group SU(2,2) is a 4-fold cover of the 
conformal group in four dimensions. This local 
isomorphism is of great importance in twistor 
theory. 


Remarks 


(i) By considering closed loops in $X$ and their coverings in $B$ it is easily seen that the fundamental group $\pi_1(X)$ acts on the coverings of $X$. If we further assume that the action is transitive, then we have the following nice result: coverings of $X$ are in 1-1 correspondence with normal subgroups of $\pi_1(X)$.

(ii) Given a connected space $X$, there always exists a unique connected simply connected covering space $\tilde{X}$, called the universal covering space. Furthermore, $\tilde{X}$ covers all the other covering spaces of $X$. For the higher homotopy groups, one has


$$\pi_n(\tilde{X}) = \pi_n(X), \quad n \ge 2$$


One very important class of homotopy groups are those of Lie groups. To simplify matters, we shall consider only connected groups, that is, $\pi_0(G) = 0$. Also we shall deal mainly with the classical groups, and in particular, the orthogonal and unitary groups.


Proposition 9 Suppose that $G$ is a connected Lie group.


(i) If $G$ is compact and semi-simple, then $\pi_1(G)$ is finite. This implies that $\tilde{G}$ is still compact.
(ii) $\pi_2(G) = 0$.
(iii) For $G$ compact, simple, and nonabelian, $\pi_3(G) = \mathbb{Z}$.
(iv) For $G$ compact, simply connected, and simple, $\pi_4(G) = 0$ or $\mathbb{Z}_2$.


Examples 


(i) $\pi_1(SU(n)) = 0$.

(ii) $\pi_1(SO(n)) = \mathbb{Z}_2$.

(iii) Since the unitary groups $U(n)$ are topologically the product of $SU(n)$ with a circle $S^1$, their homotopy groups are easily computed using the product formula. We remind ourselves that $U(1)$ is topologically a circle and $SU(2)$ topologically $S^3$.

(iv) For $i \ge 2$, we have:

$$\pi_i(SO(3)) = \pi_i(SU(2))$$

$$\pi_i(SO(5)) = \pi_i(Sp(2))$$

$$\pi_i(SO(6)) = \pi_i(SU(4))$$
Just for interest, and to show the richness of the 
subject, some isomorphisms for homotopy groups 


are shown in Table 1 and some homotopy groups 
for low SU(m) and SO(n) are listed in Table 2. 


Table 1 Some isomorphisms for homotopy groups


Isomorphism                              Range
$\pi_i(SO(n)) = \pi_i(SO(m))$            $n, m \ge i + 2$
$\pi_i(SU(n)) = \pi_i(SU(m))$            $n, m \ge (i + 1)$
$\pi_i(Sp(n)) = \pi_i(Sp(m))$            $n, m \ge (i - 1)$
$\pi_i(G_2) = \pi_i(SO(7))$              $2 \le i \le 5$
$\pi_i(F_4) = \pi_i(SO(9))$              $2 \le i \le 6$
$\pi_i(SO(9)) = \pi_i(SO(7))$            $i \le 13$
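Table 2 Some homotopy groups for low SU(n) and SO(n)


          $\pi_4$           $\pi_5$           $\pi_6$              $\pi_7$
SU(2)     $\mathbb{Z}_2$    $\mathbb{Z}_2$    $\mathbb{Z}_{12}$    $\mathbb{Z}_2$
SU(3)     0                 $\mathbb{Z}$      $\mathbb{Z}_6$       0
SU(4)     0                 $\mathbb{Z}$      0                    $\mathbb{Z}$
SU(5)     0                 $\mathbb{Z}$      0                    $\mathbb{Z}$
SU(6)     0                 $\mathbb{Z}$      0                    $\mathbb{Z}$
SO(5)     $\mathbb{Z}_2$    $\mathbb{Z}_2$    0                    $\mathbb{Z}$
SO(6)     0                 $\mathbb{Z}$      0                    $\mathbb{Z}$
SO(7)     0                 0                 0                    $\mathbb{Z}$
SO(8)     0                 0                 0                    $\mathbb{Z} + \mathbb{Z}$
SO(9)     0                 0                 0                    $\mathbb{Z}$
SO(10)    0                 0                 0                    $\mathbb{Z}$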


$\pi_8$    $\pi_9$    $\pi_{10}$

Zo Z3 Z45 

Z42 Z3 Z30 
2,24 Z2 Z120 + Z2 

0 Z Z120 

0 Z, Za 

0 0 Z420 
Z4 Zo Z120 + Z2 

Z2 + Z2 Z2 + Z2 Z4 
Zo + Zo + Ze Zo + Zo + Zo Zo4 + Zo4 

Z2 + Zo Z2 + Z2 724 

Z2 Z + Z2 Z2 




Appendix: A Mathematician’s 
Basic Toolkit 


The following is a drastically condensed list, most 
of which is what a mathematics undergraduate 
learns in the first few weeks. The rest is included 
for easy reference. These notations and concepts 
are used universally in mathematical writing. We 
have not endeavored to arrange the material in a 
logical order. Furthermore, given structures such as 
sets, groups, etc., one can usually define “substruc- 
tures” such as subsets, subgroups, etc., in a 
straightforward manner. We shall therefore not 
spell this out. 


Sets 


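$A \cup B = \{x : x \in A \text{ or } x \in B\}$          union
$A \cap B = \{x : x \in A \text{ and } x \in B\}$         intersection
$A \setminus B = \{x : x \in A \text{ and } x \notin B\}$  complement
$A \times B = \{(x, y) : x \in A,\ y \in B\}$              Cartesian product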


Maps 


1. A map or mapping $f: A \to B$ is an assignment of an element $f(x)$ of $B$ for every $x \in A$.

2. A map $f: A \to B$ is injective if $f(x) = f(y) \Rightarrow x = y$. This is sometimes called a 1-1 map, a term to be avoided.

3. A map $f: A \to B$ is surjective if for every $y \in B$ there exists an $x \in A$ such that $y = f(x)$. This is sometimes called an "onto" map.

4. A map $f: A \to B$ is bijective if it is both surjective and injective. This is also sometimes called a 1-1 map, a term to be equally avoided.

5. For any map $f: A \to B$ and any subset $C \subset B$, the inverse image $f^{-1}(C) = \{x : f(x) \in C\} \subset A$ is always defined, although, of course, it can be empty. On the other hand, the map $f^{-1}$ is defined if and only if $f$ is bijective.

6. A map from a set to either the real or complex 
numbers is usually called a function. 

7. A map between vector spaces, and more particu- 
larly normed spaces (including Hilbert spaces), is 
called an operator. Most often, one considers 
linear operators. 

8. An operator from a vector space to its field of 
scalars is called a functional. Again, one con- 
siders almost exclusively linear functionals. 
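For maps between finite sets, the notions in items 2 to 5 can be checked by direct enumeration. The sketch below is purely illustrative; the helper names and the choice of the squaring map on $A = \{-2, \ldots, 2\}$ with $B = \{0, 1, 4\}$ are assumptions made for the example, not taken from the text.

```python
def is_injective(f, domain):
    """f(x) = f(y) implies x = y: no value is hit twice."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)


def is_surjective(f, domain, codomain):
    """Every element of the codomain is the image of some x."""
    return {f(x) for x in domain} == set(codomain)


def inverse_image(f, domain, subset):
    """The set of all x in the domain with f(x) in the given subset."""
    return {x for x in domain if f(x) in subset}


A = [-2, -1, 0, 1, 2]
B = [0, 1, 4]


def square(x):
    return x * x


print(is_injective(square, A))        # squaring identifies x and -x
print(is_surjective(square, A, B))    # every element of B is a square
print(inverse_image(square, A, {4}))  # defined even though square has no inverse map
```

The inverse image of a subset exists here although the squaring map, not being injective, has no inverse map.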


Relations 


1. A relation $\sim$ on a set $A$ is a subset $R \subset A \times A$. We say that $x \sim y$ if $(x, y) \in R$.

2. We shall only be interested in equivalence relations. An equivalence relation $\sim$ is one satisfying, for all $x, y, z \in A$:

(a) $x \sim x$ ("reflexive"),
(b) $x \sim y \Rightarrow y \sim x$ ("symmetric"),
(c) $x \sim y,\ y \sim z \Rightarrow x \sim z$ ("transitive").

3. If $\sim$ is an equivalence relation in $A$, then for each $x \in A$, we can define its equivalence class:


$$[x] = \{y \in A : y \sim x\}$$


It can be shown that equivalence classes are nonempty, any two equivalence classes are either equal or disjoint, and they together partition the set $A$. Subgroup equivalence classes are called cosets.

4. An element of an equivalence class is called a 
representative. 
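The partition property of equivalence classes is easy to see on a finite set. The following sketch groups the integers 0 to 9 by congruence modulo 3; the function name `equivalence_classes` and the chosen relation are illustrative assumptions. The routine relies on the relation really being an equivalence relation, since it compares each element with only one representative per class.

```python
def equivalence_classes(elements, related):
    """Partition `elements` into classes of the equivalence relation `related`."""
    classes = []
    for x in elements:
        for cls in classes:
            # Comparing with a single representative suffices
            # because the relation is symmetric and transitive.
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes


A = list(range(10))
classes = equivalence_classes(A, lambda x, y: (x - y) % 3 == 0)
print(classes)
```

Each class is nonempty, distinct classes are disjoint, and together they exhaust $A$.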


Groups 


A group is a set $G$ with a map, called multiplication or group law


$$G \times G \to G, \qquad (x, y) \mapsto xy$$

satisfying


1. $(xy)z = x(yz)$, $\forall x, y, z \in G$ ("associative");
2. there exists a neutral element (or identity) 1 such that $1x = x1 = x$, $\forall x \in G$; and
3. every element $x \in G$ has an inverse $x^{-1}$, that is, $x x^{-1} = x^{-1} x = 1$.

A map such as the multiplication in the definition is an example of a binary operation. Note that we have denoted the group law as multiplication here. It is usual to denote it additively if the group is abelian, that is, if $xy = yx$, $\forall x, y \in G$. In this case, we may write the condition as $x + y = y + x$, and call the identity element 0.
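For a finite group the three axioms can be verified exhaustively. As a sketch, the code below does so for the permutations of three objects under composition, an example chosen here for illustration (it is not discussed in the text), and also confirms that this group is not abelian.

```python
from itertools import permutations


def compose(p, q):
    """Group law on permutations stored as tuples: (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


G = list(permutations(range(3)))   # the six permutations of {0, 1, 2}
identity = (0, 1, 2)

associative = all(
    compose(compose(x, y), z) == compose(x, compose(y, z))
    for x in G for y in G for z in G
)
has_identity = all(compose(identity, x) == x == compose(x, identity) for x in G)
has_inverses = all(
    any(compose(x, y) == identity == compose(y, x) for y in G) for x in G
)
abelian = all(compose(x, y) == compose(y, x) for x in G for y in G)

print(associative, has_identity, has_inverses, abelian)
```

Because the group law fails to commute, multiplicative rather than additive notation is the appropriate choice for this group.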


A ring is a set R equipped with two binary 
operations, x+y called addition, and xy called 
multiplication, such that 


1. R is an abelian group under addition; 

2. the multiplication is associative; and 

3. $(x + y)z = xz + yz$, $x(y + z) = xy + xz$, $\forall x, y, z \in R$ ("distributive").


If the multiplication is commutative (xy = yx) then 
the ring is said to be commutative. A ring may 
contain a multiplicative identity, in which case it is 
called a ring with unit element. 

An ideal I of R is a subring of R, satisfying in 
addition 


$$r \in R,\ a \in I \Rightarrow ra \in I,\ ar \in I$$


One can define in an obvious fashion a left-ideal and a right-ideal. The above definition will then be for a two-sided ideal.


Modules 


Given a ring $R$, an $R$-module is an abelian group $M$, together with an operation, $M \times R \to M$, denoted multiplicatively, satisfying, for $x, y \in M$, $r, s \in R$,


1. $(x + y)r = xr + yr$,
2. $x(r + s) = xr + xs$,
3. $x(rs) = (xr)s$, and
4. $x1 = x$.


The term right $R$-module is sometimes used, to distinguish it from obviously defined left $R$-modules.
Fields


A field $F$ is a commutative ring in which every nonzero element is invertible.

The additive identity 0 is never invertible, unless $0 = 1$, so it is usual to assume that a field has at least two elements, 0 and 1.

The most common fields we come across are, of 
course, the number fields: the rationals, the reals, 
and the complex numbers. 


Vector Spaces 


A vector space, or sometimes linear space, $V$, over a field $F$, is an abelian group, written additively, with a map $F \times V \to V$ such that, for $x, y \in V$, $\alpha, \beta \in F$,


1. $\alpha(x + y) = \alpha x + \alpha y$ ("linearity"),
2. $(\alpha + \beta)x = \alpha x + \beta x$,
3. $(\alpha\beta)x = \alpha(\beta x)$, and
4. $1x = x$.


A vector space is then a right (or left) F-module. 
The elements of V are called vectors, and those of F 
scalars. 


Algebras 


An algebra A over a field F is a ring which is a 
vector space over F, such that 


$$\alpha(ab) = (\alpha a)b = a(\alpha b), \qquad \alpha \in F,\ a, b \in A$$


Note that in some older literature, particularly the 
Russian school, an algebra of operators is called a 
ring of operators. 
Introduction


Quantum electrodynamics is the theory of the 
electromagnetic interactions of photons and elec- 
trons. When attempting to generalize this theory to 
other interactions it turns out to be necessary to 
identify its essential components. The essential 
properties of electrodynamics are contained in its 
formulation as an “abelian gauge theory." The 
generalization to include other interactions is then 
reduced to incorporating the structure of nonabelian 
groups. This becomes particularly clear when we 
formulate the theory in the language of differential 
forms. 

Here we first present the formulation of electro- 
dynamics using differential forms. The electromag- 
netic fields are introduced via the Lorentz force 
equation. They are recognized as the components of 
a differential 2-form. This form fulfills two differ- 
ential conditions, which are equivalent to Maxwell's 
equations. These are expressed with the help of a 
differential operator and its Hermitian conjugate, 
the codifferential operator. We consider the effects 
of charge conservation and introduce electromag- 
netic potentials, which are defined up to gauge 
transformations. We finally consider Weyl's argu- 
ment for the existence of the electromagnetic 
interaction as a consequence of the local phase 
invariance of the electron wave function. 

We then go on to present the nonabelian generalization. The gauge bosons appear in a theory with fermions by requiring invariance of the theory with respect to local gauge transformations. When the fermions group into symmetry multiplets this gives rise to a gauge group SU(N) involving $N^2 - 1$ gauge bosons mediating the interaction, where $N^2 - 1$ is the dimension of the Lie algebra. The interaction arises through the necessity of replacing the usual derivatives by covariant derivatives, which transform in a natural way in order to preserve the gauge invariance. The covariant derivatives involve the gauge potentials, whose transformation properties are dictated by those of the covariant derivative. Whereas for an abelian gauge theory such as electromagnetism scalar-valued p-forms are sufficient (actually only $p = 1, 2$), a nonabelian gauge theory involves the use of Lie-algebra-valued p-forms. These are introduced and used to construct the Yang-Mills action, which involves the field strength tensor, itself determined from the gauge potentials. This action leads to the Yang-Mills equations for the gauge potentials, which are the nonabelian generalizations of the Maxwell equations.


Relativistic Kinematics 


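The trajectory of a mass point is described as $x^\mu(\tau)$, where $\tau$ is the invariant proper time interval:


$$d\tau^2 = dt^2 - d\mathbf{x} \cdot d\mathbf{x} = dt^2\,(1 - v^2) \qquad [1]$$


with $\mathbf{v} = d\mathbf{x}/dt$. With the abbreviation $\gamma = (1 - v^2)^{-1/2}$ this yields $d\tau = (1/\gamma)\,dt$.

The 4-velocity of a point is defined as $u^\mu = dx^\mu/d\tau = \gamma\,(dx^\mu/dt)$. The quantity


$$u^2 \equiv u^\mu u_\mu = g_{\mu\nu}\,u^\mu u^\nu = 1 \qquad [2]$$

is a relativistic invariant. Here

$$g_{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \qquad [3]$$


is the metric of Minkowski space.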
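The invariant [2] can be checked numerically for any velocity below the speed of light. The sketch below uses units with $c = 1$, as the text does; the function names and the sample velocity are illustrative assumptions.

```python
import math

METRIC = (1.0, -1.0, -1.0, -1.0)   # diagonal of the Minkowski metric g_{mu nu}


def gamma(v):
    """Lorentz factor (1 - v^2)^(-1/2) for a 3-velocity v, with c = 1."""
    return 1.0 / math.sqrt(1.0 - sum(c * c for c in v))


def four_velocity(v):
    """u^mu = gamma * (1, v)."""
    g = gamma(v)
    return (g, g * v[0], g * v[1], g * v[2])


def minkowski_dot(a, b):
    """g_{mu nu} a^mu b^nu for the diagonal metric above."""
    return sum(m * x * y for m, x, y in zip(METRIC, a, b))


u = four_velocity((0.3, -0.4, 0.5))
print(minkowski_dot(u, u))   # equals 1 up to rounding
```

Multiplying `u` by a rest mass $m_0$ gives the 4-momentum, whose invariant square is then $m_0^2$.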
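The 4-momentum of a particle is $p^\mu = m_0 u^\mu = (m_0\gamma,\ m_0\gamma\mathbf{v})$, and $p^\mu p_\mu = m_0^2$. The 4-force is


$$f^\mu = \frac{dp^\mu}{d\tau} = \gamma\,\frac{dp^\mu}{dt} = \gamma\left(\frac{dp^0}{dt},\ \mathbf{f}\right) \qquad [4]$$

with the 3-force

$$\mathbf{f} = \frac{d(m_0\gamma\mathbf{v})}{dt} \qquad [5]$$

Differentiating $p^2 = m_0^2$ with respect to $\tau$ yields


$$p^\mu f_\mu = m_0\gamma^2\left(\frac{dp^0}{dt} - \mathbf{f} \cdot \mathbf{v}\right) = 0 \qquad [6]$$

or

$$\frac{dp^0}{dt} = \mathbf{f} \cdot \mathbf{v} = \mathbf{f} \cdot \frac{d\mathbf{x}}{dt} \qquad [7]$$


This says that

$$dp^0 = \mathbf{f} \cdot d\mathbf{x} = dW \qquad [8]$$


where $W$ is the work done and $p^0$ is the energy.
For a charged particle, the Lorentz force is


$$\mathbf{f} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B}) \qquad [9]$$


where $q$ is the charge of the particle, $\mathbf{E}$ is the electric, and $\mathbf{B}$ the magnetic field strength. Since $\mathbf{f} \cdot \mathbf{v} = q\mathbf{E} \cdot \mathbf{v}$, we have the four-dimensional form of the Lorentz force:


$$f^\mu = q\gamma\,(\mathbf{E} \cdot \mathbf{v},\ \mathbf{E} + \mathbf{v} \times \mathbf{B}) \qquad [10]$$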
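Relation [6] says that the 4-force is Minkowski-orthogonal to the 4-velocity, and the Lorentz 4-force [10] must respect this. A small numerical sketch of that check follows; the helper names and the sample field values are arbitrary assumptions chosen only for the illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def four_velocity(v):
    g = 1.0 / math.sqrt(1.0 - dot(v, v))
    return (g, g * v[0], g * v[1], g * v[2])


def lorentz_four_force(q, v, E, B):
    """f^mu = q * gamma * (E.v, E + v x B), as in eqn [10]."""
    g = 1.0 / math.sqrt(1.0 - dot(v, v))
    vxB = cross(v, B)
    spatial = tuple(q * g * (E[i] + vxB[i]) for i in range(3))
    return (q * g * dot(E, v),) + spatial


def minkowski_dot(a, b):
    """Signature (+, -, -, -)."""
    return a[0] * b[0] - dot(a[1:], b[1:])


v = (0.3, -0.4, 0.5)
u = four_velocity(v)
f = lorentz_four_force(1.5, v, E=(1.0, 2.0, -1.0), B=(0.5, -2.0, 3.0))
print(minkowski_dot(u, f))   # vanishes up to rounding
```

The magnetic term drops out because $\mathbf{v} \cdot (\mathbf{v} \times \mathbf{B}) = 0$, which is why only the electric field contributes to the time component.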


The Lorentz Force Equation with 
Differential Forms 


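We write the Lorentz force equation as an equation for a differential form $f = f_\mu\,dx^\mu$, with $f_\mu = g_{\mu\nu} f^\nu$. The velocity-dependent Lorentz force is


$$f = -q\,i_u F \qquad [11]$$


with


$$u = \gamma\left(\frac{\partial}{\partial t} + v^x\frac{\partial}{\partial x} + v^y\frac{\partial}{\partial y} + v^z\frac{\partial}{\partial z}\right) \qquad [12]$$


the 4-velocity and $F$ the electromagnetic field strength:


$$F = \mathcal{E} \wedge dt + \mathcal{B} \qquad [13]$$

where $\mathcal{E}$ is a 1-form in three dimensions,

$$\mathcal{E} = E_x\,dx + E_y\,dy + E_z\,dz \qquad [14]$$

and $\mathcal{B}$ is a 2-form in three dimensions,

$$\mathcal{B} = B_x\,dy \wedge dz + B_y\,dz \wedge dx + B_z\,dx \wedge dy \qquad [15]$$


The symbol $i_u$ indicates a contraction of a 2-form with a vector, which is defined as


$$i_u F(v) = F(u, v) \qquad [16]$$


for an arbitrary vector $v$. The contraction of a 2-form with a vector yields a 1-form.

It is easily seen that a 2-form can be expressed in terms of a polar vector and an axial vector: if it is to be invariant with respect to parity transformations with


$$t \to t, \quad x \to -x, \quad y \to -y, \quad z \to -z \qquad [17]$$

the fields in eqn [13] must transform as

$$\mathbf{E} \to -\mathbf{E}, \qquad \mathbf{B} \to \mathbf{B} \qquad [18]$$

Now we check the validity of eqn [11]. We have

$$f = -q\,i_u F = q\gamma\,(\mathbf{v} \cdot \mathbf{E})\,dt - q\gamma\left[(E^x + (\mathbf{v} \times \mathbf{B})^x)\,dx + (E^y + (\mathbf{v} \times \mathbf{B})^y)\,dy + (E^z + (\mathbf{v} \times \mathbf{B})^z)\,dz\right] \qquad [19]$$


in agreement with eqn [10]. We remember to change the signs in $E_x = -E^x$, $B_x = -B^x$, etc.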


The Codifferential Operator 


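The space of p-forms on an n-dimensional manifold is an

$$\binom{n}{p} = \binom{n}{n-p} = \frac{n!}{p!\,(n-p)!} \qquad [20]$$


dimensional vector space. The space of p-forms is thus isomorphic to the space of $(n - p)$-forms. The Hodge dual operator maps the p-forms into the $(n - p)$-forms, and is defined by


$$\alpha \wedge {*\beta} = (\alpha, \beta)\,dx^1 \wedge \cdots \wedge dx^n \qquad [21]$$

Here $(\alpha, \beta)$ is the scalar product of two p-forms:

$$(\alpha, \beta) = \alpha_{i_1 \ldots i_p}\,\beta^{i_1 \ldots i_p} \qquad [22]$$


where $\alpha_{i_1 \ldots i_p}$ are the coefficients of the form $\alpha$,


$$\alpha = \alpha_{i_1 \ldots i_p}\,dx^{i_1} \wedge \cdots \wedge dx^{i_p} \qquad [23]$$

$\beta_{j_1 \ldots j_p}$ are the coefficients of the form $\beta$,

$$\beta = \beta_{j_1 \ldots j_p}\,dx^{j_1} \wedge \cdots \wedge dx^{j_p} \qquad [24]$$

and

$$\beta^{i_1 \ldots i_p} = g^{i_1 j_1} \cdots g^{i_p j_p}\,\beta_{j_1 \ldots j_p} \qquad [25]$$


The indices satisfy $i_1 < \cdots < i_p$ and $j_1 < \cdots < j_p$. The basis elements are orthogonal with respect to this scalar product, and


$$(dx^{i_1} \wedge \cdots \wedge dx^{i_p},\ dx^{j_1} \wedge \cdots \wedge dx^{j_p}) = g^{i_1 j_1} \cdots g^{i_p j_p} \qquad [26]$$

The Hodge dual has the property that

$$*(dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)}) = g^{\sigma(1)\sigma(1)} \cdots g^{\sigma(p)\sigma(p)}\,(\mathrm{sign}\,\sigma)\,(dx^{\sigma(p+1)} \wedge \cdots \wedge dx^{\sigma(n)}) \qquad [27]$$


where $\sigma$ is a permutation of the indices $(1, \ldots, n)$, $\sigma(1) < \cdots < \sigma(p)$, and $\sigma(p+1) < \cdots < \sigma(n)$. We also have


$$*(dx^{\sigma(p+1)} \wedge \cdots \wedge dx^{\sigma(n)}) = g^{\sigma(p+1)\sigma(p+1)} \cdots g^{\sigma(n)\sigma(n)}\,(-1)^{p(n-p)}\,(\mathrm{sign}\,\sigma)\,(dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)}) \qquad [28]$$


We therefore find that the application of the Hodge dual to a p-form twice yields

$${*}{*}(dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)}) = g^{\sigma(1)\sigma(1)} \cdots g^{\sigma(p)\sigma(p)}\,(\mathrm{sign}\,\sigma)\,{*}(dx^{\sigma(p+1)} \wedge \cdots \wedge dx^{\sigma(n)}) = g^{\sigma(1)\sigma(1)} \cdots g^{\sigma(n)\sigma(n)}\,(-1)^{p(n-p)}\,dx^{\sigma(1)} \wedge \cdots \wedge dx^{\sigma(p)} \qquad [29]$$

or

$${*}{*} = (-1)^{p(n-p)}\,(-1)^{\mathrm{Ind}\,g} \qquad [30]$$


where $\mathrm{Ind}\,g$ is the number of times $(-1)$ occurs along the diagonal of $g$.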
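The sign rule [30] can be verified by brute force on the basis forms. The sketch below implements the action [27] of the Hodge dual on a basis p-form for a diagonal metric with entries $\pm 1$ and applies it twice. Indices run from 0 to $n-1$ rather than 1 to $n$, and all function names are illustrative assumptions.

```python
from itertools import combinations


def permutation_sign(perm):
    """Sign of a permutation given as a tuple, by counting inversions."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign


def hodge_basis(indices, metric_diag):
    """Apply * to dx^{i_1} ^ ... ^ dx^{i_p} (increasing indices), as in eqn [27].

    Returns (coefficient, complementary increasing index tuple).
    """
    n = len(metric_diag)
    rest = tuple(k for k in range(n) if k not in indices)
    coeff = permutation_sign(indices + rest)
    for k in indices:
        coeff *= metric_diag[k]   # g^{kk} = g_{kk} for entries +1 or -1
    return coeff, rest


def double_hodge_sign(indices, metric_diag):
    """Coefficient c in **(dx^I) = c dx^I."""
    c1, rest = hodge_basis(indices, metric_diag)
    c2, back = hodge_basis(rest, metric_diag)
    assert back == indices
    return c1 * c2


minkowski = (1, -1, -1, -1)   # Ind g = 3
n, ind_g = 4, 3
checks = {
    p: all(
        double_hodge_sign(I, minkowski) == (-1) ** (p * (n - p)) * (-1) ** ind_g
        for I in combinations(range(n), p)
    )
    for p in range(n + 1)
}
print(checks)
print(hodge_basis((1, 2, 3), minkowski))   # dual of dx ^ dy ^ dz, with index 0 = t
```

With the coordinates ordered as $(t, x, y, z)$ the last line returns the coefficient and index tuple for $*(dx \wedge dy \wedge dz)$, which is $+dt$.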

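Now let $\alpha$ be a $(p - 1)$-form, and $\beta$ a p-form. Then $d{*\beta}$ is an $(n - p + 1)$-form, and


$$d(\alpha \wedge {*\beta}) = d\alpha \wedge {*\beta} + (-1)^{p-1}\,\alpha \wedge d{*\beta}$$
$$= d\alpha \wedge {*\beta} + (-1)^{p-1}\,(-1)^{(n-p+1)(p-1)}\,(-1)^{\mathrm{Ind}\,g}\ \alpha \wedge ({*}{*})\,d{*\beta}$$
$$= d\alpha \wedge {*\beta} + (-1)^{n(p-1)}\,(-1)^{\mathrm{Ind}\,g}\ \alpha \wedge {*}({*}d{*}\beta) \qquad [31]$$


We then have

$$(d\alpha, \beta) = (\alpha, d^*\beta) + \int d(\alpha \wedge {*\beta}) \qquad [32]$$

with

$$d^* = -(-1)^{n(p-1)}\,(-1)^{\mathrm{Ind}\,g}\ {*}d{*} \qquad [33]$$

We are here using the scalar product of two p-forms

$$(\alpha, \beta) = \int \alpha \wedge {*\beta} \qquad [34]$$


With the help of Stokes' theorem the last integral in eqn [32] may be turned into a surface term at infinity, which vanishes for $\alpha$ and $\beta$ with compact support. $d^*$ is the adjoint operator to $d$ with respect to the scalar product $(\,,)$. Whereas the differential operator $d$ maps p-forms into $(p + 1)$-forms, the codifferential operator $d^*$ maps p-forms into $(p - 1)$-forms.


The relation $d^2 = 0$ leads to


$$(d^*)^2 \propto ({*}d{*})({*}d{*}) \propto {*}d^2{*} = 0 \qquad [35]$$


This fact plays an essential role in connection with the conservation laws.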
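Finally, we want to obtain a coordinate expression for $d^*\beta$. Indeed $d^*\beta = -\mathrm{Div}\,\beta$ for


$$(\mathrm{Div}\,\beta)^K = \frac{\partial \beta^{jK}}{\partial x^j} \qquad [36]$$


where $K$ is the multi-index of the coefficients in $\beta = \beta_K\,dx^K$, taken in increasing order, that is, $K = (k_1, \ldots, k_p)$ with $k_1 < \cdots < k_p$. We will show that $(\alpha, d^*\beta) = (\alpha, -\mathrm{Div}\,\beta)$ for an arbitrary $(p - 1)$-form $\alpha$. It is a fact that


$$(\alpha, d^*\beta) = (d\alpha, \beta) = \int (d\alpha)_I\,\beta^I\ {*1} \qquad [37]$$


Now we have the coordinate expressions


$$d\alpha = (d\alpha_K) \wedge dx^K \qquad [38]$$

and $(dx^K)_L = \delta^K_L$. It follows that

$$(d\alpha)_I = (d\alpha_K \wedge dx^K)_I = \delta_I^{jK}\,(d\alpha_K)_j \qquad [39]$$

or

$$(d\alpha)_I = \delta_I^{jK}\,\frac{\partial \alpha_K}{\partial x^j} \qquad [40]$$

Here we use

$$(\alpha \wedge \beta)_I = \delta_I^{KL}\,\alpha_K\,\beta_L \qquad [41]$$

where

$$\delta_I^{KL} = \begin{cases} 1 & \text{if } (KL) \text{ is an even permutation of } I \\ -1 & \text{if } (KL) \text{ is an odd permutation of } I \\ 0 & \text{otherwise} \end{cases} \qquad [42]$$


Use of the Leibniz rule yields


$$\int (d\alpha)_I\,\beta^I\ {*1} = \int \delta_I^{jK}\,\frac{\partial \alpha_K}{\partial x^j}\,\beta^I\ {*1} = \int \frac{\partial}{\partial x^j}\left(\delta_I^{jK}\,\alpha_K\,\beta^I\right) {*1} - \int \alpha_K\,\delta_I^{jK}\,\frac{\partial \beta^I}{\partial x^j}\ {*1} \qquad [43]$$

The first term corresponds to a surface integration and we can neglect it. We then have $\delta_I^{jK}\,\beta^I = \beta^{jK}$ from the antisymmetry of $\beta$, so that


$$(\alpha, d^*\beta) = -\int \alpha_K\,\frac{\partial \beta^{jK}}{\partial x^j}\ {*1} = (\alpha, -\mathrm{Div}\,\beta) \qquad [44]$$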
The Maxwell Equations 


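The Maxwell equations become remarkably concise when expressed in terms of differential forms, namely

$$dF = 0, \qquad d^*F = -j \qquad [45]$$

where $F$ is the field strength and $j$ is the current density. We wish to demonstrate this. We use a (3 + 1)-separation of the exterior derivative into a timelike and a spacelike part:


$$d = \mathbf{d} + dt \wedge \frac{\partial}{\partial t} \qquad [46]$$

where $\mathbf{d}$ denotes the exterior derivative with respect to the space coordinates only. With $F = \mathcal{E} \wedge dt + \mathcal{B}$, the electric 1-form $\mathcal{E}$ and the magnetic 2-form $\mathcal{B}$ being those of eqn [13], we then get

$$dF = \left(\mathbf{d}\mathcal{E} + \frac{\partial \mathcal{B}}{\partial t}\right) \wedge dt + \mathbf{d}\mathcal{B} = 0 \qquad [47]$$


By comparing coefficients, we arrive at


$$\mathbf{d}\mathcal{E} + \frac{\partial \mathcal{B}}{\partial t} = 0, \qquad \mathbf{d}\mathcal{B} = 0 \qquad [48]$$

In vector notation

$$\mathrm{curl}\,\mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad \mathrm{div}\,\mathbf{B} = 0 \qquad [49]$$

the usual form of the homogeneous Maxwell equations.
By direct application of the formula [27], one finds


$$*F = -(\star\mathcal{B}) \wedge dt + \star\mathcal{E} \qquad [50]$$


where $\star$ means the Hodge dual in three space dimensions. One finds


$$d{*F} = \mathbf{d}{\star\mathcal{E}} - \left(\mathbf{d}{\star\mathcal{B}} - \frac{\partial\,{\star\mathcal{E}}}{\partial t}\right) \wedge dt \qquad [51]$$


Therefore,


$$d{*F} = -(\mathrm{div}\,\mathbf{E})\,dx \wedge dy \wedge dz + \left((\mathrm{curl}\,\mathbf{B})^x - \frac{\partial E^x}{\partial t}\right) dy \wedge dz \wedge dt + \left((\mathrm{curl}\,\mathbf{B})^y - \frac{\partial E^y}{\partial t}\right) dz \wedge dx \wedge dt + \left((\mathrm{curl}\,\mathbf{B})^z - \frac{\partial E^z}{\partial t}\right) dx \wedge dy \wedge dt \qquad [52]$$


We apply again the Hodge dual:


$${*}d{*}F = -(\mathrm{div}\,\mathbf{E})\,dt + \left((\mathrm{curl}\,\mathbf{B})^x - \frac{\partial E^x}{\partial t}\right) dx + \left((\mathrm{curl}\,\mathbf{B})^y - \frac{\partial E^y}{\partial t}\right) dy + \left((\mathrm{curl}\,\mathbf{B})^z - \frac{\partial E^z}{\partial t}\right) dz \qquad [53]$$


In Minkowski space the expression ${*}d{*}$ equals the codifferential. Therefore, the equation $d^*F = {*}d{*}F = -j$ holds, with $j$ given by $j^\mu = (\rho, \mathbf{J})$, which is equivalent to


$$\mathrm{div}\,\mathbf{E} = \rho, \qquad \mathrm{curl}\,\mathbf{B} - \frac{\partial \mathbf{E}}{\partial t} = \mathbf{J} \qquad [54]$$


the inhomogeneous Maxwell equations.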


Current Conservation 


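The electromagnetic 4-current is


$$j^\mu = \rho_0 u^\mu = (\rho_0\gamma,\ \rho_0\gamma\mathbf{v}) = (\rho, \mathbf{J}) \qquad [55]$$


where $\rho$ is the charge density and $\mathbf{J}$ the current density. This corresponds to a 1-form


$$j = \rho\,dt - J^x\,dx - J^y\,dy - J^z\,dz \qquad [56]$$


The Hodge dual is $*j = \sigma^3 - j^2 \wedge dt$, with the 3-form $\sigma^3 = \rho\,dx \wedge dy \wedge dz$, and the 2-form


$$j^2 = J^x\,dy \wedge dz + J^y\,dz \wedge dx + J^z\,dx \wedge dy \qquad [57]$$


From the Maxwell equation $d^*F = -j$, it follows that


$$(d^*)^2 F = -d^*j = 0 \qquad [58]$$

that is

$${*}d({*j}) = {*}d(\sigma^3 - j^2 \wedge dt) = {*}(d\sigma^3 - dj^2 \wedge dt) = {*}\left[\left(\frac{\partial \rho}{\partial t} + \mathrm{div}\,\mathbf{J}\right) dt \wedge dx \wedge dy \wedge dz\right] = 0$$

so that

$$\frac{\partial \rho}{\partial t} + \mathrm{div}\,\mathbf{J} = 0 \qquad [59]$$


This is the "continuity equation."
The total charge inside a volume $V$ is $Q = \int_V \rho\,dV$, therefore

$$\frac{dQ}{dt} = \frac{d}{dt}\int_V \rho\,dV = -\int_V \mathrm{div}\,\mathbf{J}\,dV = -\int_{\partial V} \mathbf{J} \cdot \mathbf{n}\,dS \qquad [60]$$

where $\partial V$ is the surface which encloses the volume $V$, $dS$ is the surface element, and $\mathbf{n}$ is the normal vector to this surface. This is current conservation.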
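The Gauge Potential


The "Poincaré lemma" tells us that $dF = 0$ implies $F = dA$, with the 4-potential $A$:


$$A = \phi\,dt + \mathcal{A} \qquad [61]$$


and the vector potential $\mathcal{A} = A_x\,dx + A_y\,dy + A_z\,dz$. Writing $\mathbf{d}$ for the exterior derivative in the space coordinates, as in eqn [46], and $\mathcal{E}$, $\mathcal{B}$ for the electric 1-form and the magnetic 2-form of eqn [13], we have


$$F = \mathcal{E} \wedge dt + \mathcal{B} = \left(\mathbf{d} + dt \wedge \frac{\partial}{\partial t}\right)(\phi\,dt + \mathcal{A}) = \mathbf{d}\phi \wedge dt + \mathbf{d}\mathcal{A} + dt \wedge \frac{\partial \mathcal{A}}{\partial t} \qquad [62]$$


and it follows by comparing coefficients that


$$\mathcal{E} = \mathbf{d}\phi - \frac{\partial \mathcal{A}}{\partial t}, \qquad \mathcal{B} = \mathbf{d}\mathcal{A} \qquad [63]$$


In vector notation, remembering the sign changes $E_x = -E^x$, $A_x = -A^x$, etc., this is


$$\mathbf{E} = -\mathrm{grad}\,\phi - \frac{\partial \mathbf{A}}{\partial t}, \qquad \mathbf{B} = \mathrm{curl}\,\mathbf{A} \qquad [64]$$


The 4-potential is determined up to a gauge function $\lambda$:

$$A' = A + d\lambda \qquad [65]$$


This gauge freedom has no influence on the observable quantities $\mathbf{E}$ and $\mathbf{B}$:


$$F' = dA' = dA + dd\lambda = dA = F \qquad [66]$$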
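The invariance [66] rests only on $dd\lambda = 0$, that is, on the fact that mixed derivatives commute. A lattice analogue makes this checkable with exact integer arithmetic: forward differences commute just as partial derivatives do. The sketch below works on a two-dimensional $(t, x)$ lattice; the discretization, the function names, and the sample potentials are assumptions made for the illustration and are not part of the text.

```python
def forward_diff(field, axis):
    """Forward difference of a function field(t, x) on the integer lattice."""
    def d(t, x):
        if axis == 0:
            return field(t + 1, x) - field(t, x)
        return field(t, x + 1) - field(t, x)
    return d


def field_strength(A0, A1):
    """Lattice analogue of F_01 = d_0 A_1 - d_1 A_0."""
    d0A1 = forward_diff(A1, 0)
    d1A0 = forward_diff(A0, 1)
    return lambda t, x: d0A1(t, x) - d1A0(t, x)


def A0(t, x):
    return t * t * x


def A1(t, x):
    return 3 * x - t * x * x


def lam(t, x):
    """An arbitrary gauge function."""
    return t ** 3 * x + 5 * x * x


dlam0 = forward_diff(lam, 0)
dlam1 = forward_diff(lam, 1)


def A0_gauged(t, x):
    return A0(t, x) + dlam0(t, x)


def A1_gauged(t, x):
    return A1(t, x) + dlam1(t, x)


F = field_strength(A0, A1)
F_gauged = field_strength(A0_gauged, A1_gauged)
invariant = all(
    F(t, x) == F_gauged(t, x) for t in range(-3, 4) for x in range(-3, 4)
)
print(invariant)
```

The potentials themselves change under the transformation, while the field strength does not.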


The Laplace operator is $\Delta = (d^* + d)^2 = dd^* + d^*d$, so when the 4-potential $A$ fulfills the condition $d^*A = 0$, we have


$$\Delta A = d^*dA = d^*F = -j \qquad [67]$$


the "classical wave equation." The condition $d^*A = 0$ is called the "Lorentz gauge condition." This condition can always be fulfilled by using the gauge freedom: $d^*(A + d\lambda) = 0$ is fulfilled when $d^*d\lambda = \Delta\lambda = -d^*A$, where we have used the fact that $d^*\lambda = 0$ for functions. That is to say, the gauge condition is fulfilled when $\lambda$ is a solution of the inhomogeneous wave equation.


Gauge Invariance 


In quantum mechanics, the electron is described by a wave function which is determined up to a free phase. Indeed, at every point in space this phase can be chosen arbitrarily:


$$\psi(x) \to \psi'(x) = \exp\{i\alpha(x)\}\,\psi(x), \qquad \bar\psi(x) \to \bar\psi'(x) = \bar\psi(x)\,\exp\{-i\alpha(x)\} \qquad [68]$$


with the only condition being that $\alpha(x)$ is a continuous function. The gauge transformation is of the form $g = \exp\{i\alpha(x)\}$, with $g$ an element of the abelian gauge group $G = U(1)$. The free action is


$$S_0 = \int \mathcal{L}_0\,d^4x \qquad [69]$$

with

$$\mathcal{L}_0 = \bar\psi\,(i\gamma^\mu\partial_\mu - m)\,\psi \qquad [70]$$


the "Lagrange density." This action is not invariant under gauge transformations:


$$\mathcal{L}_0 \to \mathcal{L}_0' = \bar\psi\,(i\gamma^\mu\partial_\mu - m)\,\psi - (\partial_\mu\alpha)\,\bar\psi\gamma^\mu\psi \qquad [71]$$


The undesired term can be compensated by the introduction of a gauge potential $\omega$ in a covariant derivative of $\psi$,


$$D\psi = (d + \omega)\,\psi \qquad [72]$$


which has the desired transformation property $D\psi \to \exp\{i\alpha\}\,D\psi$ when, besides the transformation $\psi(x) \to \exp\{i\alpha(x)\}\,\psi(x)$ of the matter field, the gauge potential simultaneously transforms according to the gauge transformation $\omega \to \omega - i\,d\alpha$. The new Lagrange density is


$$\mathcal{L} = \bar\psi\,(i\gamma^\mu D_\mu - m)\,\psi = \mathcal{L}_0 + i\,\omega_\mu(x)\,\bar\psi(x)\gamma^\mu\psi(x) \qquad [73]$$


The substitution $\partial_\mu \to D_\mu$ is known to physicists; with $\omega = -iqA$ it is the ansatz of minimal coupling for taking into account electromagnetic effects: $\partial_\mu \to \partial_\mu - iqA_\mu$. The Lagrange density becomes in this notation $\mathcal{L} = \mathcal{L}_0 - A_\mu J^\mu$, where $J^\mu = -q\,\bar\psi\gamma^\mu\psi$.

The Lagrange density must now be completed by a kinetic term for the gauge potential and we get the complete electromagnetic Lagrange density


$$\mathcal{L} = \mathcal{L}_0 - A_\mu J^\mu - \tfrac{1}{4}\,F_{\mu\nu}F^{\mu\nu} \qquad [74]$$


with $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$. In the action this corresponds to


$$S = S_0 - \int_M A_\mu J^\mu\,\mathrm{vol}^4 - \frac{1}{4}\int_M F_{\mu\nu}F^{\mu\nu}\,\mathrm{vol}^4 \qquad [75]$$
Jm 4 Jm 


We get the field equations for the potential $A$ by demanding that the variation of the action vanishes:


$$\delta S[A] = -\int_M \delta A_\mu\,J^\mu\,\mathrm{vol}^4 - \frac{1}{4}\,\delta\int_M F_{\mu\nu}F^{\mu\nu}\,\mathrm{vol}^4 \qquad [76]$$

We write now


$$\int_M \delta A_\mu\,J^\mu\,\mathrm{vol}^4 = (\delta A, j) \qquad [77]$$

and

$$\frac{1}{4}\,\delta\int_M F_{\mu\nu}F^{\mu\nu}\,\mathrm{vol}^4 = \frac{1}{2}\,\delta(F, F) = (\delta F, F) = (\delta dA, F) = (d\delta A, F) = (\delta A, d^*F) \qquad [78]$$


where we have exchanged the action of $\delta$ and $d$. Since this holds for arbitrary variations $\delta A$ we find


$$d^*F = -j \qquad [79]$$


the inhomogeneous Maxwell equation. 


Nonabelian Gauge Theories 


In SU(N) gauge theory the elementary particles are 
taken to be members of symmetry multiplets. For 
example, in electroweak theory the left-handed 
electron and the neutrino are members of an SU(2) 


doublet: 


$$g(x) = \exp\{\Lambda(x)\}$$ [82]


where $g(x)$ is an element of the Lie group SU(2) and
$\Lambda$ is an element of the Lie algebra su(2). The Lie
algebra is a vector space, and its elements may be
expanded in terms of a basis:


$$\Lambda(x) = \Lambda^a(x)\,T_a$$ [83]

For su(2) the basis elements are traceless and
anti-Hermitian (see below); they are conventionally
expressed in terms of the Pauli matrices,


with 


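$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$ [85]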


They are conventionally normalized according to 


$$\operatorname{tr}(T_i T_j) = -\tfrac12\,\delta_{ij}$$ [86]
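
The normalization [86] can be checked numerically. The sketch below assumes the common choice $T_a = \sigma_a/(2i)$ for the su(2) basis; this is an assumption, since the explicit definition is not reproduced in this excerpt. All helper names (`matmul`, `commutator`, `T`, and so on) are illustrative. Under that assumption the generators are traceless and anti-Hermitian, satisfy [86], and obey $[T_a, T_b] = \epsilon_{abc}T_c$.

```python
# Pauli matrices as 2x2 tuples of rows.
SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def scale(c, a):
    """Multiply every entry of a by the scalar c."""
    return tuple(tuple(c * x for x in row) for row in a)

def add(a, b):
    """Entrywise sum of two matrices."""
    return tuple(tuple(x + y for x, y in zip(ra, rb)) for ra, rb in zip(a, b))

def trace(a):
    return a[0][0] + a[1][1]

def dagger(a):
    """Hermitian conjugate (complex conjugate transpose)."""
    return tuple(
        tuple(complex(a[j][i]).conjugate() for j in range(2)) for i in range(2)
    )

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def commutator(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))

def epsilon(a, b, c):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) / 2

# Assumed basis (not taken from the text): T_a = sigma_a / (2i).
T = tuple(scale(1 / 2j, s) for s in SIGMA)

def normalization_ok():
    """Check tr(T_i T_j) = -1/2 delta_ij, eqn [86]."""
    return all(
        abs(trace(matmul(T[i], T[j])) - (-0.5 if i == j else 0.0)) < 1e-12
        for i in range(3) for j in range(3)
    )

def algebra_ok():
    """Check [T_a, T_b] = eps_abc T_c for the assumed basis."""
    zero = ((0, 0), (0, 0))
    for a in range(3):
        for b in range(3):
            rhs = zero
            for c in range(3):
                rhs = add(rhs, scale(epsilon(a, b, c), T[c]))
            if not close(commutator(T[a], T[b]), rhs):
                return False
    return True
```

With a different normalization of the basis the trace in [86] and the structure constants would rescale accordingly.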


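The Dirac Lagrangian is not invariant with
respect to local gauge transformations:


$$\mathcal{L}_0 = \bar\psi(i\gamma^\mu\partial_\mu - m)\psi \to \mathcal{L}_0' = \mathcal{L}_0 + i\,\bar\psi\gamma^\mu(g\,\partial_\mu g^{-1})\psi$$ [87]

We introduce the gauge potential

$$\omega_\mu(x) = \omega_\mu^a(x)\,T_a$$ [88]


with a gauge transformation

$$\omega_\mu \to \omega_\mu' = g^{-1}\omega_\mu g + g^{-1}\partial_\mu g$$ [89]


The Lagrange density is modified through a covariant
derivative:


$$\partial_\mu \to D_\mu = \partial_\mu + \omega_\mu$$ [90]

The covariant derivative $D_\mu$ transforms according to

$$D_\mu \to D_\mu' = g^{-1}D_\mu\,g$$ [91]

and thus the modified Lagrange density

$$\mathcal{L} = \bar\psi(i\gamma^\mu D_\mu - m)\psi = \mathcal{L}_0 + i\,\bar\psi\gamma^\mu\omega_\mu\psi$$ [92]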


is invariant with respect to local gauge transformations.
The extra term in the Lagrange density is
conventionally written


E [93] 

with 
$$\omega_\mu = -iqA_\mu$$ [94]

and 
Jt = y T 95} 


In mathematical terminology $\omega$ is called a connection.
The quantity $A$ is the physicist's gauge
potential. The connection is anti-Hermitian and the
gauge potential Hermitian. The gauge potential also
includes the coupling constant $q$. We will refer to
both $\omega$ and $A$ as the gauge potential, where the
relation between them is given by eqn [94].

We can write the gauge potential as $A = A^a_\mu\,dx^\mu\,T_a$
or, in the SU(2) case, as


$$A_\mu = A^1_\mu T_1 + A^2_\mu T_2 + A^3_\mu T_3$$ [96]


where we see explicitly that it involves three vector
fields, which couple to the electroweak currents [95]
with the single coupling constant $q$, and which will
become after symmetry breaking the three vector
bosons $W_+$, $W_-$, $Z_0$ of the electroweak gauge theory.
Actually, a mix of the neutral gauge boson and the
photon will combine to yield the $Z_0$ boson, while the
orthogonal mixture gives rise to the electromagnetic
interaction, in an SU(2) × U(1) theory. At this stage,
the gauge bosons are all massless; their masses are
generated by the "Higgs mechanism."


Lie-Algebra-Valued p-Forms 


To describe nonabelian fields, we need Lie-algebra- 
valued p-forms: 


$$\phi = T_a\,\phi^a$$ [97]


where $T_a$ is a generator of the Lie algebra, the index
$a$ runs over the number of generators of the Lie
algebra, and the $\phi^a$ are the usual scalar-valued
p-forms. The composition in a Lie algebra is a Lie
bracket, which is defined for two Lie-algebra-valued
p-forms by


$$[\phi, \psi] := [T_a, T_b]\,\phi^a\wedge\psi^b$$ [98]

The Lie bracket in the algebra is

$$[T_a, T_b] = f^c{}_{ab}\,T_c$$ [99]


where $f^c{}_{ab}$ are the structure constants. It follows from
this that


$$[\psi, \phi] = [T_a, T_b]\,\psi^a\wedge\phi^b = -[T_b, T_a]\,\psi^a\wedge\phi^b$$ [100]


or


$$[\psi, \phi] = (-1)^{pq+1}\,[\phi, \psi]$$ [101]


when $\phi$ is a p-form and $\psi$ is a q-form. In the special
case that $T_a$ is a matrix, also the product $T_aT_b$ is
defined, and from this the product of two
Lie-algebra-valued forms

$$\phi\wedge\psi := T_a\phi^a\wedge T_b\psi^b = T_aT_b\,\phi^a\wedge\psi^b$$ [102]

Now the Lie bracket is a commutator:

$$[T_a, T_b] = T_aT_b - T_bT_a$$ [103]

and

$$[\phi, \psi] = [T_a, T_b]\,\phi^a\wedge\psi^b = T_a\phi^a\wedge T_b\psi^b - (-1)^{pq}\,T_b\psi^b\wedge T_a\phi^a = \phi\wedge\psi - (-1)^{pq}\,\psi\wedge\phi$$ [104]


From this relation it follows that for $\phi$ and $\psi$ odd
p-forms

$$[\phi, \psi] = \phi\wedge\psi + \psi\wedge\phi$$ [105]

For $\phi$ an odd p-form

$$[\phi, \phi] = \phi\wedge\phi + \phi\wedge\phi = 2(\phi\wedge\phi)$$ [106]


The Gauge Potential and the 
Field Strength 


The generalization of the abelian relationship
between the gauge potential and the field strength,
$F = dA$, is


$$\Omega = d\omega + \tfrac12[\omega, \omega] = d\omega + \omega\wedge\omega$$ [107]


where because $\omega$ is a 1-form we can use eqn [106].
The mathematician refers to $\Omega$ as the curvature. The
physicist writes, in analogy to eqn [94],


$$\Omega = -iqF, \qquad F = \tfrac12 F^a_{\mu\nu}\,dx^\mu\wedge dx^\nu\,T_a$$ [108]


One obtains for the components


$$F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu - iq\,f^a{}_{bc}\,A^b_\mu A^c_\nu$$ [109]


A generalization of the gauge transformation of
$A$, that is, $A' = A + d\Lambda$, is eqn [89]:


$$\omega' = g^{-1}\omega g + g^{-1}dg$$ [110]


A quantity $\phi$ with the transformation property


$$\phi' = g^{-1}\phi\,g$$ [111]


is called a "tensorial" quantity. The gauge potential
$\omega$ is according to this definition nontensorial.
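Nevertheless the field strength is tensorial. Indeed


$$\Omega' = d\omega' + \tfrac12[\omega', \omega']$$
$$= d(g^{-1}\omega g) + (dg^{-1})\wedge dg + \tfrac12[g^{-1}\omega g + g^{-1}dg,\; g^{-1}\omega g + g^{-1}dg]$$
$$= (dg^{-1})\wedge\omega g + g^{-1}d\omega\,g - g^{-1}\omega\wedge dg + (dg^{-1})\wedge dg + \tfrac12 g^{-1}[\omega,\omega]g + \tfrac12[g^{-1}\omega g, g^{-1}dg] + \tfrac12[g^{-1}dg, g^{-1}\omega g] + \tfrac12[g^{-1}dg, g^{-1}dg]$$
$$= g^{-1}\Omega g + (dg^{-1})\wedge\omega g - g^{-1}\omega\wedge dg + (dg^{-1})\wedge dg + g^{-1}\omega\wedge dg + g^{-1}dg\wedge g^{-1}\omega g + g^{-1}dg\wedge g^{-1}dg$$
$$= g^{-1}\Omega\,g$$ [112]

where we have differentiated the relation
$g^{-1}g = \mathrm{Id}$ to get

$$dg^{-1} = -g^{-1}\,dg\,g^{-1}$$ [113]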


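In the abelian case, we had $dF = 0$. The nonabelian
analog is


$$d\Omega = d\omega\wedge\omega - \omega\wedge d\omega = (\Omega - \omega\wedge\omega)\wedge\omega - \omega\wedge(\Omega - \omega\wedge\omega) = \Omega\wedge\omega - \omega\wedge\Omega$$ [114]


or


$$d\Omega + \omega\wedge\Omega - \Omega\wedge\omega = 0$$ [115]

the Bianchi identity. It can also be written as


$$d\Omega + \omega\wedge\Omega - \Omega\wedge\omega = d\Omega + [\omega, \Omega] = 0$$ [116]

because from eqn [104], for a p-form $\phi$,

$$\omega\wedge\phi + (-1)^{p+1}\phi\wedge\omega = [\omega, \phi]$$ [117]

The covariant derivative $D$ is defined as

$$D\phi := d\phi + [\omega, \phi]$$ [118]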


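for $\phi$ a tensorial quantity. The covariant derivative
takes tensorial p-forms into tensorial (p + 1)-forms:

$$D'\phi' = d(g^{-1}\phi g) + [g^{-1}\omega g + g^{-1}dg,\; g^{-1}\phi g]$$
$$= dg^{-1}\wedge\phi g + g^{-1}d\phi\,g + (-1)^p g^{-1}\phi\wedge dg + [g^{-1}\omega g, g^{-1}\phi g] + [g^{-1}dg, g^{-1}\phi g]$$
$$= g^{-1}D\phi\,g + dg^{-1}\wedge\phi g + (-1)^p g^{-1}\phi\wedge dg + g^{-1}dg\,g^{-1}\wedge\phi g - (-1)^p g^{-1}\phi\wedge dg$$
$$= g^{-1}D\phi\,g$$ [119]


We have thereby verified the transformation property
of eqn [91].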


The Gauge Group 


From the gauge transformation $\psi' = g\psi$ the requirement
$|\psi'|^2 = |\psi|^2$ leads to $g^\dagger g = 1$. That means that $g$
belongs to the unitary Lie group $G = U(n)$, whose
elements fulfill $g^\dagger = \bar g^T = g^{-1}$. For elements of the Lie
algebra $\mathfrak{g} = u(n)$ this implies

$$(e^X)^\dagger = e^{X^\dagger} = e^{-X}$$ [120]

or

$$X^\dagger = \bar X^T = -X$$ [121]


where $\bar X$ is complex conjugation and $X^T$ means
transposition.

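For elements of the Lie algebra we can define a
scalar product (the Killing metric)


$$(X, Y) := -\operatorname{tr}(XY) = -X^\alpha{}_\beta\,Y^\beta{}_\alpha$$ [122]

The scalar product is real:

$$\overline{(X, Y)} = -\bar X^\alpha{}_\beta\,\bar Y^\beta{}_\alpha = -X^\beta{}_\alpha\,Y^\alpha{}_\beta = (X, Y)$$ [123]

symmetric:

$$(X, Y) = -\operatorname{tr}(XY) = -\operatorname{tr}(YX) = (Y, X)$$ [124]

and positive definite:

$$(X, X) = -X^\alpha{}_\beta\,X^\beta{}_\alpha = X^\alpha{}_\beta\,\bar X^\alpha{}_\beta = |X|^2$$ [125]


The scalar product is invariant under the action of
$G$ on $\mathfrak{g}$: for $g \in G$


$$(gXg^{-1}, gYg^{-1}) = -\operatorname{tr}(gXYg^{-1}) = -\operatorname{tr}(XY) = (X, Y)$$ [126]

or for $X, Y, Z \in \mathfrak{g}$

$$(e^{tX}Ye^{-tX},\; e^{tX}Ze^{-tX}) = (Y, Z)$$ [127]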


We take the derivative of this equation with respect 
to t at the value t=0 and get: 


([X, Y], Z) + (Y, [X,Z]) =0 [128] 


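We define an action of the algebra $\mathfrak{g}$ on itself:


$$\mathrm{ad}(X): \mathfrak{g} \to \mathfrak{g}, \qquad \mathrm{ad}(X)Y = [X, Y]$$ [129]


We can then formulate our conclusion as follows:
the action of $\mathfrak{g}$ on itself is anti-Hermitian:


$$(\mathrm{ad}(X)Y, Z) = -(Y, \mathrm{ad}(X)Z)$$ [130]


or


$$[\mathrm{ad}(X)]^\dagger = -\mathrm{ad}(X)$$ [131]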


From $g^\dagger g = 1$ we have $|\det(g)|^2 = 1$. For the gauge
group $G = SU(N)$ we require in addition $\det(g) = 1$.
Since


$$\det(g) = \det(\exp(X)) = \exp(\operatorname{tr}(X))$$ [132]


the elements $X \in su(N)$ must be traceless. A basis of
the vector space of traceless, anti-Hermitian (2 × 2)
matrices is given by the Pauli matrices of eqn [85] multiplied by $i$.
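
The properties [122] to [128] of the scalar product can be spot-checked on su(2). The sketch below again assumes the basis $T_a = \sigma_a/(2i)$ (an assumption, not a definition taken from this excerpt); the function names are illustrative. For that basis $(X, Y) = \tfrac12\,x^a y^a$ when $X = x^aT_a$ and $Y = y^aT_a$, so reality, symmetry, and positivity are manifest, and the invariance identity [128] can be tested directly.

```python
# Assumed su(2) basis T_a = sigma_a / (2i), written out as 2x2 tuples of rows.
T = (
    ((0, -0.5j), (-0.5j, 0)),
    ((0, -0.5), (0.5, 0)),
    ((-0.5j, 0), (0, 0.5j)),
)

def matmul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def add(a, b):
    return tuple(tuple(x + y for x, y in zip(ra, rb)) for ra, rb in zip(a, b))

def scale(c, a):
    return tuple(tuple(c * x for x in row) for row in a)

def trace(a):
    return a[0][0] + a[1][1]

def element(coeffs):
    """Lie algebra element X = x^a T_a for real coefficients x^a."""
    total = ((0, 0), (0, 0))
    for c, t in zip(coeffs, T):
        total = add(total, scale(c, t))
    return total

def bracket(x, y):
    """Matrix commutator [x, y]."""
    return add(matmul(x, y), scale(-1, matmul(y, x)))

def inner(x, y):
    """Scalar product (X, Y) = -tr(XY) of eqn [122]."""
    return -trace(matmul(x, y))

def ad_invariance_defect(x, y, z):
    """Left-hand side of eqn [128]; should vanish."""
    return inner(bracket(x, y), z) + inner(y, bracket(x, z))

x_coeffs, y_coeffs, z_coeffs = (0.3, -1.2, 0.7), (1.1, 0.4, -0.5), (-0.8, 0.9, 2.0)
X, Y, Z = element(x_coeffs), element(y_coeffs), element(z_coeffs)
```

The vanishing of `ad_invariance_defect(X, Y, Z)` is the statement that ad(X) is anti-Hermitian with respect to this scalar product.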


The Yang-Mills Action 


The SU(2) Yang-Mills action is, in analogy to the 
abelian case, 


1 ee d 4 
= ap vo] " jiv | 
S E iz]. PBF YO "a Ja tr(F,,F"")vo 


1 
-z:]. tr(F ^ * F) 


We have included the trace in our definition of the 
scalar product: 


[133] 


$$(\phi, \psi) := -\int_M \operatorname{tr}\langle\phi, \psi\rangle\,\mathrm{vol}^n = -\int_M \operatorname{tr}(\phi\wedge *\psi)$$ [134]


We then write eqn [133] as


$$S[\omega] = \tfrac12(\Omega, \Omega)$$ [135]


taking into account the relation between $\Omega$ and the
field strength $F$, and indicating the dependence on
the gauge potential. Since $\Omega$ is tensorial the action is
invariant.

Now we calculate the variation of $S[\omega]$ with
respect to a variation of the gauge potential:


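$$\delta S[\omega] = \frac{d}{dt}S[\omega + t\,\delta\omega]\Big|_{t=0} = \tfrac12\,\delta(\Omega, \Omega) = \tfrac12\big((\delta\Omega, \Omega) + (\Omega, \delta\Omega)\big)$$

$$(\delta\Omega, \Omega) = \big(\delta(d\omega + \tfrac12[\omega, \omega]),\, \Omega\big) = \big(\delta d\omega + \tfrac12[\delta\omega, \omega] + \tfrac12[\omega, \delta\omega],\, \Omega\big) = (d\delta\omega + [\omega, \delta\omega],\, \Omega)$$ [136]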


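where we have exchanged the order of $\delta$ and $d$. We
remark that although $\omega$ is not a tensorial section, $\delta\omega$ is:
for $\omega_1' = g^{-1}\omega_1 g + g^{-1}dg$ and $\omega_2' = g^{-1}\omega_2 g + g^{-1}dg$ we have

$$\delta\omega' = \omega_1' - \omega_2' = g^{-1}(\omega_1 - \omega_2)\,g$$ [137]


The quantity $\Omega$ is in any case tensorial. Therefore,
the covariant derivative is defined, and we have


$$D\delta\omega = d\delta\omega + [\omega, \delta\omega]$$ [138]

and

$$D\Omega = d\Omega + [\omega, \Omega]$$ [139]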


In general, the action of the covariant derivative on
tensorial quantities can be written as $D = d + \mathrm{ad}(\omega)$,
where ad(X) is the representation of the Lie algebra on
itself introduced in the previous section. We now have


$$\delta S[\omega] = (D\delta\omega, \Omega) = (\delta\omega, D^*\Omega) = 0$$ [140]


for an arbitrary variation $\delta\omega$. Therefore, $D^*\Omega = 0$.
We have obtained


$$D^*\Omega = 0$$ [141]

the "Yang-Mills equations," and

$$D\Omega = 0$$ [142]


the "Bianchi identities." These are the generalizations
of the Maxwell equations $d^*F = 0$ and $dF = 0$ in the
absence of external sources. For the general case of
interacting fermions, we write out the full action, in
analogy to eqn [74], and obtain, in analogy to eqns
[79] and [58],


D'óé--],  D'J-0 [143] 


We shall now derive, again for the pure gauge 
sector, coordinate expressions for the Yang-Mills 
equations. Consider the expression 


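$$\delta S[\omega] = (D\delta\omega, \Omega) = (\delta\omega, D^*\Omega) = (d\delta\omega + [\omega, \delta\omega],\, \Omega)$$ [144]


The first term in the last expression is

$$(d\delta\omega, \Omega) = (\delta\omega, d^*\Omega) = -\operatorname{tr}\int_M \delta\omega_\nu\,(d^*\Omega)^\nu\,\mathrm{vol}^4$$ [145]


The second term can be computed using


$$[\omega, \delta\omega]_{\mu\nu} = (\omega\wedge\delta\omega + \delta\omega\wedge\omega)(\partial_\mu, \partial_\nu) = \omega_\mu\delta\omega_\nu - \omega_\nu\delta\omega_\mu + \delta\omega_\mu\omega_\nu - \delta\omega_\nu\omega_\mu$$ [146]

and hence

$$[\omega, \delta\omega]_{\mu\nu}\,\Omega^{\mu\nu} = 2\,[\omega_\mu, \delta\omega_\nu]\,\Omega^{\mu\nu}$$ [147]

because $\Omega$ is antisymmetric, $\Omega^{\mu\nu} = -\Omega^{\nu\mu}$. Thus,

$$([\omega, \delta\omega], \Omega) = -\int_M \operatorname{tr}([\omega, \delta\omega]\wedge *\Omega) = -\frac12\int_M \operatorname{tr}([\omega, \delta\omega]_{\mu\nu}\,\Omega^{\mu\nu})\,\mathrm{vol}^4 = -\int_M \operatorname{tr}([\omega_\mu, \delta\omega_\nu]\,\Omega^{\mu\nu})\,\mathrm{vol}^4 = \int_M ([\omega_\mu, \delta\omega_\nu],\, \Omega^{\mu\nu})\,\mathrm{vol}^4$$ [148]
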

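where (,) is the scalar product in $\mathfrak{g}$. From eqn [128]
this equals


$$-\int_M (\delta\omega_\nu,\, [\omega_\mu, \Omega^{\mu\nu}])\,\mathrm{vol}^4 = \int_M \operatorname{tr}(\delta\omega_\nu\,[\omega_\mu, \Omega^{\mu\nu}])\,\mathrm{vol}^4$$ [149]


Combining this with eqn [144] gives


$$(\delta\omega, D^*\Omega) = -\int_M \operatorname{tr}\big(\delta\omega_\nu\,\{(d^*\Omega)^\nu - [\omega_\mu, \Omega^{\mu\nu}]\}\big)\,\mathrm{vol}^4 = (\delta\omega,\, \{(d^*\Omega)^\nu - [\omega_\mu, \Omega^{\mu\nu}]\})$$ [150]


We can now insert the coordinate expression for

$$(d^*\Omega)^\nu = -\partial_\mu\Omega^{\mu\nu}$$ [151]


Finally, the coordinate expressions of the Yang-Mills
equations $D^*\Omega = 0$ are


$$(D^*\Omega)^\nu = -(\partial_\mu\Omega^{\mu\nu} + [\omega_\mu, \Omega^{\mu\nu}]) = 0$$ [152]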

The Analogy with Electromagnetism 


The Yang-Mills equation and the Bianchi identity in
the absence of external sources are


$$\partial_\mu F^{\mu\nu} - iq[A_\mu, F^{\mu\nu}] = 0$$ [153]

and

$$\partial_\lambda F_{\mu\nu} + \partial_\mu F_{\nu\lambda} + \partial_\nu F_{\lambda\mu} - iq\{[A_\lambda, F_{\mu\nu}] + [A_\mu, F_{\nu\lambda}] + [A_\nu, F_{\lambda\mu}]\} = 0$$ [154]

We shall write these equations in terms of the fields


$$F^{0i} = E^i, \qquad i = 1, 2, 3$$ [155]


$$F^{23} = B^1, \qquad F^{31} = B^2, \qquad F^{12} = B^3$$ [156]


where the E and B vectors may be thought of as
"electric" and "magnetic" fields, even though they have
Lie-algebra indices, $F^{0i} = (F^{0i})^a\,T_a$, etc. In the context of
the SU(3) theory, they are referred to as the
"chromoelectric" and "chromomagnetic" fields, respectively.
The Yang-Mills equations with $\nu = 0$ are

$$\partial_i F^{i0} - iq[A_i, F^{i0}] = 0$$ [157]


with $i = 1, 2, 3$ a spatial index. In vector notation
this is


$$\operatorname{div}\mathbf{E} = iq(\mathbf{A}\cdot\mathbf{E} - \mathbf{E}\cdot\mathbf{A})$$ [158]


This is the analog of Gauss's equation. Even though
we started out without external sources,
$iq(\mathbf{A}\cdot\mathbf{E} - \mathbf{E}\cdot\mathbf{A})$ plays the role of a "charge density." The
Yang-Mills field E and the potential A combine to
act as a source for the Yang-Mills field. This is an
essential feature of nonabelian gauge theories in
which they differ from the abelian case, due to the
fact that the commutator [A, E] is nonvanishing.
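
To make the role of the commutator concrete, the following sketch computes the Lie-algebra components of $\sum_i [A_i, E_i]$ for su(2)-valued fields. It assumes a basis with $[T_a, T_b] = \epsilon_{abc}T_c$ (true for the choice $T_a = \sigma_a/(2i)$, which is an assumption here) and omits the overall factor $iq$ of eqn [158]. The function name `color_commutator` and the sample field values are hypothetical.

```python
def epsilon(a, b, c):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) / 2

def color_commutator(A, E):
    """Components c of sum_i [A_i, E_i] = sum_i eps_abc A_i^a E_i^b T_c.

    A[i][a] and E[i][a] carry a spatial index i and an su(2) index a.
    """
    return tuple(
        sum(
            epsilon(a, b, c) * A[i][a] * E[i][b]
            for i in range(3) for a in range(3) for b in range(3)
        )
        for c in range(3)
    )

# Fields pointing along a single Lie-algebra direction behave like abelian
# fields: the commutator source vanishes.
A_parallel = ((0, 0, 1.0), (0, 0, 2.0), (0, 0, -1.0))
E_parallel = ((0, 0, 0.5), (0, 0, 1.5), (0, 0, 3.0))

# Fields with different Lie-algebra orientations source each other.
A_generic = ((1.0, 0, 0), (0, 0, 0), (0, 0, 0))
E_generic = ((0, 1.0, 0), (0, 0, 0), (0, 0, 0))
```

For each spatial index the result is just a cross product in the three-dimensional Lie-algebra space, which is why it vanishes when A and E are aligned there.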

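Now consider the Yang-Mills equations with a
spatial index $\nu = i$:


$$\partial_0 F^{0i} + \partial_j F^{ji} - iq[A_0, F^{0i}] - iq[A_j, F^{ji}] = 0$$ [159]

In vector notation this is

$$\operatorname{curl}\mathbf{B} - \frac{\partial\mathbf{E}}{\partial t} = -iq(A_0\mathbf{E} - \mathbf{E}A_0) + iq(\mathbf{A}\times\mathbf{B} + \mathbf{B}\times\mathbf{A})$$ [160]


replacing the Ampère-Maxwell law. Note that there
are two extra contributions to the "current" other
than the displacement current.

The analogs of the laws of Faraday and of the
absence of magnetic monopoles are derived similarly
from the Bianchi identities. The results are


$$\operatorname{curl}\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = iq\{(\mathbf{A}\times\mathbf{E} + \mathbf{E}\times\mathbf{A}) + (A_0\mathbf{B} - \mathbf{B}A_0)\}$$ [161]


and

$$\operatorname{div}\mathbf{B} = iq(\mathbf{A}\cdot\mathbf{B} - \mathbf{B}\cdot\mathbf{A})$$ [162]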
Further Remarks 


The foundations of the mathematics of differential 
forms were laid down by Poincaré (1953). They 
were applied to the description of electrodynamics 


already by Cartan (1923). A modern presentation of 
differential forms and the manifolds on which they 
are defined is given in Abraham et al. (1983). A 
recent treatment of electrodynamics in this approach 
is Hehl and Obukhov (2003). Weyl’s argument is in 
his paper of 1929. 

Nonabelian gauge theories today explain the 
electromagnetic, the strong and weak nuclear 
interactions. The original paper is that of Yang 
and Mills (1954). Glashow, Salam, and Weinberg 
(1980) saw the way to apply it to the weak 
interactions by using spontaneous symmetry 
breaking to generate the masses through the use 
of the Higgs (1964) mechanism. 't Hooft and
Veltman (1972) showed that the resulting quantum
field theory was renormalizable. The strong
interactions were recognized as the nonabelian 
gauge theory with gauge group SU(3) by Gell- 
Mann (1972). For a modern treatment which puts 
nonabelian gauge theories in the context of 
differential geometry, see Frankel (1987). 


See also: Dirac Fields in Gravitation and Nonabelian 
Gauge Theory; Electroweak Theory; Measure on Loop 
Spaces; Nonperturbative and Topological Aspects of 
Gauge Theory; Quantum Electrodynamics and its 
Precision Tests.


Introduction


For the purpose of this article, vortices are topological
solitons arising in field theories in (2+1)-dimensional
spacetime when a complex-valued field $\phi$ is allowed to
acquire winding at infinity, meaning that the phase of
$\phi(t, \mathbf{x})$, as $\mathbf{x}$ traverses a large circle in the spatial plane,
changes by $2\pi n$, where $n$ is a nonzero integer. Such
winding cannot be removed by any continuous
deformation of $\phi$ (hence "topological") and traps a
considerable amount of energy which tends to coalesce
into smooth, stable lumps with highly particle-like
characteristics (hence "solitons"). Clearly, the universe
is (3+1) dimensional. Nonetheless, planar field
theories are of physical interest for two main reasons. 
First, the theory may arise by dimensional reduction of 
a (3 + 1)-dimensional model under the assumption of 
translation invariance in one direction. Vortices are 
then transverse slices through straight tube-like objects 
variously interpreted as magnetic flux tubes in a 
superconductor or cosmic strings. Second, a crucial 
ingredient of the standard model of particle physics is 
spontaneous breaking of gauge symmetry by a Higgs 
field. As well as endowing the fundamental gauge 
bosons and chiral fermions with mass, this mechanism 
can potentially generate various types of topological 
solitons (monopoles, strings, and domain walls) whose 
structure and interactions one would like to understand.
Vortices in (2+1) dimensions are interesting in
this regard because they arise in the simplest field 
theory exhibiting the Higgs mechanism, the abelian 
Higgs model (AHM). They are thus a useful theoret- 
ical laboratory in which to test ideas which may 
ultimately find application in more realistic theories. 
This article describes the properties of abelian Higgs 
vortices and explains how, using a mixture of 
numerical and analytical techniques, a good under- 
standing of their dynamical interactions has been 
obtained. 


The Abelian Higgs Model 


Throughout this article spacetime will be $\mathbb{R}^{2+1}$
endowed with the Minkowski metric with signature
(+, -, -), and Cartesian coordinates $x^\mu$, $\mu = 0, 1, 2$,
with $x^0 = t$ (the speed of light $c = 1$). A
spacetime point will be denoted $x$, its spatial part by
$\mathbf{x} = (x^1, x^2)$. Latin indices $j, k, \ldots$ range over 1, 2, and
repeated indices (Latin or Greek) are summed over.
We sometimes use polar coordinates in the spatial
plane, $\mathbf{x} = r(\cos\theta, \sin\theta)$, and sometimes a complex
coordinate $z = x^1 + ix^2 = re^{i\theta}$. Occasionally, it is
convenient to think of $\mathbb{R}^{2+1}$ as a subspace of $\mathbb{R}^{3+1}$
and denote by $\mathbf{k}$ the unit vector in the (fictitious)
third spatial direction. The complex scalar Higgs
field is denoted $\phi$, and the electromagnetic gauge
potential $A_\mu$, best thought of as the components of a
1-form $A = A_\mu\,dx^\mu$. $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ is the field
strength tensor which, in $\mathbb{R}^{2+1}$, has only three
independent components, identified with the magnetic
field $B = F_{12}$ and electric field $(E_1, E_2) = (F_{01}, F_{02})$.
The gauge-covariant derivative is $D_\mu\phi = \partial_\mu\phi - ieA_\mu\phi$,
$e$ being the electric charge of the Higgs.
Under a U(1) gauge transformation,


$$\phi \to e^{i\Lambda}\phi, \qquad A_\mu \to A_\mu + e^{-1}\partial_\mu\Lambda$$ [1]

$\Lambda: \mathbb{R}^{2+1} \to \mathbb{R}$ being any smooth function, $F_{\mu\nu}$ and
$|\phi|$ remain invariant, while $D_\mu\phi \to e^{i\Lambda}D_\mu\phi$. Only
gauge-invariant quantities are physically observable
(classically).


With these conventions, the AHM has Lagrangian
density


£== FE +5D,¢D¥6 — a -JePy [2] 
which is manifestly gauge invariant. By rescaling
$\phi$, $A$, $x$ and the unit of action, we can (and
henceforth will) set $e$ and the other scale parameters appearing in [2] to 1. The
only parameter which cannot be scaled away is $\lambda > 0$.
Its value greatly influences the model's behavior.

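The field equations, obtained by demanding that
$\phi(x)$, $A_\mu(x)$ be a local extremal of the action
$S = \int \mathcal{L}\,d^3x$, are


$$D_\mu D^\mu\phi - \frac{\lambda}{2}(1 - |\phi|^2)\phi = 0$$

$$\partial_\mu F^{\mu\nu} + \frac{i}{2}(\bar\phi D^\nu\phi - \phi\overline{D^\nu\phi}) = 0$$ [3]
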
This is a coupled set of nonlinear second-order PDEs. 
Of particular interest are solutions which have finite
total energy. Energy is not a Lorentz-invariant
quantity. To define it we must choose an inertial
frame and, having broken Lorentz invariance, it is
convenient to work in a temporal gauge, for which
$A_0 = 0$ (which may be obtained by a gauge transformation
with $\Lambda(t, \mathbf{x}) = -\int_0^t A_0(t', \mathbf{x})\,dt'$, after which only
time-independent gauge transformations are permitted).
The potential energy of a field is then


$$E = \frac12\int_{\mathbb{R}^2}\Big(B^2 + \overline{D_i\phi}\,D_i\phi + \frac{\lambda}{4}(1 - |\phi|^2)^2\Big)\,d^2x = E_{\rm mag} + E_{\rm grad} + E_{\rm self}$$ [4]

while its kinetic energy is

$$E_{\rm kin} = \frac12\int_{\mathbb{R}^2}\big(|\partial_0 A|^2 + |\partial_0\phi|^2\big)\,d^2x$$ [5]


If $\phi$, $A$ satisfy the field equations then the total
energy $E_{\rm tot} = E_{\rm kin} + E$ is independent of $t$. By
Derrick's theorem, static solutions have $E_{\rm mag} = E_{\rm self}$
(Manton and Sutcliffe 2004, pp. 82-87).

Configurations with finite energy have quantized
total magnetic flux. To see this, note that $E$ finite
implies $|\phi| \to 1$ as $r \to \infty$, so $\phi \sim e^{i\chi(r, \theta)}$ at large $r$ for
some real (in general, multivalued) function $\chi$. The
winding number of $\phi$ is its winding around a circle of
large radius $R$, that is, the integer
$n = (\chi(R, 2\pi) - \chi(R, 0))/2\pi$. Although the phase of $\phi$ is clearly gauge
dependent, $n$ is not, because to change this, a gauge
transformation $e^{i\Lambda}: \mathbb{R}^2 \to U(1)$ would itself need
nonzero winding around the circle, contradicting
smoothness of $e^{i\Lambda}$. The model is invariant under
spatial reflexions, under which $n \to -n$, so we will
assume (unless noted otherwise) that $n \geq 0$. Finiteness
of $E$ also implies that $D\phi = d\phi - iA\phi \to 0$, so
$A \sim -i\,d\phi/\phi \sim d\chi$ as $r \to \infty$ (note $\phi \neq 0$ for large $r$).
Hence, the total magnetic flux is


$$\int_{\mathbb{R}^2} B\,d^2x = \lim_{R\to\infty}\oint_{S_R} A = \lim_{R\to\infty}\int_0^{2\pi}\partial_\theta\chi\,d\theta = 2\pi n$$ [6]

where $S_R = \{\mathbf{x} : |\mathbf{x}| = R\}$ and we have used Stokes's
theorem. The above argument uses only generic
properties of $E$, namely that finite $E_{\rm self}$ requires $|\phi|$
to assume a nonzero constant value as $r \to \infty$. So
flux quantization is a robust feature of this type of
model. As presented, the argument is somewhat
formal, but it can be made mathematically rigorous
at the cost of gauge-fixing technicalities (Manton
and Sutcliffe 2004, pp. 164-166). Note that if $n \neq 0$
then, by continuity, $\phi(\mathbf{x})$ must vanish at some
$\mathbf{x} \in \mathbb{R}^2$, and one expects a lump of energy density to be
associated with each such $\mathbf{x}$ since $\phi = 0$ maximizes
the integrand of $E_{\rm self}$.
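
The winding number entering [6] is easy to evaluate numerically by accumulating phase increments around a large circle. The sketch below is illustrative only: `model_field` is a hypothetical test configuration with $|\phi| \to 1$ at large $r$ (not a solution of the field equations), and `gauge_transform` applies a smooth, single-valued gauge function to show that $n$ is unchanged.

```python
import cmath
import math

def winding_number(phi, radius, samples=720):
    """Winding of the phase of phi(x, y) around the circle |x| = radius."""
    total = 0.0
    previous = phi(radius, 0.0)
    for k in range(1, samples + 1):
        theta = 2.0 * math.pi * k / samples
        current = phi(radius * math.cos(theta), radius * math.sin(theta))
        # Phase increment between neighbouring sample points, in (-pi, pi].
        total += cmath.phase(current / previous)
        previous = current
    return round(total / (2.0 * math.pi))

def model_field(n):
    """Hypothetical test field: |phi| -> 1 at large r, winding n."""
    def phi(x, y):
        z = complex(x, y) if n >= 0 else complex(x, -y)
        return z ** abs(n) / math.sqrt(1.0 + abs(z) ** (2 * abs(n)))
    return phi

def gauge_transform(phi, gauge_function):
    """phi -> exp(i Lambda) phi for a smooth real function Lambda(x, y)."""
    return lambda x, y: cmath.exp(1j * gauge_function(x, y)) * phi(x, y)
```

For example, `winding_number(model_field(3), 10.0)` returns 3, and by [6] the corresponding total flux would be $2\pi \cdot 3$. The sampling must be fine enough that each phase increment stays well below $\pi$.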


Radially Symmetric Vortices 


The model supports static solutions within the
radially symmetric ansatz $\phi = \sigma(r)e^{in\theta}$, $A = a(r)\,d\theta$,
which reduces the field equations to a coupled pair
of nonlinear ODEs:


$$\frac{d^2\sigma}{dr^2} + \frac{1}{r}\frac{d\sigma}{dr} - \frac{1}{r^2}(n - a)^2\sigma + \frac{\lambda}{2}(1 - \sigma^2)\sigma = 0$$

$$\frac{d^2a}{dr^2} - \frac{1}{r}\frac{da}{dr} + (n - a)\sigma^2 = 0$$ [7]


Finite energy requires $\lim_{r\to\infty}\sigma(r) = 1$, $\lim_{r\to\infty}a(r) = n$,
while smoothness requires $\sigma(r) \sim {\rm const}\cdot r^n$,
$a(r) \sim {\rm const}\cdot r^2$ as $r \to 0$. It is known that solutions to this
system, which we shall call n-vortices, exist for all
$n$, $\lambda$, though no explicit formulas for them are
known. They may be found numerically, and are
depicted in Figure 1. Note that $\sigma$ and $a$ always rise
monotonically to their vacuum values, and $B$ always
falls monotonically to 0, as $r$ increases. These
solutions have their magnetic flux concentrated in a
single, symmetric lump, a flux tube in the $\mathbb{R}^{3+1}$
picture. In contrast, the total energy density
(integrand of $E$ in [4]) is nonmonotonic for $n \geq 2$, being
peaked on a ring whose radius grows with $n$. This is
a common feature of planar solitons.

The large $r$ asymptotics of n-vortices are well
understood. For $\lambda \leq 4$ one may linearize [7] about
$\sigma = 1$, $a = n$, yielding


$$\sigma(r) \sim 1 + \frac{q_n}{2\pi}K_0(\sqrt{\lambda}\,r)$$ [8]

$$a(r) \sim n + \frac{m_n}{2\pi}\,rK_1(r)$$ [9]


where $q_n$, $m_n$ are unknown constants and $K_\alpha$
denotes the modified Bessel function. For $\lambda > 4$
linearization is no longer well justified, and the
asymptotic behaviour of $\sigma$ (though not $a$) is quite
different (Manton and Sutcliffe 2004, pp. 174-175).
We shall not consider this rather extreme regime
further. Note that


$$K_\alpha(r) \sim \sqrt{\frac{\pi}{2r}}\,e^{-r} \quad \text{as } r \to \infty$$ [10]


for all $\alpha$, so both $\sigma$ and $a$ approach their vacuum
values exponentially fast, but with different decay
lengths: $1/\sqrt{\lambda}$ for $\sigma$, 1 for $a$. This can be seen in
Figure 1a. The constants $q_n$ and $m_n$ depend on $\lambda$ and
must be inferred by comparing the numerical
solutions with [8], [9]; $q = q_1$ and $m = m_1$ will
receive a physical interpretation shortly.
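
The asymptotic form [10] can be checked against a direct evaluation of $K_\alpha$. The sketch below uses the standard integral representation $K_\alpha(x) = \int_0^\infty e^{-x\cosh t}\cosh(\alpha t)\,dt$ for $x > 0$ with a plain trapezoidal rule; the function names and the truncation rule for the upper limit are choices made here for illustration, not part of the original analysis.

```python
import math

def bessel_k(order, x, steps=2000):
    """Modified Bessel function K_order(x), x > 0, by trapezoidal quadrature
    of the integral of exp(-x cosh t) cosh(order t) over t >= 0."""
    # Beyond t_max the factor exp(-x cosh t) is below exp(-x - 40).
    t_max = math.acosh(1.0 + 40.0 / x)
    h = t_max / steps
    total = 0.5 * (
        math.exp(-x)
        + math.exp(-x * math.cosh(t_max)) * math.cosh(order * t_max)
    )
    for k in range(1, steps):
        t = k * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(order * t)
    return h * total

def leading_asymptotic(x):
    """Right-hand side of eqn [10]; independent of the order."""
    return math.sqrt(math.pi / (2.0 * x)) * math.exp(-x)

def decay_ratio(order, x):
    """K_order(x) divided by its leading asymptotic form."""
    return bessel_k(order, x) / leading_asymptotic(x)
```

The ratio approaches 1 from below for $K_0$ and from above for $K_1$ as $x$ grows, so [10] captures the common exponential decay while the order only affects the subleading corrections.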

The 1-vortex (henceforth just "vortex") is stable for
all $\lambda$, but n-vortices with $n \geq 2$ are unstable to break
up into $n$ separate vortices if $\lambda > 1$. We shall say that
the AHM is type I if $\lambda < 1$, type II if $\lambda > 1$, and
critically coupled if $\lambda = 1$, based on this distinction. Let
$E_n$ denote the energy of an n-vortex. Figure 2 shows
the energy per vortex $E_n/n$ plotted against $n$ for
$\lambda = 0.5$, 1, and 2. It decreases with $n$ for $\lambda = 0.5$,
indicating that it is energetically favorable for isolated
vortices to coalesce into higher winding lumps. For
$\lambda = 2$, by contrast, $E_n/n$ increases with $n$, indicating
that it is energetically favorable for n-vortices to fission
into their constituent vortex parts. The case $\lambda = 1$
balances between these behaviors: $E_n/n$ is independent
of $n$. In fact, the energy of a collection of vortices is
independent of their positions in this case.


Figure 1 Static, radially symmetric n-vortices: (a) the 1-vortex profile functions $\sigma(r)$ (solid curve) and $a(r)$ (dashed curve) for $\lambda = 2$, 1,
and 1/2, left to right; (b) the magnetic field $B$; and (c) the energy density of n-vortices, $n = 1$ to 5, left to right, for $\lambda = 1$.


Figure 2 The energy per unit winding $E_n/n$ of radially
symmetric n-vortices for $\lambda = 1/2$, 1, and 2.


Interaction Energy 


A precise understanding of the type I/II dichotomy
can be obtained using the 2-vortex interaction
energy $E_{\rm int}(s)$ introduced by Jacobs and Rebbi. This
is defined to be the minimum of $E$ over all $n = 2$
configurations for which $\phi(\mathbf{x}) = 0$ at some pair of
points $\mathbf{x}_1$, $\mathbf{x}_2$ a distance $s$ apart. One interprets $\mathbf{x}_1$, $\mathbf{x}_2$
as the vortex positions. $E_{\rm int}$ can only depend on their
separation $s = |\mathbf{x}_1 - \mathbf{x}_2|$, by translation and rotation
invariance. Figure 3 presents graphs of $E_{\rm int}(s)$
generated by a lattice minimization algorithm. For
$\lambda < 1$, vortices uniformly attract one another, so a
vortex pair has least energy when coincident. For
$\lambda > 1$, vortices uniformly repel, always lowering
their energy by moving further apart. The graph for
$\lambda = 1$ would be a horizontal line, $E_{\rm int}(s) = 2\pi$.

Figure 3 The 2-vortex interaction energy $E_{\rm int}(s)$ as a function
of vortex separation (solid curve), in comparison with its asymptotic
form $E^\infty_{\rm int}(s)$ (dashed curve) for (a) $\lambda = 1/2$ and (b) $\lambda = 2$.


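The large $s$ behavior of $E_{\rm int}(s)$ is known, and can
be understood in two ways (Manton and Sutcliffe
2004, pp. 177-181). Speight, adapting ideas of
Manton on asymptotic monopole interactions,
observed that, in the real $\phi$ gauge ($\phi \to e^{-i\theta}\phi$,
$A \to A - d\theta$), the difference between the vortex and
the vacuum $\phi = 1$, $A = 0$ at large $r$,


$$\psi = \phi - 1 \sim \frac{q}{2\pi}K_0(\sqrt{\lambda}\,r)$$ [11]

$$(A^0, \mathbf{A}) \sim \frac{m}{2\pi}\,(0,\ \mathbf{k}\times\nabla K_0(r))$$ [12]


is identical to the solution of a linear
Klein-Gordon-Proca theory,


$$(\partial_\mu\partial^\mu + \lambda)\psi = \kappa, \qquad (\partial_\mu\partial^\mu + 1)A_\nu = j_\nu$$ [13]


in the presence of a composite point source,


$$\kappa = q\,\delta(\mathbf{x}), \qquad (j^0, \mathbf{j}) = m\,(0,\ \mathbf{k}\times\nabla\delta(\mathbf{x}))$$ [14]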
located at the vortex position. Viewed from afar,
therefore, a vortex looks like a point particle
carrying both a scalar monopole charge $q$ and a
magnetic dipole moment $m$, a "point vortex,"
inducing a real scalar field of mass $\sqrt{\lambda}$ (the Higgs
particle) and a vector boson field of mass 1 (the
"photon"). If physics is to be model independent,
therefore, the interaction energy of a pair of
well-separated vortices should approach that of the
corresponding pair of point vortices as the separation
grows. Computing the latter is an easy exercise
in classical linear field theory, yielding


$$E_{\rm int}(s) \sim E^\infty_{\rm int}(s) = 2E_1 - \frac{q^2}{2\pi}K_0(\sqrt{\lambda}\,s) + \frac{m^2}{2\pi}K_0(s)$$ [15]

Bettencourt and Rivers obtained the same formula
by a more direct superposition ansatz approach,
though they did not give the constants $q$, $m$ a
physical interpretation.

The force between a well-separated vortex pair,
$-dE^\infty_{\rm int}/ds$, consists of the mutual attraction of
identical scalar monopoles, of range $1/\sqrt{\lambda}$, and the
mutual repulsion of identical magnetic dipoles, of
range 1. If $\lambda < 1$, scalar attraction dominates at
large $s$ so vortices attract. If $\lambda > 1$, magnetic
repulsion dominates and they repel. If $\lambda = 1$ then
$q = m$, as we shall see, so the forces cancel exactly.
Figure 3 shows both $E_{\rm int}$ and $E^\infty_{\rm int}$ for $\lambda = 0.5$, 2. The
agreement is good for $s$ large, but breaks down for
$s < 4$, as one expects. Vortices are not point
particles, as in the linear model, and when they lie
close together the overlap of their cores produces
significant effects.
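
The competition between the two terms of [15] can be explored numerically. In the sketch below the constant $2E_1$ is dropped, $K_0$ is evaluated from its integral representation, and the charges `q` and `m` are free inputs: the true constants have to be extracted from numerical vortex solutions, so the values used in any call are placeholders. The function names are illustrative.

```python
import math

def bessel_k0(x, steps=2000):
    """K_0(x), x > 0, by trapezoidal quadrature of int_0^inf exp(-x cosh t) dt."""
    t_max = math.acosh(1.0 + 40.0 / x)  # integrand is negligible beyond t_max
    h = t_max / steps
    total = 0.5 * (math.exp(-x) + math.exp(-x * math.cosh(t_max)))
    for k in range(1, steps):
        total += math.exp(-x * math.cosh(k * h))
    return h * total

def asymptotic_interaction(s, lam, q, m):
    """E_int^infinity(s) - 2 E_1 from eqn [15]; q and m are placeholder inputs."""
    scalar_attraction = -q * q * bessel_k0(math.sqrt(lam) * s)
    magnetic_repulsion = m * m * bessel_k0(s)
    return (scalar_attraction + magnetic_repulsion) / (2.0 * math.pi)

def asymptotic_force(s, lam, q, m, ds=1e-4):
    """-dE/ds by central difference; positive values mean repulsion."""
    upper = asymptotic_interaction(s + ds, lam, q, m)
    lower = asymptotic_interaction(s - ds, lam, q, m)
    return -(upper - lower) / (2.0 * ds)
```

With equal placeholder charges the sign of the force follows the type I/II distinction at every separation. With unequal charges the longer-ranged term still wins at sufficiently large $s$, which is the statement made in the text.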

The same method predicts the interaction energy
between an $n_1$-vortex and an $n_2$-vortex at large
separation. We just replace $2E_1$ by $E_{n_1} + E_{n_2}$, $q^2$ by
$q_{n_1}q_{n_2}$, and $m^2$ by $m_{n_1}m_{n_2}$. In particular, an
antivortex ((-1)-vortex) has $E_{-1} = E_1$, $q_{-1} = q_1 = q$,
and $m_{-1} = -m_1 = -m$, so the interaction energy for
a vortex-antivortex pair is


$$E^{v\bar v}_{\rm int}(s) \sim 2E_1 - \frac{q^2}{2\pi}K_0(\sqrt{\lambda}\,s) - \frac{m^2}{2\pi}K_0(s)$$ [16]


which is uniformly attractive. It would be pleasing if
$q_n$, $m_n$ could be deduced easily from $q$, $m$. One
might guess $q_n = |n|q$, $m_n = nm$, in analogy with
monopoles. Unfortunately, this is false: $q_n$, $m_n$
grow approximately exponentially with $|n|$.


Vortex Scattering 


The AHM being Lorentz invariant, one can obtain
time-dependent solutions wherein a single n-vortex
travels at constant velocity, with speed $0 < v < 1$
and $E_{\rm tot} = (1 - v^2)^{-1/2}E_n$, by Lorentz boosting the
static solutions described above. Of more dynamical
interest are solutions in which two or more vortices
undergo relative motion. The simplest problem is
vortex scattering. Two vortices, initially well separated,
are propelled towards one another. In the
center-of-mass (COM) frame they have, as $t \to -\infty$,
equal speed $v$, and approach one another along
parallel lines distance $b$ (the impact parameter)
apart, see Figure 4. If $b = 0$, they approach head-on.
Assuming they do not capture one another, they
interact and, as $t \to \infty$, recede along parallel straight
lines having been deflected through an angle $\Theta$ (the
scattering angle). If scattering is elastic, the exit lines
also lie $b$ apart and each vortex travels at speed $v$ as
$t \to \infty$. The dependence of $\Theta$ on $v$, $b$, and $\lambda$ has
been studied through lattice simulations by several
authors, perhaps most comprehensively by Myers,
Rebbi, and Strilka (1992). We shall now describe
their results.

Note first that vortex scattering is actually
inelastic: vortices recede with speed less than $v$ because
some of their initial kinetic energy is dispersed by
the collision as small-amplitude traveling waves
("radiation"). This energy loss can be as high as
80% in very fast collisions at small $b$. At small $v$ the
energy loss is tiny, but can still have important
consequences for type I vortices: if $v$ is very small,
they start with only just enough energy to escape
their mutual attraction. In undergoing a small $b$
collision they can lose enough of this energy to
become trapped in an oscillating bound state. In this
case they do not truly scatter and $\Theta$ is ill-defined.
Myers et al. find that $v > 0.2$ suffices to avoid
capture when $\lambda = 1/2$. Since type I vortices attract,
one might expect $\Theta$ to be always negative, indicating
that the vortices deflect towards one another. In
fact, as Figure 5a shows, this happens only for small
$v$ and large $b$. Another naive expectation is that
$\Theta = 0$ or $\Theta = 180°$ when $b = 0$ (either vortices pass
through one another or ricochet backwards in a
head-on collision). In fact $\Theta = 90°$, the only other
possibility allowed by reflexion symmetry of the
initial data. Figure 6 depicts snapshots of such a
scattering process at modest $v$. The vortices deform
each other as they get close until, at the moment of
coincidence, they are close to the static 2-vortex
ring. They then break apart along a line perpendicular
to their line of approach. One may consider
them to have exchanged half-vortices, so that each
emergent vortex is a mixture of the incoming
vortices. This rather surprising phenomenon was
actually predicted by Ruback in advance of any
numerical simulations and turns out to be a generic
feature of planar topological solitons.


Figure 4 The geometry of vortex scattering.

Consider now the type II case ($\lambda = 2$, Figure 5b).
Here, $\Theta > 0$ for all $v$, $b$ as one expects of particles
that repel each other. Head-on scattering is more
interesting now since two regimes emerge: for
$v > v_{\rm crit} \approx 0.3$, one has the surprising 90° scattering
already described, while for $v < v_{\rm crit}$ the vortices
bounce backwards, $\Theta = 180°$. This is easily
explained. In order to undergo 90° head-on scattering,
the vortices must become coincident (otherwise
reflexion symmetry is violated), hence must have
initial energy at least $E_2$. For $v < v_{\rm crit}$, where


$$\frac{2E_1}{\sqrt{1 - v_{\rm crit}^2}} = E_2$$ [17]


they have too little energy, so come to a halt before
coincidence, then recede from one another. The
solution $v_{\rm crit}$ of [17] depends on $\lambda$ and is plotted in
Figure 7. For $v$ slightly above $v_{\rm crit}$, we see that, in
contrast to the type I case, $\Theta(b)$ is not monotonic:
maximum deflection occurs at nonzero $b$.

The point vortex formalism yields a simple model
of type II vortex scattering which is remarkably
successful at small $v$. One writes down the Lagrangian
for two identical (nonrelativistic) point particles of
mass $E_1$ moving along trajectories $\mathbf{x}_1(t)$, $\mathbf{x}_2(t)$ under
the influence of the repulsive potential $E^\infty_{\rm int}$,


$$L = \tfrac12 E_1\big(|\dot{\mathbf{x}}_1|^2 + |\dot{\mathbf{x}}_2|^2\big) - E^\infty_{\rm int}(|\mathbf{x}_1 - \mathbf{x}_2|)$$ [18]



Energy and angular momentum conservation reduce
$\Theta(v, b)$ to an integral over one variable ($s = |\mathbf{x}_1 - \mathbf{x}_2|$)
which is easily computed numerically. To illustrate,
Figure 5b shows the result for $\lambda = 2$, $v = 0.1$
in comparison with the lattice simulations of
Myers et al. The agreement is almost perfect. For
large $v$ the approximation breaks down not only
because relativistic corrections become significant,
but also because small $b$ collisions then probe the small
$|\mathbf{x}_1 - \mathbf{x}_2|$ region where vortex core overlap effects
become important. For the same reason, the point
vortex model is less useful for type I scattering.
Here there is no repulsion to keep the vortices well
separated, so its validity is restricted to the small $v$,
large $b$ regime.

Figure 5 The 2-vortex scattering angle $\Theta$ as a function of impact parameter $b$ for $v = 0.1$, 0.2, 0.3, 0.4, 0.5, and 0.9
(each speed plotted with its own symbol), as computed by Myers et al. (1992): (a) $\lambda = 1/2$; (b) $\lambda = 2$; (c) $\lambda = 1$. The
dotted curves are merely guides to the eye. The solid curves in (b), (c) were computed using the point vortex model. Note that Myers
et al. use different normalizations, so $b = \sqrt{2}\,b_{\rm MRS}$ and $\lambda = \lambda_{\rm MRS}/2$.
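
The one-variable integral mentioned above is the standard deflection integral of classical two-body scattering. The sketch below is a generic implementation, not the code used for Figure 5: it assumes a repulsive potential that decreases monotonically to zero, takes the energy of relative motion as an input (for two particles of mass $E_1$ with COM speed $v$ this is $E_1v^2$ in the nonrelativistic model [18], with the constant $2E_1$ subtracted from the potential), and all function names are illustrative.

```python
import math

def scattering_angle(potential, energy, b, steps=4000):
    """Deflection angle Theta = pi - 2 b * integral of ds / (s^2 sqrt(...)).

    potential: callable V(s), assumed repulsive and decreasing to 0.
    energy:    kinetic energy of the relative motion at infinite separation.
    b:         impact parameter (b > 0).
    """
    def radicand(u):  # u = 1/s
        return 1.0 - (b * u) ** 2 - potential(1.0 / u) / energy

    # Turning point u0 = 1/s_min: root of the radicand in (0, 1/b], by bisection.
    lo, hi = 1e-12 / b, 1.0 / b
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if radicand(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    u0 = lo

    # The substitution u = u0 sin(phi) removes the inverse-square-root
    # singularity at the turning point; a midpoint rule then suffices.
    h = 0.5 * math.pi / steps
    integral = 0.0
    for k in range(steps):
        phi = (k + 0.5) * h
        integral += u0 * math.cos(phi) / math.sqrt(radicand(u0 * math.sin(phi)))
    integral *= h
    return math.pi - 2.0 * b * integral

def relative_energy(vortex_mass, v):
    """Energy of relative motion for two equal masses with COM speed v each."""
    return vortex_mass * v * v
```

As a sanity check, a Coulomb-like potential $V = k/s$ reproduces the Rutherford relation $\tan(\Theta/2) = k/(2Eb)$, and a vanishing potential gives no deflection. To model type II vortices one would pass the asymptotic potential of [15] with numerically determined $q$ and $m$, which are not supplied here.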

Critical coupling is theoretically the most interesting
regime, where most analytic progress has been
made. Since $E_{\rm int}$ and $E^\infty_{\rm int}$ are both independent of $s$,
so that static vortices exert no net force on one another, one might expect vortex
scattering to be trivial ($\Theta(v, b) = 0$), but this is quite
wrong, as shown in Figure 5c. In particular,
$\Theta(v, 0) = 90°$ for all $v$, just as in the large $v$ type I
and type II cases. The point is that scalar attraction
and magnetic repulsion of vortices are mediated by
fields with different Lorentz transformation properties.
While they cancel for static vortices, there is no
reason to expect them to cancel for vortices in
relative motion.


Critical Coupling 


Figure 6 Snapshots of the energy density during a head-on collision of vortices. This 90° scattering phenomenon is a generic feature of planar topological soliton dynamics.

Figure 7 The critical velocity for 90° head-on scattering of type II vortices $v_{\rm crit}$ as a function of $\lambda$, as predicted by equation [17] (solid curve), in comparison with the results of Myers et al. (1992) (crosses).

The AHM with $\lambda = 1$ has many remarkable properties, at which we have so far only hinted. These all stem from Bogomol’nyi’s crucial observation (Manton and Sutcliffe 2004, pp. 197-202) that the potential energy in this case can be rewritten as

$$E = \frac{1}{2}\int_{\mathbb{R}^2}\left\{\left[B - \tfrac{1}{2}(1 - |\phi|^2)\right]^2 + |D_1\phi + iD_2\phi|^2 + B\right\}d^2x - \frac{i}{2}\int_{\mathbb{R}^2} d(\bar\phi\,D\phi)$$ [19]

The last integral vanishes by Stokes's theorem, so $E \geq \pi n$ by flux quantization [6], and $E = \pi n$ if and only if

$$(D_1 + iD_2)\phi = 0$$ [20]
$$\tfrac{1}{2}(1 - |\phi|^2) = B$$ [21]
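
The rewriting [19] can be checked by expanding the two squares. The sketch below assumes the conventions $D_j = \partial_j - iA_j$, $B = \partial_1A_2 - \partial_2A_1$, and a $\lambda = 1$ energy density $\tfrac{1}{2}\left(|D\phi|^2 + B^2 + \tfrac{1}{4}(1 - |\phi|^2)^2\right)$; these are not restated in this section, and with the opposite sign convention for $A$ some signs flip.

```latex
\begin{aligned}
|D_1\phi + iD_2\phi|^2
  &= |D_1\phi|^2 + |D_2\phi|^2
     + i\left(\overline{D_1\phi}\,D_2\phi - \overline{D_2\phi}\,D_1\phi\right) \\
  &= |D_1\phi|^2 + |D_2\phi|^2 - B|\phi|^2
     + i\left[\partial_1(\bar\phi D_2\phi) - \partial_2(\bar\phi D_1\phi)\right],
     \qquad [D_1, D_2] = -iB, \\
\left[B - \tfrac12(1 - |\phi|^2)\right]^2
  &= B^2 - B + B|\phi|^2 + \tfrac14(1 - |\phi|^2)^2 .
\end{aligned}
```

Adding the two lines, the $B|\phi|^2$ terms cancel, leaving $|D\phi|^2 + B^2 + \tfrac{1}{4}(1 - |\phi|^2)^2 - B$ plus a total derivative. Moving $-B$ and the total derivative to the other side and integrating reproduces [19], and the $\tfrac{1}{2}\int B\,d^2x$ term gives $\pi n$ if the flux is quantized as $2\pi n$.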


Note that system [20], [21] is first order, in contrast to the second-order field equations [3]. No explicit solutions of [20], [21] are known. However, Taubes has proved that for each unordered list $[z_1, z_2, \ldots, z_n]$ of $n$ points in $\mathbb{C}$, not necessarily distinct, there exists a solution of [20], [21], unique up to gauge transformations, with $\phi(z_1) = \phi(z_2) = \cdots = \phi(z_n) = 0$ and $\phi$ nonvanishing elsewhere, the zero at $z_r$ having the same multiplicity as $z_r$ has in the list. Note that the list is unordered: a solution is uniquely determined by the positions and multiplicities of the zeros of $\phi$, but the order in which we label these is irrelevant. The solution minimizes $E$ within the class $C_n$ of winding $n$ configurations, so is automatically a stable static solution of the model. Equation [20] applied to the symmetric $n$-vortex, $\phi = \sigma(r)e^{in\theta}$, $A = a(r)\,d\theta$, implies $a(r) = n - r\sigma'(r)/\sigma(r)$. Comparing with [8], [9], it follows that $q_n = m_n$ when $\lambda = 1$ as previously claimed, since $K_1 = -K_0'$. Tong has conjectured, based on a string duality argument, that $q_1 = -2\pi\,8^{1/4}$. This is consistent with current numerics but has no direct derivation so far.




Taubes's theorem shows that this $n$-vortex is just one point, corresponding to the list $[0, 0, \ldots, 0]$, in a $2n$-dimensional space of static multivortex solutions called the moduli space $M_n$. This space may be visualized as the flat, finite-dimensional valley bottom in $C_n$ on which $E$ attains its minimum value, $\pi n$. Points in $M_n$ are in one-to-one correspondence with distinct unordered lists $[z_1, z_2, \ldots, z_n]$, which are themselves in one-to-one correspondence with points in $\mathbb{C}^n$, as follows. To each list, we assign the unique monic polynomial whose roots are $z_r$,

$$p(z) = (z - z_1)(z - z_2)\cdots(z - z_n) = a_0 + a_1z + \cdots + a_{n-1}z^{n-1} + z^n$$ [22]

This polynomial is uniquely determined by its coefficients $(a_0, a_1, \ldots, a_{n-1}) \in \mathbb{C}^n$, which give good global coordinates on $M_n \cong \mathbb{C}^n$. The zeros $z_r$ of $\phi$ may be used as local coordinates on $M_n$, away from $\Delta$, the subset of $M_n$ on which two or more of the zeros $z_r$ coincide, but are not good global coordinates.

Let $(\phi, A)_a$ denote the static solution corresponding to $a \in \mathbb{C}^n$. If the zeros $z_r$ are all at least $s$ apart, Taubes showed the solution is just a linear superposition of 1-vortices located at $z_r$, up to corrections exponentially small in $s$. Imagine these constituent vortices are pushed with small initial velocities. Then $(\phi(t), A(t))$ must remain close to the valley bottom $M_n$, since departing from it costs kinetic energy, of which there is little. Manton has suggested, therefore, that the dynamics is well approximated by the constrained variational problem wherein $(\phi(t), A(t)) = (\phi, A)_{a(t)} \in M_n$ for all $t$. Since the action $S = \int L\,dt = \int (E_{\rm kin} - E)\,dt$, and $E = \pi n$, constant, on $M_n$, this constrained problem amounts to Lagrangian mechanics on configuration space $M_n$ with Lagrangian $L = E_{\rm kin}|_{M_n}$. Now $E_{\rm kin}$ is real, positive, and quadratic in time derivatives of $\phi$, $A$, so

$$L = \tfrac{1}{2}\gamma_{rs}(a)\,\dot a_r\,\dot{\bar a}_s$$ [23]

$\gamma_{rs}$ forming the entries of a positive-definite $n \times n$ Hermitian matrix ($\gamma_{rs} = \bar\gamma_{sr}$). Since $(\phi, A)_a$ is not known explicitly, neither are $\gamma_{rs}(a)$. Observe, however, that $L$ is the Lagrangian for geodesic motion in $M_n$ with respect to the Riemannian metric

$$\gamma = \gamma_{rs}(a)\,da_r\,d\bar a_s$$ [24]


Manton originally proposed this geodesic approximation for monopoles, but it is now standard for all topological solitons of Bogomol’nyi type (where one has a moduli space of static multisolitons saturating a topological lower bound on $E$). Note that geodesics are independent of initial speed, which agrees with Myers et al.: Figure 5c shows that $\Theta(v, b)$ is approximately independent of $v$ for $v \leq 0.5$. Further, Stuart (1994) has proved that, for initial speeds of order $\epsilon$, small, the fields stay (pointwise) $\epsilon^2$ close to their geodesic approximant for times of order $\epsilon^{-1}$.

On symmetry grounds, two vortex dynamics in the COM frame reduces to geodesic motion in $M_2^0 \cong \mathbb{C}$, the subspace of centered 2-vortices ($a_1 = 0$, so $z_1 = -z_2$), with induced metric

$$\gamma = G(|a_0|)\,da_0\,d\bar a_0$$ [25]

$G$ being some positive function. Note that $a_0 = z_1z_2$, so the intervortex distance $|z_1 - z_2| = 2|z_1| = 2|a_0|^{1/2}$. The line $a_0 = \beta \in \mathbb{R}$, traversed with $\beta$ increasing, say, is geodesic in $M_2^0$. The vortex positions (roots of $z^2 + a_0$) are $\pm\sqrt{|\beta|}$ for $\beta < 0$ and $\pm i\sqrt{\beta}$ for $\beta > 0$. This describes perfectly the 90° scattering phenomenon: two vortices approach head-on along the $x^1$ axis, coincide to form a 2-vortex ring, then break apart along the $x^2$ axis, as in Figure 6. This behavior occurs because $a_0 = z_1z_2$, rather than $z_1 - z_2$, is the correct global coordinate on $M_2^0$, since vortices are classically indistinguishable.
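
The role of the polynomial coefficients as coordinates, and the 90° scattering that follows from it, can be illustrated with a few lines of code. This is only a sketch of the algebra (the map from zeros to coefficients and back for centered 2-vortices); it does not compute the metric $G$, and the function names are hypothetical.

```python
import cmath

def monic_coefficients(zeros):
    """Coefficients (a_0, ..., a_{n-1}) of prod_r (z - z_r), lowest degree first.

    The result does not depend on the order of `zeros`, which is why the
    coefficients are well defined on unordered lists of vortex positions.
    """
    coeffs = [1 + 0j]                       # the constant polynomial 1
    for z_r in zeros:
        new = [0j] * (len(coeffs) + 1)      # multiply by (z - z_r)
        for k, c in enumerate(coeffs):
            new[k + 1] += c
            new[k] -= z_r * c
        coeffs = new
    return coeffs[:-1]                      # drop the leading coefficient 1

def centered_two_vortex_positions(a0):
    """Roots of z^2 + a0, i.e. the vortex positions for a_1 = 0."""
    root = cmath.sqrt(-a0)
    return root, -root

# Moving along the real line a0 = beta from negative to positive values:
# the roots lie on the real axis for beta < 0 and on the imaginary axis
# for beta > 0, passing through coincidence at beta = 0.
for beta in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(beta, centered_two_vortex_positions(beta))
```

Swapping the two entries of the list leaves `monic_coefficients` unchanged, which is the algebraic expression of the indistinguishability of the vortices.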

Samols found a useful formula (Manton and Sutcliffe 2004, pp. 205-215) for $\gamma$ in terms of the behavior of $|\phi|$ close to its zeros, using which he devised an efficient numerical scheme to evaluate $G(|a_0|)$, and computed $\Theta(b)$ in detail, finding excellent agreement with lattice simulations at low speeds. He also studied the quantum scattering of vortices, approximating the quantum state by a wave function $\Psi$ on $M_n$ evolving according to the natural Schrödinger equation for quantum geodesic motion,

$$i\hbar\frac{\partial\Psi}{\partial t} = -\frac{1}{2}\hbar^2\Delta_\gamma\Psi$$ [26]

where $\Delta_\gamma$ is the Laplace-Beltrami operator on $(M_n, \gamma)$. This technique, introduced for monopoles by Gibbons and Manton, is now standard for solitons of Bogomol'nyi type.

By analyzing the forces between moving point vortices at $\lambda = 1$, Manton and Speight (2003) showed that, as the vortex separations become uniformly large, the metric on $M_n$ approaches

$$\gamma^{\infty} = \pi\sum_r dz_r\,d\bar z_r - \frac{q_1^2}{4\pi}\sum_{s \neq r}K_0(|z_r - z_s|)\,(dz_r - dz_s)(d\bar z_r - d\bar z_s)$$ [27]

This formula can also be obtained by a method of matched asymptotic expansions. We can use [27] to study 2-vortex scattering for large $b$, when the vortices remain well separated. (Note that $\gamma^{\infty}$ is not positive definite if any $|z_r - z_s|$ becomes too small.) The results are good, provided $v \leq 0.5$ and $b \geq 3$ (see Figure 5c).


Other Developments 


The (critically coupled) AHM on a compact physical space $X$ is of considerable theoretical and physical interest. Bradlow showed that $M_n(X)$ is empty unless $V = {\rm Area}(X) \geq 4\pi n$, so there is a limit to how many vortices a space of finite area can accommodate (Manton and Sutcliffe 2004, pp. 227-230). Manton has analyzed the thermodynamics of a gas of vortices by studying the statistical mechanics of geodesic flow on $M_n(X)$. In this context, spatial compactness is a technical device to allow nonzero vortex density $n/V$ for finite $n$, without confining the fields to a finite box, which would destroy the Bogomol’nyi properties. In the limit of interest, $n, V \to \infty$ with $n/V$ fixed, the thermodynamical properties turn out to depend on $X$ only through $V$, so $X = S^2$ and $X = T^2$ give equivalent results, for example. The equation of state of the gas is ($P$ = pressure, $T$ = temperature)

$$P = \frac{nT}{V - 4\pi n}$$ [28]

which is similar, at low density $n/V$, to that of a gas of hard disks of area $2\pi$. The crucial step in deriving [28] is to find the volume of $M_n(X)$ which, despite there being no formula for $\gamma$, may be computed exactly by remarkable indirect arguments (Manton and Sutcliffe 2004, pp. 231-234).
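
The comparison with hard disks can be made explicit by a low-density expansion of [28]. The sketch below relies on one fact not stated in this article, namely the standard result that the second virial coefficient of a gas of hard disks equals twice the area $a$ of one disk.

```latex
\frac{PV}{nT} = \frac{1}{1 - 4\pi n/V}
             = 1 + 4\pi\,\frac{n}{V} + \left(4\pi\,\frac{n}{V}\right)^{2} + \cdots ,
\qquad
B_2 = 4\pi = 2a \;\Longrightarrow\; a = 2\pi .
```

The same expansion shows why the pressure diverges as $V$ approaches $4\pi n$ from above, the limiting area below which no vortex solutions exist.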

The static AHM coincides with the Ginzburg-Landau model of superconductivity, which has precisely the same type I/II classification. Here the “Higgs” field represents the wave function of a condensate of Cooper pairs, usually (but not always) electrons. There has been a parallel development of the static model by condensed matter theorists, therefore; see Fossheim and Sudbo (2004), for example. In fact, the vortex was first discovered by Abrikosov in the condensed matter context. One important difference is that type I superconductors do not support vortex solutions in an external magnetic field $B_{\rm ext}$ because the critical $|B_{\rm ext}|$ required to create a single vortex is greater than the critical $|B_{\rm ext}|$ required to destroy the condensate completely ($\phi \equiv 0$). Type II superconductors do support vortices, and there are such superconductors with $\lambda \approx 1$, but the vortex dynamics we have described is not relevant to these systems. In this context there is an obvious preferred reference frame (the rest frame of the superconductor) so it is unsurprising that the Lorentz-invariant AHM is inappropriate. Insofar as vortices move at all, they seem to obey a first-order (in time) dynamical system, in contrast to the second-order AHM. Manton has devised a first-order system which may have relevance to superconductivity, by replacing $E_{\rm kin}$ with a Chern-Simons-Schrödinger functional (Manton and Sutcliffe 2004, pp. 193-197). Rather than attracting or repelling, vortices now tend to orbit one another at constant separation. There is again a moduli space approximation to slow vortex dynamics for $\lambda \approx 1$, but it has a Hamiltonian-mechanical rather than Riemannian-geometric flavor.

Finally, an interesting simplification of the AHM, which arises, for example, as a phenomenological model of liquid helium-4, is obtained if we discard the gauge field $A_\mu$, or equivalently set the electric charge of $\phi$ to $e = 0$. There is now no type I/II classification, since $\lambda$ may be absorbed by rescaling. The resulting model, which has only global U(1) phase symmetry, supports $n$-vortices $\phi = \sigma(r)e^{in\theta}$ for all $n$, but these are not exponentially spatially localized,

$$\sigma(r) \sim 1 - \frac{n^2}{\lambda r^2} - \frac{n^2(8 + n^2)}{2\lambda^2r^4} + O(r^{-6})$$

and cannot have finite $E$ by Derrick's theorem. They are unstable for $|n| > 1$, and 1-vortices uniformly repel one another. They can be given an interesting first-order dynamics (the Gross-Pitaevskii equation).


Abbreviations

$A_\mu$  electromagnetic gauge potential
$b$  impact parameter
$D_\mu$  gauge-covariant derivative
$E$  potential energy
$E_{\rm kin}$  kinetic energy
$F_{\mu\nu}$  electromagnetic field strength tensor
$L$  Lagrangian
$\mathcal{L}$  Lagrangian density
$S$  action
$\phi$  Higgs field
$\Theta$  scattering angle


See also: Fractional Quantum Hall Effect; Ginzburg-Landau Equation; High $T_c$ Superconductor Theory; Integrable Systems: Overview; Nonperturbative and Topological Aspects of Gauge Theory; Quantum Fields with Topological Defects; Solitons and Other Extended Field Configurations; Symmetry Breaking in Field Theory; Topological Defects and Their Homotopy Classification; Variational Techniques for Ginzburg-Landau Energies.




Further Reading 


Atiyah M and Hitchin N (1988) The Geometry and Dynamics of 
Magnetic Monopoles. Princeton: Princeton University Press. 
Fossheim K and Sudbo A (2004) Superconductivity: Physics and 
Applications. Hoboken NJ: Wiley. 

Jaffe A and Taubes C (1980) Vortices and Monopoles: Structure 
of Static Gauge Theories. Boston: Birkhäuser. 

Nakahara M (1990) Geometry, Topology and Physics. Bristol: 
Adam-Hilger. 

Manton NS and Speight JM (2003) Asymptotic interactions of 
critically coupled vortices. Communications in Mathematical 
Physics 236: 535-555. 


Adiabatic Piston 


Ch Gruber, Ecole Polytechnique Fédérale de 
Lausanne, Lausanne, Switzerland 

A Lesne, Université P.-M. Curie, Paris VI, Paris, 
France 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 
Macroscopic Problem 


The “adiabatic piston” is an old problem of thermodynamics which has had a long and controversial history. It is the simplest example concerning the time evolution of an adiabatic wall, that is, a wall which does not conduct heat. The system consists of a gas in a cylinder divided by an adiabatic wall (the piston). Initially, the piston is held fixed by a clamp and the two gases are in thermal equilibrium characterized by $(p^\pm, T^\pm, N^\pm)$, where the index $-/+$ refers to the gas on the left/right side of the piston and $(p, T, N)$ denote the pressure, the temperature, and the number of particles (Figure 1). Since the piston is adiabatic, the whole system remains in equilibrium even if $T^- \neq T^+$. At time $t = 0$, the clamp is removed and the piston is let free to move without any friction in the cylinder. The question is to find the final state, that is, the final position $X_f$ of the piston and the parameters $(p_f^\pm, T_f^\pm)$ of the gases.

Figure 1 The adiabatic piston problem.

In the late 1950s, using the two laws of equilibrium thermodynamics (i.e., thermostatics), Landau and Lifshitz concluded that the adiabatic piston will evolve toward a final state where $p^-/T^- = p^+/T^+$. Later, Callen (1963) and others realized that the maximum entropy condition implies that the system will reach mechanical equilibrium where the pressures are equal, $p_f^- = p_f^+$; however, nothing could be said concerning the final position $X_f$ or the final temperatures $T_f^\pm$, which should depend explicitly on the viscosity of the fluids. It thus became a controversial problem since one was forced to accept that the two laws of thermostatics are not sufficient to predict the final state as soon as adiabatic movable walls are involved (see early references in Gruber (1999)).

Experimentally, the adiabatic piston was already used before 1924 to measure the ratio $c_p/c_v$ of the specific heats of gases. In 2000, new measurements showed that one has to distinguish between two regimes, corresponding to weak damping or strong damping, with very different properties. For example, for weak damping the frequency of oscillations corresponds to adiabatic oscillations, whereas for strong damping it corresponds to isothermal oscillations.


Microscopic Problem 


The “adiabatic piston” was first considered from a microscopic point of view by Lebowitz, who introduced in 1959 a simple model to study heat conduction. In this model, the gas consists of point particles of mass $m$ making purely elastic collisions on the wall of the cylinder and on the piston. Furthermore, the gas is very dilute so that the equation of state $p = nk_BT$ is satisfied at equilibrium, where $n$ is the density of particles in the gas and $k_B$ the Boltzmann constant. The adiabatic piston is taken as a heavy particle of mass $M \gg m$ without any internal degree of freedom. Using this same model Feynman (1965) gave a qualitative analysis in Lectures in Physics. He argued intuitively but correctly that the system should converge first toward a state of mechanical equilibrium where $p^- = p^+$ and then very slowly toward thermal equilibrium. This approach toward thermal equilibrium is associated with the “wiggles” of the piston induced by the random collisions with the atoms of the gas. Of course, this stochastic behavior is not part of thermodynamics and the evolution beyond the mechanical equilibrium cannot appear in the macroscopical framework assuming that the piston does not conduct heat.

From a microscopical point of view, one is 
confronted with two different problems: the 
approach toward mechanical equilibrium in the 
absence of any a priori friction (where the entropy 
of both gases should increase) and, on a different 
timescale, the approach toward thermal equilibrium 
(where the entropy of one gas should decrease but 
the total entropy increase). 

The conceptual difficulties of the problem beyond 
mechanical equilibrium come from the following 
intuitive reasoning. When the piston moves toward 
the hotter gas, the atoms of the hotter gas gain 
energy, whereas those of the cooler gas lose energy. 
When the piston moves toward the cooler side, it is 
the opposite. Since on an average the hotter side 
should cool down and the cold side should warm 
up, we are led to conclude that on an average the 
piston should move toward the colder side. On the 
other hand, from $p = nk_BT$, the piston should move 
toward the warmer side to maintain pressure 
balance. 

In 1996, Crosignani, Di Porto, and Segev introduced a kinetic model to obtain equations describing the adiabatic approach toward mechanical equilibrium. Starting with the microscopical model introduced by Lebowitz, Gruber, Piasecki, and Frachebourg, later joined by Lesne and Pache, initiated in 1998 a systematic investigation of the adiabatic piston within the framework of statistical mechanics, together with a large number of numerical simulations. This analysis exploited the fact that $m/M$ is a very small parameter, which allows expansions in powers of $m/M$ (see Gruber and Piasecki (1999) and Gruber et al. (2003) and references therein). An approach using dynamical system methods was then developed by Lebowitz et al. (2000) and Chernov et al. (2002). An extension to hard-disk particles was analyzed at the same time by Kestemont et al. (2000). Recently, several other authors have contributed to this subject.

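The general picture which emerges from all the investigations is the following. For an infinite cylinder, starting with mechanical equilibrium $p^- = p^+ = p$, the piston evolves to a stationary stochastic state with nonzero velocity toward the warmer side

$$\langle V\rangle = \frac{m}{M}\sqrt{\frac{\pi k_B}{8m}}\left(\sqrt{T^+} - \sqrt{T^-}\right) + o\!\left(\frac{m}{M}\right)$$ [1]

with relaxation time

$$\tau = \frac{M}{A}\sqrt{\frac{\pi k_B}{8m}}\,\frac{1}{p}\,\frac{\sqrt{T^+T^-}}{\sqrt{T^+} + \sqrt{T^-}}$$ [2]

where $M/A$ is the mass per unit area of the piston. In this state the piston has a temperature $T_P = \sqrt{T^+T^-}$ and there is a heat flux

$$j_Q = \left(\sqrt{T^-} - \sqrt{T^+}\right)\frac{m}{M}\sqrt{\frac{8k_B}{m\pi}}\,p + o\!\left(\frac{m}{M}\right) \qquad (p^- = p^+ = p)$$ [3]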
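
To get a feeling for the orders of magnitude, the stationary-state quantities for equal pressures can be evaluated numerically. The sketch below uses the leading-order expressions $\langle V\rangle = (m/M)\sqrt{\pi k_B/8m}\,(\sqrt{T^+} - \sqrt{T^-})$, $T_P = \sqrt{T^+T^-}$, $j_Q = (m/M)\sqrt{8k_B/\pi m}\;p\,(\sqrt{T^-} - \sqrt{T^+})$, and $\tau = (M/A)\sqrt{\pi k_B/8m}\;\sqrt{T^+T^-}/[p(\sqrt{T^+} + \sqrt{T^-})]$; the function names, the argument order, and the default units with $k_B = 1$ are illustrative assumptions, and corrections of higher order in $m/M$ are ignored.

```python
import math

def stationary_velocity(m, M, T_minus, T_plus, k_B=1.0):
    """Leading-order mean piston velocity for equal pressures on both sides."""
    return (m / M) * math.sqrt(math.pi * k_B / (8.0 * m)) * (
        math.sqrt(T_plus) - math.sqrt(T_minus))

def piston_temperature(T_minus, T_plus):
    """Effective temperature of the piston in the stationary state."""
    return math.sqrt(T_minus * T_plus)

def heat_flux(m, M, p, T_minus, T_plus, k_B=1.0):
    """Leading-order heat flux from the left (-) to the right (+) gas."""
    return (m / M) * math.sqrt(8.0 * k_B / (math.pi * m)) * p * (
        math.sqrt(T_minus) - math.sqrt(T_plus))

def relaxation_time(M_over_A, m, p, T_minus, T_plus, k_B=1.0):
    """Relaxation time toward the stationary state (M_over_A = mass per area)."""
    root = math.sqrt(T_minus * T_plus)
    return M_over_A * math.sqrt(math.pi * k_B / (8.0 * m)) * root / (
        p * (math.sqrt(T_minus) + math.sqrt(T_plus)))
```

With a hotter gas on the right ($T^+ > T^-$) the velocity is positive (toward the warmer side) while the heat flux is negative, that is, heat flows from right to left through the fluctuating piston. Both effects vanish like $m/M$.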
For a finite cylinder and $p^+ \neq p^-$, the evolution proceeds in four different stages. The first two are deterministic and adiabatic. They correspond to the thermodynamic evolution of the (macroscopic) adiabatic piston. The last two stages, which go beyond thermodynamics, are stochastic with heat transfer across the piston. More precisely: 

1. In the first stage, whose duration is the time 
needed for the shock wave to bounce back on the 
piston, the evolution corresponds to the case of 
the infinite cylinder (with $p^- \neq p^+$). If 
$R = Nm/M > 10$, the piston will be able to 
reach and maintain a constant velocity 


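$$\bar V = \sqrt{\frac{\pi k_B}{8m}}\,\frac{\sqrt{T^-T^+}}{p^-\sqrt{T^+} + p^+\sqrt{T^-}}\,(p^- - p^+) \qquad \text{for } |p^- - p^+| \ll 1$$ [4]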

2. In the second stage the evolution toward mechanical equilibrium is either weakly or strongly damped depending on $R$. If $R < 1$, the evolution is very weakly damped, the dynamics takes place on a timescale $t' = Rt$, and the effect of the collisions on the piston is to introduce an external potential $\Phi(X) = c_1/X^2 + c_2/(L - X)^2$. On the other hand, if $R > 4$, the evolution is strongly damped (with two oscillations only) and depends neither on $M$ nor on $R$.

3. After mechanical equilibrium has been reached, the third stage is a stochastic approach toward thermal equilibrium associated with heat transfer across the piston. This evolution is very slow and exhibits a scaling property with respect to $t' = mt/M$.

4. After thermal equilibrium has been reached ($T^- = T^+$, $p^- = p^+$), in a fourth stage the gas will evolve very slowly toward a state with Maxwellian distribution of velocities, induced by the collision with the stochastic piston.


The general conclusion is thus that a wall which is 
adiabatic when fixed will become a heat conductor 
under a stochastic motion. However, it should be 
stressed that the time required to reach thermal 
equilibrium will be several orders of magnitude larger 
than the age of the universe for a macroscopical piston 
and such a wall could not reasonably be called a heat 
conductor. However, for mesoscopic systems, the effect 
of stochasticity may lead to very interesting properties, 
as shown by Van den Broeck et al. (2004) in their 
investigations of Brownian (or biological) motors. 


Microscopical Model 


The system consists of two fluids separated by an “adiabatic” piston inside a cylinder with x-axis, length $L$, and area $A$. The fluids are made of $N^\pm$ identical light particles of mass $m$. The piston is a heavy flat disk, without any internal degree of freedom, of mass $M \gg m$, orthogonal to the x-axis, and velocity parallel to this x-axis. If the piston is fixed at some position $X_0$, and if the two fluids are in thermal equilibrium characterized by $(p_0^\pm, T_0^\pm, N^\pm)$, then they will remain in equilibrium forever even if $T_0^- \neq T_0^+$: it is thus an “adiabatic piston” in the sense of thermodynamics. At a certain time $t = 0$, the piston is let free to move and the problem is to study the time evolution. To define the dynamics, we consider that the system is purely Hamiltonian, that is, the particles and the piston move without any friction according to the laws of mechanics. In particular, the collisions between the particles and the walls of the cylinder, or the piston, are purely elastic and the total energy of the system is conserved. In most studies, one considers that the particles are point particles making purely elastic collisions. Since the piston is bound to move only in the x-direction, the velocity components of the particles in the transverse directions play no role in this problem. Moreover, since there is no coupling between the components in the x- and transverse directions, one can simplify the model further by assuming that all probability distributions are independent of the transverse coordinates. We are thus led to a formally one-dimensional problem (except for normalizations). Therefore, in this review, we consider that the particles are noninteracting and all velocities are parallel to the x-axis. From the collision law, if $v$ and $V$ denote the velocities of a particle and the piston before a collision, then under the collision on the piston:

$$v \to v' = 2V - v + \alpha(v - V)$$
$$V \to V' = V + \alpha(v - V)$$ [5]

where

$$\alpha = \frac{2m}{M + m}$$ [6]

Similarly, under a collision of a particle with the boundary at $x = 0$ or $x = L$:

$$v \to v' = -v$$ [7]
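
The collision rules [5]-[7] are simply the one-dimensional elastic collision between masses $m$ and $M$, and elastic reflection at a fixed wall. A minimal sketch (the function names are hypothetical) makes it easy to check that momentum and kinetic energy are conserved:

```python
def collide_with_piston(v, V, m, M):
    """Velocities (v', V') after an elastic particle-piston collision, eqn [5]."""
    alpha = 2.0 * m / (M + m)            # eqn [6]
    v_new = 2.0 * V - v + alpha * (v - V)
    V_new = V + alpha * (v - V)
    return v_new, V_new

def collide_with_wall(v):
    """Velocity after an elastic reflection at x = 0 or x = L, eqn [7]."""
    return -v

def momentum(v, V, m, M):
    return m * v + M * V

def kinetic_energy(v, V, m, M):
    return 0.5 * m * v * v + 0.5 * M * V * V
```

For equal masses ($\alpha = 1$) the two velocities are simply exchanged, while for $M \gg m$ the piston velocity changes only by the small amount $\alpha(v - V)$ per collision.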


Let us mention that more general models have also 
been considered, for example, the case where the 
two fluids are made of point particles with different 
masses $m^\pm$, or two-dimensional models where the 
particles are hard disks. However, no significant 
differences appear in these more general models and 
we restrict this article to the simplest case. 

One can study different situations: $L = \infty$, $L$ finite, and $L \to \infty$. Furthermore, taking first $M$ and $A$ finite, one can investigate several limits.

1. Thermodynamic limit for the piston only. In this limit, $L$ is fixed (finite or infinite) and $A \to \infty$, $M \to \infty$, keeping constant the initial densities $n^\pm$ of the fluid and the parameter

$$\gamma = \frac{2mA}{M + m}$$ [8]

If $L$ is finite, this means that $N^\pm \to \infty$ while keeping constant the parameters

$$R^\pm = \frac{mN^\pm}{M} = \frac{M^\pm_{\rm gas}}{M}$$ [9]

2. Thermodynamic limit for the whole system, where $L \to \infty$ and $A \sim L^2$, $N^\pm \sim L^3$. In this limit, space and time variables are rescaled according to $x' = x/L$ and $t' = t/L$. This limit can be considered as a limiting case of (1) where $R^\pm \sim \sqrt{A} \to \infty$ (and time is scaled).

3. Continuum limit where $L$ and $M$ are fixed and $N^\pm \to \infty$, $m \to 0$ keeping $M^\pm_{\rm gas}$ constant, that is, $R^\pm$ = constant.


The case $L$ infinite and the limit (1) have been investigated using statistical mechanics (Liouville or Boltzmann's equations). On the other hand, the limit (2) has been studied using dynamical system methods, reducing first the system to a billiard in an $(N^+ + N^- + 1)$-dimensional polyhedron. The limit (3) has been introduced to derive hydrodynamical equations for the fluids.

In this article, we present the approach based on statistical mechanics. Although not as rigorous as (2) on a mathematical level, it yields more information on the approach toward mechanical and thermal equilibrium. Moreover, it indicates which open problems remain to be solved mathematically. In all investigations, advantage is taken of the fact that $m/M$ is very small and one introduces the small parameter

$$\epsilon = \sqrt{m/M} \ll 1$$ [10]

Let us note that $\epsilon$ measures the ratio of thermal velocities for the piston and a fluid particle, whereas $\alpha \sim \epsilon^2$ measures the ratio of velocity changes during a collision.


Starting Point: Exact Equations 


Using the statistical point of view, the time evolution is given by Liouville's equation for the probability distribution on the whole phase space for $(N^+ + N^- + 1)$ particles, with $L$, $A$, $N^\pm$, and $M$ finite. Initially ($t < 0$), the piston is fixed at $(X_0, V_0 = 0)$ and the fluids are in thermal equilibrium with homogeneous densities $n_0^\pm$, velocity distributions $\varphi_0^\pm(v) = \varphi_0^\pm(-v)$, and temperatures

$$k_BT_0^\pm = m\int_{-\infty}^{\infty}dv\,\varphi_0^\pm(v)\,v^2$$ [11]

Integrating out the irrelevant degrees of freedom, Liouville's equation yields the equations for the distribution $\rho^\pm(x, v; t)$ of the right and left particles:

$$\partial_t\rho^\pm(x, v; t) + v\,\partial_x\rho^\pm(x, v; t) = I^\pm(x, v; t)$$ [12]

The collision term $I^\pm(x, v; t)$ is a functional of $\rho^\pm_{1,P}(X, v; X, V; t)$, the two-point correlation function for a right (resp. left) particle at $(x = X, v)$ and the piston at $(X, V)$. Similarly, one obtains for the velocity distribution of the piston:


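$$\partial_t\Phi(V; t) = A\int_{-\infty}^{\infty}|V - v|\left[\theta(V - v)\,\rho^-_{\rm surf}(v'; V'; t) + \theta(v - V)\,\rho^+_{\rm surf}(v'; V'; t)\right]dv - A\int_{-\infty}^{\infty}|V - v|\left[\theta(v - V)\,\rho^-_{\rm surf}(v; V; t) + \theta(V - v)\,\rho^+_{\rm surf}(v; V; t)\right]dv$$ [13]

where $(v', V')$ are given by eqn [5] and

$$\rho^\pm_{\rm surf}(v; V; t) = \int dX\,\rho^\pm_{1,P}(X, v; X, V; t)$$ [14]

We thus have to solve eqns [12]-[13] with initial conditions

$$\rho^-(x, v; t = 0) = n_0^-\varphi_0^-(v)\,\theta(x)\,\theta(X_0 - x)$$
$$\rho^+(x, v; t = 0) = n_0^+\varphi_0^+(v)\,\theta(L - x)\,\theta(x - X_0)$$ [15]
$$\Phi(V; t = 0) = \delta(V)$$

Using the fact that $\alpha = 2m/(M + m) \ll 1$, we can rewrite eqn [13] as a formal series in powers of $\alpha$ (with $\gamma = \alpha A$):

$$\partial_t\Phi(V; t) = \gamma\sum_{k=1}^{\infty}\frac{(-1)^k\alpha^{k-1}}{k!}\left(\frac{\partial}{\partial V}\right)^{k}\tilde F_{k+1}(V; t)$$ [16]

$$\tilde F_k(V; t) = \int_V^{\infty}(v - V)^k\rho^-_{\rm surf}(v; V; t)\,dv - \int_{-\infty}^{V}(v - V)^k\rho^+_{\rm surf}(v; V; t)\,dv$$ [17]

from which one obtains the equations for the moments of the piston velocity:

$$\frac{1}{\gamma}\frac{d\langle V^n\rangle}{dt} = \sum_{k=1}^{n}\alpha^{k-1}\frac{n!}{k!(n - k)!}\int_{-\infty}^{\infty}dV\,V^{n-k}\tilde F_{k+1}(V; t)$$ [18]

However, we do not know the two-point correlation functions.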

If the length of the cylinder is infinite, the condition $M \gg m$ implies that the probability for a particle to make more than one collision on the piston is negligible. Alternatively, one could choose initial distributions $\varphi_0^\pm(v)$ which are zero for $|v| < v_{\min}$, where $v_{\min}$ is taken such that the probability of a recollision is strictly zero. Therefore, if $L = \infty$, one can consider that before a collision on the piston the particles are distributed with $\varphi_0^\pm(v)$ for all $t$, and the two-point correlation functions factorize, that is,

$$\rho^-_{\rm surf}(v; V; t) = \rho^-_{\rm surf}(v; t)\,\Phi(V; t), \quad \text{if } v > V$$
$$\rho^+_{\rm surf}(v; V; t) = \rho^+_{\rm surf}(v; t)\,\Phi(V; t), \quad \text{if } v < V$$ [19]

where for $L = \infty$, $\rho^\pm_{\rm surf}(v; t) = n_0^\pm\varphi_0^\pm(v)$ and thus the conditions to obtain eqn [18] are satisfied.

If $L$ is finite, one can show that the factorization property (eqn [19]) is an exact relation in the thermodynamic limit for the piston ($A \to \infty$, $M/A$ = constant). For finite $L$ and finite $A$, we introduce

Assumption 1 (Factorization condition). Before a collision the two-point correlation functions have the factorization property (eqn [19]) to first order in $\alpha$.


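Under the factorization condition, we have

$$\tilde F_k(V; t) = F_k(V; t)\,\Phi(V; t)$$ [20]

with

$$F_k(V; t) = \int_V^{\infty}dv\,(v - V)^k\rho^-_{\rm surf}(v; t) - \int_{-\infty}^{V}dv\,(v - V)^k\rho^+_{\rm surf}(v; t) \equiv F_k^-(V; t) - F_k^+(V; t)$$ [21]

and from eqn [18]

$$\frac{1}{A}\frac{d}{dt}\left(M\langle V\rangle_t\right) = M\alpha\,\langle F_2(V; t)\rangle_\Phi$$ [22]

$$\frac{1}{A}\frac{d}{dt}\left(\tfrac{1}{2}M\langle V^2\rangle_t\right) = M\alpha\left[\langle VF_2(V; t)\rangle_\Phi + \frac{\alpha}{2}\langle F_3(V; t)\rangle_\Phi\right]$$ [23]

Introducing $\bar V = \langle V\rangle_t$, then from eqns [12] and [20], it follows that the (kinetic) energies satisfy

$$\frac{1}{A}\frac{d\langle E^\pm\rangle}{dt} = \pm M\alpha\left[\langle F_2^\pm(V; t)\rangle_\Phi\,\bar V + \langle(V - \bar V)F_2^\pm(V; t)\rangle_\Phi + \frac{\alpha}{2}\langle F_3^\pm(V; t)\rangle_\Phi\right]$$ [24]

which implies conservation of energy. From the first law of thermodynamics,

$$\frac{1}{A}\frac{d\langle E^\pm\rangle}{dt} = P_W^{\to\pm} + P_Q^{\to\pm}$$ [25]

where $P_W^{\to\pm}$ and $P_Q^{\to\pm}$ denote the work- and heat-power transmitted by the piston to the fluid, we conclude from eqns [22] and [25] that the heat flux is

$$P_Q^{\to\pm} = \pm M\alpha\left[\langle(V - \bar V)F_2^\pm(V; t)\rangle_\Phi + \frac{\alpha}{2}\langle F_3^\pm(V; t)\rangle_\Phi\right]$$ [26]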


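Since $\alpha \ll 1$, it is interesting to introduce the irreducible moments

$$\Delta_r = \langle(V - \bar V)^r\rangle_\Phi$$ [27]

and the expansion around $\bar V = \langle V\rangle_t$,

$$F_k^\pm(V; t) = \sum_{r \geq 0}\frac{1}{r!}F_k^{(r,\pm)}(\bar V; t)\,(V - \bar V)^r$$ [28]

where $F_k^{(r,\pm)}$ denotes the $r$th derivative of $F_k^\pm$ with respect to $V$,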


from which one obtains equations for dA,/dt. In 
particular, using the identities 


ir. = "| 
) —3FY L. 


(r+2,+) 
F, 


-2Hj 四 
in [22] and [24], we have 


(Fi) e READ. 


e oie [30] 


cr | 


d /(E* E: , 
- ( a J M + Ma (Fr(Vit))sV 
2 FE(Vit)--5 Y (2r — 3a) 
r2 c 
xF uv. DA, [31] 


Depending on the questions or approximations one wants to study, either the distribution $\Phi(V; t)$ or the moments $\langle V^n\rangle_t$ will be the interesting objects. Finally, with the condition [19], one can take eqn [12] for $x \neq X_t$ and impose the boundary conditions at $x = X_t$:

$$\rho^-(X_t, v; t) = \rho^-(X_t, v'; t) \quad \text{if } v < V_t$$
$$\rho^+(X_t, v; t) = \rho^+(X_t, v'; t) \quad \text{if } v > V_t$$ [32]

and similarly for $x = 0$ and $x = L$ with $v' = -v$.

Let us note that this factorization condition is of the same nature as the molecular chaos assumption introduced in kinetic theory, and with this condition eqn [13] yields the Boltzmann equation for this model.

In the following, to obtain explicit results as a function of the initial temperatures $T_0^\pm$, we take Maxwellian distributions $\varphi_0^\pm(v)$ and initial conditions $(p_0^\pm, T_0^\pm, N^\pm)$ such that the velocity of the piston remains small (i.e., $|\langle V\rangle_t| \ll |\langle v^\pm\rangle_0|$).


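Distribution $\Phi(V; t)$ for the Infinite Cylinder ($L = \infty$)


To lowest order in $\epsilon = \sqrt{m/M}$, and assuming $|1 - p^+/p^-|$ is of order $\epsilon$, one obtains from eqn [16] the usual Fokker-Planck equation whose solution gives

$$\Phi_0(V; t) = \frac{1}{\sqrt{2\pi}\,\Delta(t)}\exp\left[-\frac{(V - \bar V(t))^2}{2\Delta^2(t)}\right]$$ [33]

$$\bar V(t) = (p^- - p^+)\,\frac{A}{M}\,\frac{1}{\lambda}\left(1 - e^{-\lambda t}\right), \qquad \Delta^2(t) = \Delta^2\left(1 - e^{-2\lambda t}\right)$$

$$\lambda = \frac{A}{M}\sqrt{\frac{8k_Bm}{\pi}}\left(n^-\sqrt{T^-} + n^+\sqrt{T^+}\right), \qquad \Delta^2 = \frac{k_B}{M}\sqrt{T^-T^+}\,\frac{p^+\sqrt{T^+} + p^-\sqrt{T^-}}{p^+\sqrt{T^-} + p^-\sqrt{T^+}}$$ [34]

where we have dropped the index “zero” on the variable $T^\pm$, $n^\pm$ and used the equation of state $p^\pm = n^\pm k_BT^\pm$.

In conclusion, in the thermodynamic limit for the piston ($M \to \infty$, $M/A$ fixed), eqn [33] shows that the evolution is deterministic, that is, $\Phi(V; t) = \delta(V - \bar V(t))$, where the velocity $\bar V(t)$ of the piston tends exponentially fast toward the stationary value $\bar V_{\rm stat} = \bar V(\infty)$ with relaxation time $\tau = \lambda^{-1}$.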

Let us note that for $p^+ = p^-$, we have $\bar V(t) = 0$ and the evolution [33] is identical to the Ornstein-Uhlenbeck process of thermalization of the Brownian particle starting with zero velocity and friction coefficient $\lambda$. The analysis of [16] to first order in $\epsilon$ yields then 


(1 u e 7 


14D alo )(V 一 V ootvin [35] 


where $a_k(t)$ can be explicitly calculated and $a_0(t) = -\Delta^2(t)\,a_2(t)$ because of the normalization condition. Moreover, $a_2(t) \sim (p^- - p^+)$, that is, $a_2(t) = 0$ if $p^- = p^+$. From [35], one obtains 


_ frks = VT-T* 
"7 Vimpr/T- p VT 
x (7 - p*)(1 — e^") 


i-r (pr pT) 


to P ST tp VT 

x (1— 2)te ^ — =" 
m 1 —— i 

"MypPTU LB 
p'vT* +p vT- -An2 36 
[err ace) ne 

and 
(V7), — (V? = A?(r) f + Va 2 (0030) [37] 


From eqn [36], we now conclude that for equal pressures $p^- = p^+$, the piston will evolve stochastically to a stationary state with nonzero velocity toward the warmer side

$$\langle V\rangle_{\rm stat} = \frac{m}{M}\sqrt{\frac{\pi k_B}{8m}}\left(\sqrt{T^+} - \sqrt{T^-}\right), \qquad \langle V^2\rangle_{\rm stat} = \frac{k_B}{M}\sqrt{T^+T^-}, \qquad \text{if } p^- = p^+$$ [38]

Let us remark that we have established eqn [35] under the condition that $|1 - p^+/p^-| = O(\epsilon)$, but as we see in the next section, the stationary value $\bar V_{\rm stat}$ obtained from eqn [36] remains valid whenever

$$\left|(1 - p^+/p^-)\left(1 - \sqrt{T^+/T^-}\right)\right| \ll 1.$$


Moments $\langle V^n\rangle_t$ for the Piston: Thermodynamic Limit


General Equations: Adiabatic Evolution 


In the thermodynamic limit $M \to \infty$, $\alpha \to 0$, $\gamma = \alpha A$
is fixed and eqn [16] reduces to


à, (V;t) = -ya aV; t) [39] 


Integrating [39] with initial condition $\Phi(V; t=0) =
\delta(V)$ yields


$$\Phi(V;t) = \delta(V - V(t)), \quad\text{that is,}\quad \langle V^n\rangle_t = \langle V\rangle_t^n$$ [40]


where 


$v()-E(V(), V(t-0)-0 . [4i] 


dt 
Moreover, 
F)(V;t) = Fo(V;t)®(V;t) [42] 
and 
px, P(X, v; X, Vit) = p^(x,v;t)é(X — X(t)) 
x 6(V — V(t)) [43] 


where $dX(t)/dt = V(t)$, $X(t = 0) = X_0$.

In conclusion, as already mentioned, in this limit
the factorization condition (eqn [19]) is an exact
relation. Let us note that $\rho^\pm_{\rm surf}(v;t) = \rho^\pm_{\rm surf}(2V - v;t)$ if
$v > V(t)$ (on the right) or $v < V(t)$ (on the left). Let
us also remark that $2mF_2^\pm(V(t);t)$ represents the
effective pressure from the right/left exerted on the
piston. Moreover, since for any distribution
$\rho^\pm_{\rm surf}(v;t)$, the functions $F_2^-(V;t)$ and $-F_2^+(V;t)$ are
monotonically decreasing, we can introduce the
decomposition


$$\hat p^\pm_{\rm surf} = 2mF_2^\pm(V;t) = p^\pm \pm \left(\frac{M}{A}\right)\lambda^\pm(V;t)\,V$$ [44]


where the static pressure at the surface is
$p^\pm(t) = \hat p^\pm_{\rm surf}(V = 0; t)$ and the friction coefficients
$\lambda^\pm(V;t)$ are strictly positive. The evolution [41] is
thus of the form


$$\frac{d}{dt}V = \frac{A}{M}\left(p^- - p^+\right) - \lambda(V)\,V$$ [45]


It involves the difference of static pressure and the
friction coefficient $\lambda(V) = \lambda^-(V) + \lambda^+(V)$. Finally,
from eqn [12], we obtain the evolution of the
(kinetic) energy per unit area for the fluids in the left
and right compartments:


$$\frac{d}{dt}\left(\frac{\langle E^\pm\rangle}{A}\right) = \pm 2mF_2^\pm(V;t)\,V$$ [46]


Therefore, from [40] and [46], and the first law of
thermodynamics, we recover the conclusions
obtained in the previous section, that is, in the
thermodynamic limit for the piston, the evolution
(eqns [41], [12], and [35]) is deterministic and
adiabatic (i.e., in [46] only work and no heat is
involved).


Infinite Cylinder (L = ∞, M = ∞)


As already discussed, for L = ∞ we can neglect the
recollisions. Therefore, in $F_2^\pm$ the distribution $\rho^\pm(v;t)$
can be replaced by $n_0^\pm\varphi_0^\pm(v)$ and $F_2(V)$ is independent
of t. In this case, the evolution of the piston is
simply given by the ordinary differential equation


$$\frac{d}{dt}V(t) = \frac{A}{M}\,2mF_2(V), \qquad V(t = 0) = 0$$ [47]


where $F_2(V)$ is a strictly decreasing function of V. If
$p_0^+ = p_0^-$, then $V(t) = 0$, that is, the piston remains at
rest and the two fluids remain in their original
thermal equilibrium. If $p_0^+ \neq p_0^-$, that is, $n_0^+k_BT_0^+ \neq
n_0^-k_BT_0^-$, the piston will evolve monotonically to a
stationary state with constant velocity $V_{\rm stat}$ solution
of $F_2(V_{\rm stat}) = 0$. From [34], it follows that $V_{\rm stat}$ is a
function of $n_0^+/n_0^-$, $T_0^-$, $T_0^+$ but does not depend on
the value M/A. Moreover, the approach to this
stationary state is exponentially fast with relaxation
time $\tau_0 = 1/\lambda(V = 0)$. For Maxwellian distributions


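$\varphi_0^\pm(v)$, $V_{\rm stat}$ is a solution of


$$k_B\left(n_0^-T_0^- - n_0^+T_0^+\right) - V_{\rm stat}\sqrt{\frac{8k_Bm}{\pi}}\left(n_0^-\sqrt{T_0^-} + n_0^+\sqrt{T_0^+}\right) + V_{\rm stat}^2\,m\left(n_0^- - n_0^+\right) + O(V_{\rm stat}^3) = 0$$ [48]


Moreover,


$$\frac{1}{\tau_0} = \lambda(V = 0) = \frac{A}{M}\sqrt{\frac{8k_Bm}{\pi}}\left(n_0^-\sqrt{T_0^-} + n_0^+\sqrt{T_0^+}\right)$$ [49]


which implies that the relaxation time will be very
small either if $M/A \ll 1$, or if $n_0^\pm = \xi\,\tilde n_0^\pm$ with $\xi \gg 1$.
In this case, the piston acquires almost immediately
its final velocity $V_{\rm stat}$ and one can solve eqn [12] to
obtain the evolution of the fluids.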
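
The stationary velocity for Maxwellian gases can also be found without truncating in powers of V. The sketch below is a minimal illustration under stated assumptions: it takes the pressure on each face to be $2m\,n\int (v - V)^2\varphi(v)\,dv$ over the particles that can hit the piston, evaluates that half-range Gaussian integral in closed form with `math.erfc`, and locates the zero of the net pressure by bisection. The function names are hypothetical. The densities used in the usage note ($n^- = 1.2$, $n^+ = 0.24$ with $T^- = 1$, $T^+ = 10$, $k_B = m = 1$) are an inferred choice meant to correspond to R = 12 with M/A = 1, not values quoted in this form by the article.

```python
import math

def half_second_moment(V, sigma):
    """Integral of (v - V)^2 over v > V for a centred Gaussian of variance sigma^2."""
    u = V / sigma
    return (0.5 * (sigma ** 2 + V ** 2) * math.erfc(u / math.sqrt(2.0))
            - V * sigma * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi))

def net_pressure(V, n_minus, T_minus, n_plus, T_plus, m=1.0, k_B=1.0):
    """Left pressure minus right pressure on a piston moving at velocity V."""
    s_minus = math.sqrt(k_B * T_minus / m)
    s_plus = math.sqrt(k_B * T_plus / m)
    left = 2.0 * m * n_minus * half_second_moment(V, s_minus)
    # By symmetry, the right face sees the mirror image of the same integral.
    right = 2.0 * m * n_plus * half_second_moment(-V, s_plus)
    return left - right

def stationary_velocity(n_minus, T_minus, n_plus, T_plus, m=1.0, k_B=1.0, tol=1e-12):
    """Bisection on net_pressure(V) = 0; the net pressure decreases with V."""
    bound = 10.0 * math.sqrt(k_B * max(T_minus, T_plus) / m)
    lo, hi = -bound, bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_pressure(mid, n_minus, T_minus, n_plus, T_plus, m, k_B) > 0.0:
            lo = mid  # net push to the right: the root lies at larger V
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the assumed parameters above, `stationary_velocity(1.2, 1.0, 0.24, 10.0)` gives a value close to the velocity -0.3433 quoted in the caption of Figure 4, and it is independent of M/A because only the densities and temperatures enter.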


Finite Cylinder (L < ∞, M = ∞)


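For finite L, introducing the average temperature in
the fluids


$$T^\pm_{\rm av} = \frac{2\langle E^\pm\rangle}{k_B N^\pm}$$ [50]


we have to solve [41] and [46], that is,


$$\frac{d}{dt}V(t) = \frac{A}{M}\,2m\left[F_2^-(V;t) - F_2^+(V;t)\right]$$

$$k_B\frac{d}{dt}T^\pm_{\rm av} = \pm 4m\,\frac{A}{N^\pm}\,F_2^\pm(V;t)\,V$$ [51]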


where $F_2^\pm(V;t)$ is a functional of $\rho^\pm_{\rm surf}(v;t)$ which we
decompose as


Fi(V;t)- &*(t)kgT* (t) + (=) M*(V;t)V [52] 


with 
= f AV p, t) 
A [53] 
= J dup; (v: t) 
and 
ntkpT+ = f+ [54] 


For a time interval $\tau_1 = L\sqrt{m/k_BT}$, which is the time
for the shock wave to bounce back, the piston will
evolve as already discussed. In particular, if $R^\pm$ is
sufficiently large, then after a time $\tau_0 = O((R^\pm)^{-1})$ the
piston will reach the velocity $\bar V$ given by $F_2(\bar V; t) = 0$
(eqn [47]). For $t > \tau_1$, $F_2(V;t)$ depends explicitly on
time. For $R^\pm$ sufficiently large, we can expect that for
all t the velocity V(t) will be a functional of $\rho^\pm_{\rm surf}(v;t)$
given by $F_2[V(t); \rho^\pm_{\rm surf}(\cdot\,;t)] = 0$, and thus the problem
is to solve eqn [12] with the boundary condition (eqn
[32]). Since V(t) so defined is independent of M/A,
the evolution will be independent of M/A if $R^\pm$ is
sufficiently large. This conclusion, which we cannot
prove rigorously, will be confirmed by numerical
simulations.

To give a qualitative discussion of the evolution
for arbitrary values of $R^\pm$, we shall use the following
assumption already introduced in the experimental
measurement of $c_p/c_v$.


Assumption 2 (Average assumption). The surface
coefficients $\hat n^\pm(t)$ and $\hat T^\pm(t)$ (eqns [52]-[53]) coincide
to order 1 in $\alpha$ with the average value of the
density and temperature in the fluids, that is,


$$\hat n^-(t) = \frac{N^-}{A\,X(t)}, \qquad \hat n^+(t) = \frac{N^+}{A\,(L - X(t))}, \qquad \hat T^\pm(t) = T^\pm_{\rm av}(t)$$ [55]


We still need an expression for the friction 
coefficients. From 


= p*(t) — 4mVF#(V = 0;t) 
+ mV*n*(t) + O(V?) [56] 


F5 (V;t) 


then, assuming that to first order in $\alpha$, $F_1^\pm(V = 0; t)$ is
the same function of $T^\pm(t)$ as for Maxwellian
distributions, we have


A*(V) = (i ES + ] -O(V?) [57] 


Therefore, choosing initial condition such that V(t)
is small for all time, eqn [51] yields


$$\sqrt{T^-}\,X - \sqrt{T^+}\,(L - X) = C = \sqrt{T_0^-}\,X_0 - \sqrt{T_0^+}\,(L - X_0)$$ [58]


We thus obtain the equilibrium point for the
adiabatic evolution ($M = \infty$):


$$\left(\frac{N^-}{A}\right)T_f^- = \frac{2E_0}{A k_B}\,\frac{X_f}{L}$$ [59]

$$\left(\frac{N^+}{A}\right)T_f^+ = \frac{2E_0}{A k_B}\left(1 - \frac{X_f}{L}\right)$$ [60]

where

$$\frac{2E_0}{A k_B} = \left(\frac{N^-}{A}\right)T_0^- + \left(\frac{N^+}{A}\right)T_0^+$$ [61]


Solving [58]-[62] gives the equilibrium state $(X_f, T_f^\pm)$,
which is a state of mechanical equilibrium $p_f^- = p_f^+$,
but not thermal equilibrium $T_f^- \neq T_f^+$. Moreover, this
equilibrium state does not depend on M. Having
obtained the equilibrium point, we can then investigate
the evolution close to the equilibrium point.
Linearizing eqn [51] around $(X_f, T_f^\pm)$ yields


$$\frac{d^2X}{dt^2} = \frac{k_B}{M}\left[\frac{N^-T_f^-X_f^2}{X^3} - \frac{N^+T_f^+(L - X_f)^2}{(L - X)^3}\right] - \lambda(V = 0)\,V$$ [63]


In other words, the effect of collisions on the piston
is to induce an external potential of the form
$[c_1X^{-2} + c_2(L - X)^{-2}]$ and a friction force. It is a
damped harmonic oscillator with


E 1 
; 0 
A= (ui x) xix Xj) 


ene [c oco 


(recall that $R^\pm = mN^\pm/M$). For the case $N^- = N^+$ to
be considered in the simulations, eqn [64] implies
that the motion is weakly damped if


[64] 


_ 3a} fx: h Xe 
with period 
=  - [66] 
Wi) VR = | ae 


and strongly damped if R > Rmax, in agreement with 
experimental observations. 
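
The mechanical equilibrium reached at the end of the adiabatic stage can be computed directly. The sketch below is a hedged illustration: it assumes that the adiabatic invariant is $\sqrt{T^-}X - \sqrt{T^+}(L - X) = C$, combines it with pressure balance $N^-T^-/X = N^+T^+/(L - X)$ and with conservation of the total energy $N^-T^- + N^+T^+$, and solves the resulting one-dimensional equation for X by bisection. The function name and arguments are hypothetical, and $k_B$ drops out of the result.

```python
import math

def adiabatic_equilibrium(X0, L, T0_minus, T0_plus, N_minus=1.0, N_plus=1.0, tol=1e-12):
    """Return (X_f, T_f_minus, T_f_plus) for the assumed adiabatic invariant."""
    C = math.sqrt(T0_minus) * X0 - math.sqrt(T0_plus) * (L - X0)
    S = N_minus * T0_minus + N_plus * T0_plus  # conserved total energy, times 2/k_B

    def temperatures(X):
        # Pressure balance: N^- T^-/X = N^+ T^+/(L - X) = S/L.
        return S * X / (L * N_minus), S * (L - X) / (L * N_plus)

    def residual(X):
        T_m, T_p = temperatures(X)
        return math.sqrt(T_m) * X - math.sqrt(T_p) * (L - X) - C

    # residual(X) increases with X, is negative at X = 0 and positive at X = L.
    lo, hi = 0.0, L
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    X_f = 0.5 * (lo + hi)
    T_m, T_p = temperatures(X_f)
    return X_f, T_m, T_p
```

With the simulation parameters L = 60, $X_0 = 10$, $T_0^- = 1$, $T_0^+ = 10$ and equal particle numbers, this gives a position and temperatures close to the predicted values 8.42, 1.54 and 9.46 quoted in the figure captions, and the result does not involve the piston mass.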


Moments $\langle V^n\rangle_t$: Piston with Finite Mass
Equation to First Order in $\alpha = 2m/(M + m)$


If the mass of the piston is finite with $M \gg m$, then
the irreducible moments $\Delta_r$ are of the order $\alpha^{[(r+1)/2]}$,
where $[(r+1)/2]$ is the integral part of $(r+1)/2$.
If the factorization condition [19] is satisfied, to first
order in $\alpha$ we have


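$$\langle V^n\rangle_t = V^n(t) + \frac{n(n-1)}{2}\,V^{n-2}(t)\,\Delta_2(t)$$ [67]

where $V(t) = \langle V\rangle_t$ and $\Delta_2(t) = \langle V^2\rangle_t - \langle V\rangle_t^2$ are
solutions of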

1 d 
xdi V(t) =F, + A2Fo 

m (t) = — 4A3F; + aF 

yd; m 2I 3 (68) 

1 d 


+ (M/2)AA;F; — aF3]} 


and $\Delta_2 = k_BT_P/M$ defines the temperature of the
piston.


Infinite Cylinder: Heat Transfer 


For the infinite cylinder, the factorization assumption
is an exact relation and in this case the
functions $F_n(V;t)$ are independent of t. The solution
of the autonomous system [68] with $F_n = F_n(V)$
shows that the piston evolves to a stationary state
with velocity $\bar V$ given by

a F3(V)Fo(V) 


BI e o [69] 


The temperature of the piston is 


^ kgTp | a F3(V) 
A» = = 一 一 
M 4 FA(V) 


[70] 


and the heat flux from the piston to the fluid is 


1.5. m [FIF FF 
a en es i RD. 
P | By =F 


A ~ 2M 
If we choose initial conditions such that $|V(t)| \ll 1$
for all t, and Maxwellian distributions $\varphi^\pm(v)$, the
solutions V(t), $\Delta_2(t)$ coincide with the solutions
previously obtained (eqns [36] and [37]) and


[71] 


Í, pge : 
4 Po s(Trt—T")x 


m /8kp 

M Y mz 
" pb 

(p* V T- + p- V T*) 

In conclusion, to first order in m/M, there is a heat
flux from the warm side to the cold one proportional
to $(T^+ - T^-)$, induced by the stochastic
motion of the piston.


[72] 


Finite Cylinder (L < ∞, M < ∞)


Singular character of the perturbation approach
Whereas the leading order is actually the "thermodynamic
behavior" $M \to \infty$ in the first two stages of
the evolution (fast relaxation toward mechanical
equilibrium), the fluctuations of order $O(\alpha)$ rule the
slow relaxation toward thermal equilibrium. It is
thus obvious that a naive perturbation approach
cannot give access to "both" regimes. This difficulty
is reminiscent of the boundary-layer problems 
encountered in hydrodynamics, and the perturbation 
method to be used here is the exact temporal analog 
of the matched perturbative expansion method 
developed for these boundary layers. The idea is to 
implement two different perturbation approaches: 


1. one at short times, with time variable t describing 
the fast dynamics ruling the fast relaxation 
toward mechanical equilibrium; and 

2. one for longer times, with a rescaled time
variable $\tau = \alpha t$.


The second perturbation approach above is supplemented
with a "slaving principle," expressing that at
each time of the slow evolution, that is, at fixed $\tau$,
the still present fast dynamics has reached a local
asymptotic state, slaved to the values of the slow

observables. The initial conditions are set on the 
first-stage solution. The initial conditions of the 
second regime match the asymptotic behavior of the 
first-stage solution (“matching condition”). 

The slaving principle is implemented by interpret- 
ing an evolution equation of the form 


A = O(1) [73] 


as follows: it indicates that a is in fact a fast quantity
relaxing at short times ($\ll \tau$) toward a stationary
state $a_{\rm eq}(\tau)$ slaved to the slow evolution and
determined by the condition


$$\mathcal{A}[\tau, a_{\rm eq}(\tau)] = 0$$ [74]


(at lowest order in $\alpha$, actually $\mathcal{A}[\tau, a_{\rm eq}(\tau)] = O(\alpha)$,
which prescribes the leading order of $a_{\rm eq}(\tau)$); the
following-order terms can be arbitrarily fixed as
long as only the first order of perturbation is
implemented. Physically, such a condition arises to
express that an instantaneous mechanical equilibrium
takes place at each time $\tau$ of the slow
relaxation to thermal equilibrium.


Equations for the fluctuation-induced evolution of
the system Following this procedure, we arrive at
explicit expressions for the rescaled quantities (of order
O(1)) $\tilde V = V/\alpha$, $\tilde\Delta_2 = \Delta_2/\alpha$, and $\tilde\Pi = (p^- - p^+)/\alpha$:


; ALN (Fy Ft — FF; | 
y= k (=) (a) + O(a)
We then introduce a (dimensionless) rescaled position
for the piston


$$\xi = \frac{1}{2} - \frac{X}{L}$$ [76]


which satisfies 


于 =- -r(A y 


dr 3Eo Fi 


To discuss eqn [77], a third assumption has to be 
introduced. 


Assumption 3 (Maxwellian Identities). In the
regime when $V = O(\alpha)$, the relations between the
functionals $F_1$, $F_2$, and $F_3$ are the same at lowest
order in $\alpha$ as if the distributions $\rho^\pm(v; V; t)$ were
Maxwellian in v:


|kg T 
+ —— 
Fi (V) = Fp pum 


2kp T= [78] 
Fi) = (7 FEV) - vero 


m 


Using these identities and the (dimensionless) 
rescaled time 


2 [ke XN-Tg - N* Tj) 
a oe — S AN [79] 


where $N = N^- + N^+$, we obtain a deterministic
equation describing the piston motion (Gruber et al.
2003):


dé | N N | 

uu eg ed 4.28) —A———U = 28) 

ds 2N 2N [80] 
(S1 Xu 


where $X_{\rm ad}$ is the piston position at the end of the
adiabatic regime (i.e., $X_f$, eqn [62]). The meaningful
observables straightforwardly follow from the solution
$\xi(s)$:


$$X(s) = L\left(\frac{1}{2} - \xi(s)\right)$$

$$T^\pm(s) = \left[1 \pm 2\xi(s)\right]\left(\frac{N^-T_0^- + N^+T_0^+}{2N^\pm}\right)$$ [81]

The first-order perturbation analysis using a single
rescaled time $\tau_1 = \alpha t$ is valid in the regime when
$V = O(\alpha)$ and it gives access to the relaxation toward
thermal equilibrium up to a temperature difference
$|T^+ - T^-| = O(\alpha)$. For the sake of technical completeness
(rather than physical relevance, since the above
first-order analysis is enough to get the observable,
meaningful behavior), let us mention that the perturbation
analysis can be carried over at higher orders;
using further rescaled times $\tau_2 = \alpha^2 t, \ldots, \tau_n = \alpha^n t$, it
would allow us to control the evolution up to a
temperature difference $|T^+ - T^-| = O(\alpha^n)$; however,
one could expect that the factorization condition does
not hold at higher orders.
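
Once the rescaled position $\xi$ is known, the physical observables follow by simple algebra. The following sketch is a minimal illustration, assuming the relations $X = L(1/2 - \xi)$ and $T^\pm = (1 \pm 2\xi)(N^-T_0^- + N^+T_0^+)/(2N^\pm)$, which express mechanical equilibrium at fixed total energy. The function name is hypothetical.

```python
def observables_from_xi(xi, L, N_minus, N_plus, T0_minus, T0_plus):
    """Piston position and fluid temperatures for a given rescaled position xi."""
    X = L * (0.5 - xi)
    total = N_minus * T0_minus + N_plus * T0_plus  # conserved energy, times 2/k_B
    T_minus = (1.0 - 2.0 * xi) * total / (2.0 * N_minus)
    T_plus = (1.0 + 2.0 * xi) * total / (2.0 * N_plus)
    return X, T_minus, T_plus
```

For $N^- = N^+$, $T_0^- = 1$, $T_0^+ = 10$ and L = 60, the value $\xi = 0$ corresponds to the piston at the middle of the cylinder with both temperatures equal to 5.5, while $\xi = 1/2 - 8.42/60$ gives a left temperature close to 1.54, the value at the end of the adiabatic stage.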


Numerical Simulations 


As we have seen, the results were established under
the condition that m/M is a small parameter. Moreover,
for finite systems (L < ∞, M < ∞), it was
assumed that before collisions and to first order in
m/M, the factorization and the average assumptions
are satisfied. The numerical simulations are thus
essential to check the validity of these assumptions, to 
determine the range of acceptable values m/M for the 
perturbation expansion, to investigate the thermo- 
dynamic limit, and to guide the intuition. 

In all simulations, we have taken $k_B = 1$, $m = 1$,
$T^- = 1$ and usually $T^+ = 10$. For L finite, we have
taken L = 60, $X_0 = 10$, A = 10°, and $N^+ = N^- = N/2$,
that is, $p^- = R(M/A)(1/10)$ and $p^+ = 2p^-$. The
number of particles N was varied from a few hundreds
to one or several millions; the mass M of the piston
from 1 to 10?. We give below some of the results
which have been obtained for L = ∞ (Figures 2 and 3)
and for L < ∞: approach to mechanical equilibrium
(Figures 4–6) and to thermal equilibrium (Figures 7
and 8).

Figure 2 Evolution of the piston for L = ∞, and $p^- = p^+ = 1$ as observed in simulations (stochastic line in (a), dots in (b)) compared
with prediction: (a) position X(t) for $T^+ = 10$; and (b) stationary velocity for $T^+ = 10$ (continuous line) and $T^+ = 100$ (dotted line), as a
function of M.

Figure 3 Evolution of the piston for L = ∞, M = 10^, and $p^+ \neq p^-$ as observed in simulations (continuous line) compared with
predictions (dotted line): (a) $p^- = 1$, $p^+ = p^- + \Delta p$, from top to bottom $\Delta p/p^- = 0.05, 0.1, 0.2, 1, 2, 3$; and (b) $p^- = \zeta$, $p^+ = 2\zeta$,
$\Delta p/p^- = 1$; $x' = \zeta x$, $t' = \zeta t$, $\zeta$ = 102, 10*, 10^, 1, 10, 10?, 10°, 10*.

Figure 4 "Deterministic" evolution toward mechanical equilibrium for L < ∞, M = 10°: (a) position X(t); one finds $X_{\rm sim} = 8.3$ whereas
$X_f = 8.42$ and (b) velocity V(t); one finds $V_{\rm sim} = -0.343$ whereas $V_{\rm stat} = -0.3433$. From top to bottom: R = 12: strong damping,
independent of R and M for R > 4 and M > 10°. R = 2: critical damping. R = 0.1: weak damping; damping coefficient increases with R
and $\omega_0 \sim \sqrt{R}$ for $R \ll 1$ but is independent of M for M > 10°.

Figure 5 Same conditions as Figure 4, R = 12: (a) average pressure and temperature in the fluid: $p^\pm_{\rm av}(t) = 2E^\pm n^\pm/N^\pm$,
$T^\pm_{\rm av} = 2E^\pm/N^\pm k_B$ and (b) pressure and temperature at the surface of the piston. Prediction: $T^-_{\rm ad} = 1.54$, $T^+_{\rm ad} = 9.46$, $p^-_{\rm ad} = p^+_{\rm ad} = 2.2$.
Simulations: $T^-_{\rm ad} = 1.52$, $T^+_{\rm ad} = 9.48$, $p^-_{\rm ad} = p^+_{\rm ad} = 2.2$.


Conclusions and Open Problems 


In this article, the adiabatic piston has been
investigated to first order in the small parameter
m/M, but no attempt has been made to control the
remainder terms. For an infinite cylinder, no other
assumptions were necessary and the numerical
simulations (Figures 2 and 3) are in perfect agreement
with the theoretical prediction, in particular for
the stationary velocity $V_{\rm stat}$, the friction coefficient
$\lambda(V)$, and the relaxation time $\tau$.

For a finite cylinder (L < ∞) and in the thermodynamic
limit ($M \to \infty$), we were forced to introduce
the average assumption to obtain a set of autonomous
equations. As we have seen when initially $p^- \neq p^+$,
this limiting case also describes the evolution
to lowest order during the first two stages characterized
by a time of the order $\tau_1 = L\sqrt{m/k_BT}$, where the
evolution is adiabatic and deterministic. In the first
stage, that is, before the shock wave bounces back on
the piston, the simulations confirm the theoretical


predictions. In particular, they show that if R > 4, 
the piston will be able to reach and maintain for 
some time the velocity Vstat, whereas this will not be 
the case for R < 1 (Figure 4b). In the second stage of 
the evolution, the simulations (Figure 4) exhibit 
damped oscillations toward mechanical equilibrium 
which are in very good agreement with the predictions
for the final state $(X_f, T_f^\pm)$, the frequency of
oscillations and the existence of weak and strong
damping depending on R < 1 or R > 4. Moreover,
the general behavior of the evolution observed in the 
simulations as a function of the parameters was as 
predicted. However, the damping coefficient of these 
oscillations is wrong by one or several orders of 
magnitude. To understand this discrepancy, we note 
that using the average assumption we have related 
the damping to the friction coefficient. However, the 
simulations clearly show that those two dissipative 
effects have totally different origins. Indeed, as one
can see with L = ∞, friction is associated with the
fact that the density of the gas in front and in the
back of the piston is not the same as in the bulk, and
this generates a shock wave that propagates in the
fluid. For finite L, when R > 4, the stationary
velocity $V_{\rm stat}$ is reached and the effect of friction is
to transfer in this first stage more and more energy to
the fluid on one side and vice versa on the other side.

Figure 6 Velocity distribution in the left compartment. Same conditions as Figure 4, R = 12. Dotted line corresponds to Maxwellian
with $T^- = 1.52$: (a) t = 12, 24, 36, 48, 60, 92, 144, 240 from top to bottom and (b) t = 276–460.

However, to stop the piston and reverse its motion, 
only a certain amount of the transferred energy is 
necessary and the rest remains as dissipated energy in 
the fluid leading to a strong damping. On the other 
hand, for R < 1, the value $V_{\rm stat}$ is never reached and
all the energy transferred is necessary to revert the
motion. In this case very little dissipation is involved
and the damping will be very small. This indicates 
that the mechanism responsible for damping is 
associated with shock waves bouncing back and 
forth and the average assumption, which corresponds 
to a homogeneity condition throughout the gas, 
cannot describe the situation. In fact, the simulations 
(Figure 5b) indicate that the average assumption does
not hold in this second stage.

Figure 7 Approach to thermal equilibrium, $N^\pm$ = 3 × 10*. The smooth curves correspond to the predictions, the stochastic curves to
simulations: (a) position $X(\tau)$, $\tau = \alpha t$, no visible difference for M = 100, 200, 1000 and (b) average temperatures $T^\pm(\tau)$, $\tau = \alpha t$, M = 200.

Figure 8 Approach to thermal equilibrium from $T^-_{\rm ad} = 1.54$ (dotted line in (a)) to $T_f = 5.5$ (heavy line in (b)). Velocity distribution
function on the left for M = 200, $N^\pm$ = 5 × 10^. (a) $\tau = \alpha t$ = 2, 4, 14, 48, 92, 144 and (b) approach to Maxwellian distribution for $\tau > 445$.

In conclusion, one is
forced to admit that to describe correctly the 
adiabatic evolution, it is necessary to study the 
coupling between the motion of the piston and the 
hydrodynamic equations of the gas. Preliminary 
investigations have been initiated, but this is still 
one of the major open problems. Another problem 
would be to study the evolution in the case of 
interacting particles. However, investigations with 
hard disks suggest that no new effects should appear. 
To investigate adiabatic evolution, a simpler version 
of the adiabatic piston problem, without any con- 
troversy, has been introduced: this is the model of a 
standard piston with a constant force acting on it. 

In the third stage, that is, the very slow 
approach to thermal equilibrium, another assump- 
tion was necessary, namely the factorization 
condition. The simulations (Figure 7) show a very
good agreement with the prediction, and in
particular the scaling property with $t' = t/M$ is
perfectly verified. It appears that the small discrepancy
between simulations and theoretical
predictions could be due to the fact that, to 
compute explicitly the coefficients in the equations 
of motion, we have taken Maxwellian relations for 
the velocities of the gas particles, which is clearly 
not the case (Figure 8a). 

The fourth stage of the evolution, that is, the 
approach to Maxwellian distributions (Figure 8b), is 
still another major open problem. Some preliminary 
studies have been conducted, where one investigates 
the stability and the evolution of the system when 
initially the two gases are in the same equilibrium 
state, but characterized by a distribution function 
which is not Maxwellian. 


Finally, let us mention that the relation between the 
piston problem and the second law of thermodynamics 
is one more major problem. The question of entropy 
production out of equilibrium, and the validity of the 
second law, are still highly controversial. Again, 
preliminary results can be found in the literature. 
Among other things, this question has led to a model of
heat conductivity in gases, which reproduces the correct
behavior (Gruber and Lesne 2005).


See also: Billiards in Bounded Convex Domains; 
Boltzmann Equation (Classical and Quantum); 
Hamiltonian Fluid Dynamics; Multiscale Approaches; 
Nonequilibrium Statistical Mechanics (Stationary): 
Overview; Nonequilibrium Statistical Mechanics: 
Dynamical Systems Approach.


Introduction


The anti-de Sitter/conformal field theory (AdS/CFT)
correspondence is a conjectured equivalence
between a quantum field theory in d spacetime
dimensions with conformal scaling symmetry and a
quantum theory of gravity in (d + 1)-dimensional
anti-de Sitter space. The most promising
approaches to quantizing gravity involve superstring
theories, which are most easily defined in
10 spacetime dimensions, or M-theory which is
defined in 11 spacetime dimensions. Hence, the
AdS/CFT correspondences based on superstrings
typically involve backgrounds of the form $AdS_{d+1} \times Y_{9-d}$,
while those based on M-theory involve backgrounds
of the form $AdS_{d+1} \times Y_{10-d}$, where Y are
compact spaces.

The examples of the AdS/CFT correspondence
discussed in this article are dualities between
(super)conformal nonabelian gauge theories and
superstrings on $AdS_5 \times Y_5$, where $Y_5$ is a five-dimensional
Einstein space (i.e., a space whose
Ricci tensor is proportional to the metric,
$R_{ij} = 4g_{ij}$). In particular, the most basic (and maximally
supersymmetric) such duality relates
$\mathcal{N} = 4$ SU(N) super Yang-Mills (SYM) and type IIB
superstring in the curved background $AdS_5 \times S^5$.

There exist special limits where this duality is
more tractable than in the general case. If we take
the large-N limit while keeping the 't Hooft coupling
$\lambda = g_{YM}^2N$ fixed ($g_{YM}$ is the Yang-Mills coupling
strength), then each Feynman graph of the gauge
theory carries a topological factor $N^\chi$, where $\chi$ is
the Euler characteristic of the graph. The graphs of
spherical topology (often called "planar"), to be
identified with string tree diagrams, are weighted by
$N^2$; the graphs of toroidal topology, to be identified
with string one-loop diagrams, by $N^0$, etc. This
counting corresponds to the closed-string coupling
constant of order $N^{-1}$. Thus, in the large-N limit
the gauge theory becomes "planar," and the dual
string theory becomes classical. For small $g_{YM}^2N$,
the gauge theory can be studied perturbatively; in
this regime the dual string theory has not been very
useful because the background becomes highly
curved. The real power of the AdS/CFT duality,
which already has made it a very useful tool, lies in
the fact that, when the gauge theory becomes
strongly coupled, the curvature in the dual description
becomes small; therefore, classical supergravity
provides a systematic starting point for approximating
the string theory.
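
The topological counting just described is easy to tabulate. The short sketch below is only an illustration: it assumes a closed orientable surface of genus g, for which the Euler characteristic is $\chi = 2 - 2g$, and returns the weight $N^\chi$ of graphs with that topology. The function name is hypothetical.

```python
def topological_weight(N, genus):
    """Weight N**chi of Feynman graphs drawn on a closed surface of the given genus."""
    chi = 2 - 2 * genus  # Euler characteristic of a closed orientable surface
    return N ** chi
```

For N = 10 the sphere (genus 0) is weighted by 100 and the torus (genus 1) by 1. Each additional handle costs a factor $N^{-2}$, which is the square of a closed-string coupling of order $N^{-1}$.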

There is a strong motivation for an improved 
understanding of dualities of this type. In one 
direction, generalizations of this duality provide the 
tantalizing hope of a better understanding of 
quantum chromodynamics (QCD); QCD is a non- 
abelian gauge theory that describes the strong 
interactions of mesons, baryons, and glueballs, and 
has a conformal symmetry which is broken by 
quantum effects. In the other direction, AdS/CFT 
suggests that quantum gravity may be understand- 
able as a gauge theory. Understanding the confine- 
ment of quarks and gluons that takes place in 
low-energy QCD and quantizing gravity are well 
acknowledged to be two of the most important 
outstanding problems of theoretical physics. 


Some Geometrical Preliminaries 


The d-dimensional sphere of radius L, $S^d$, may be
defined by a constraint


$$\sum_{i=1}^{d+1}(X^i)^2 = L^2$$ [1]


on d + 1 real coordinates $X^i$. It is a positively curved
maximally symmetric space with symmetry group
SO(d + 1). We will denote the round metric on $S^d$ of
unit radius by $d\Omega_d^2$.


The d-dimensional anti-de Sitter space, $AdS_d$, may
be defined by a constraint


$$(X^0)^2 + (X^d)^2 - \sum_{i=1}^{d-1}(X^i)^2 = L^2$$ [2]


This constraint shows that the symmetry group of
$AdS_d$ is SO(2, d - 1). $AdS_d$ is a negatively curved
maximally symmetric space, that is, its curvature
tensor is related to the metric by


$$R_{abcd} = -\frac{1}{L^2}\left[g_{ac}g_{bd} - g_{ad}g_{bc}\right]$$ [3]


Its metric may be written as


$$ds^2_{AdS} = L^2\left(-\cosh^2 y\,dt^2 + dy^2 + \sinh^2 y\,d\Omega^2_{d-2}\right)$$ [4]


where the radial coordinate $y \in [0, \infty)$, and t is
defined on a circle of length $2\pi$. This space has
closed timelike curves; to eliminate them, we will
work with the universal covering space where
$t \in (-\infty, \infty)$. The boundary of $AdS_d$, which plays
an important role in the AdS/CFT correspondence, is
located at infinite y. There exists a subspace of $AdS_d$
called the Poincaré wedge, with the metric


$$ds^2 = \frac{L^2}{z^2}\left(dz^2 - (dx^0)^2 + \sum_{i=1}^{d-2}(dx^i)^2\right)$$ [5]


where $z \in [0, \infty)$.

A Euclidean continuation of $AdS_d$ is the
Lobachevsky space (hyperboloid), $L_d$. It is obtained
by reversing the sign of $(X^d)^2$, $dt^2$, and $(dx^0)^2$ in [2],
[4], and [5], respectively. After this Euclidean
continuation, the metrics [4] and [5] become
equivalent; both of them cover the entire $L_d$.
Another equivalent way of writing the metric is


$$ds^2_E = L^2\left(d\rho^2 + \sinh^2\rho\,d\Omega^2_{d-1}\right)$$ [6]


which shows that the boundary at infinite $\rho$ has the
topology of $S^{d-1}$. In terms of the Euclideanized
metric [5], the boundary consists of the $R^{d-1}$ at
z = 0, and a single point at z = ∞.
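
The link between the hyperboloid constraint and the global coordinates (t, y) can be checked numerically. The sketch below assumes the standard embedding $X^0 = L\cosh y\cos t$, $X^d = L\cosh y\sin t$, $X^i = L\sinh y\,\omega^i$ with $\omega$ a unit vector in $R^{d-1}$. This parametrization is a textbook choice that is not written out in the text above, and the function names are hypothetical.

```python
import math

def ads_global_embedding(t, y, omega, L=1.0):
    """Embedding coordinates (X^0, [X^1..X^{d-1}], X^d) of a point of AdS_d.

    omega is a unit vector in R^{d-1} labelling a point of the sphere S^{d-2}.
    """
    X0 = L * math.cosh(y) * math.cos(t)
    Xd = L * math.cosh(y) * math.sin(t)
    Xi = [L * math.sinh(y) * w for w in omega]
    return X0, Xi, Xd

def hyperboloid_constraint(X0, Xi, Xd):
    """Left-hand side of (X^0)^2 + (X^d)^2 - sum_i (X^i)^2."""
    return X0 ** 2 + Xd ** 2 - sum(x * x for x in Xi)
```

For any t, y and unit vector omega the constraint evaluates to $L^2$, since $\cosh^2 y - \sinh^2 y = 1$. The periodicity of t in this embedding is the origin of the closed timelike curves mentioned above.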


The Geometry of Dirichlet Branes 


Our path toward formulating the $AdS_5/CFT_4$
correspondence requires introduction of Dirichlet
branes, or D-branes for short. They are soliton-like
"membranes" of various internal dimensionalities
contained in type II superstring theories. A Dirichlet
p-brane (or Dp brane) is a (p + 1)-dimensional
hyperplane in (9 + 1)-dimensional spacetime where
strings are allowed to end. A D-brane is much like a
topological defect: upon touching a D-brane, a
closed string can open up and turn into an open
string whose ends are free to move along the
D-brane. For the endpoints of such a string the p + 1
longitudinal coordinates satisfy the conventional free
(Neumann) boundary conditions, while the 9 - p
coordinates transverse to the Dp brane have the fixed
(Dirichlet) boundary conditions, hence the origin of
the term "Dirichlet brane." The Dp brane preserves
half of the bulk supersymmetries and carries an
elementary unit of charge with respect to the (p + 1)-form
gauge potential from the Ramond-Ramond
(RR) sector of type II superstring.

For this article, the most important property of 
D-branes is that they realize gauge theories on their 
world volume. The massless spectrum of open 
strings living on a Dp brane is that of a maximally 
supersymmetric U(1) gauge theory in p+ 1 dimen- 
sions. The 9 — p massless scalar fields present in this 
supermultiplet are the expected Goldstone modes 
associated with the transverse oscillations of the Dp 
brane, while the photons and fermions provide the 
unique supersymmetric completion. If we consider
N parallel D-branes, then there are $N^2$ different
species of open strings because they can begin and
end on any of the D-branes. $N^2$ is the dimension of
the adjoint representation of U(N), and indeed we
find the maximally supersymmetric U(N) gauge
theory in this setting.

The relative separations of the Dp branes in the 
9 —p transverse dimensions are determined by 
the expectation values of the scalar fields. We will 
be interested in the case where all scalar expectation 
values vanish, so that the N Dp branes are stacked 
on top of each other. If N is large, then this stack is 
a heavy object embedded into a theory of closed 
strings which contains gravity. Naturally, this 
macroscopic object will curve space: it may be 
described by some classical metric and other back- 
ground fields including the RR (p+ 2)-form field 
strength. Thus, we have two very different descrip- 
tions of the stack of Dp branes: one in terms of the 
U(N) supersymmetric gauge theory on its world 
volume, and the other in terms of the classical RR 
charged p-brane background of the type II closed 
superstring theory. The relation between these two 
descriptions is at the heart of the connections 
between gauge fields and strings that are the subject 
of this article. 


Coincident D3 Branes 


Gauge theories in 3 + 1 dimensions play an important
role in physics, and as explained above, parallel
D3 branes realize a (3 + 1)-dimensional U(N) SYM
theory. Let us compare a stack of D3 branes with
the RR-charged black 3-brane classical solution
where the metric assumes the form


$$ds^2 = H^{-1/2}(r)\left[-f(r)(dx^0)^2 + (dx^i)^2\right] + H^{1/2}(r)\left[f^{-1}(r)\,dr^2 + r^2\,d\Omega_5^2\right]$$ [7]


where i = 1, 2, 3 and


$$H(r) = 1 + \frac{L^4}{r^4}, \qquad f(r) = 1 - \frac{r_0^4}{r^4}$$


The solution also contains an RR self-dual 5-form
field strength


$$F = dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3 \wedge d(H^{-1}) + 4L^4\,\mathrm{vol}(S^5)$$ [8]


so that the Einstein equation of type IIB supergravity,
$R_{\mu\nu} = F_{\mu\alpha\beta\gamma\delta}F_\nu{}^{\alpha\beta\gamma\delta}/96$, is satisfied.

In the extremal limit $r_0 \to 0$, the 3-brane metric
becomes


$$ds^2 = \left(1 + \frac{L^4}{r^4}\right)^{-1/2}\left(-(dx^0)^2 + (dx^i)^2\right) + \left(1 + \frac{L^4}{r^4}\right)^{1/2}\left(dr^2 + r^2\,d\Omega_5^2\right)$$ [9]


Just like the stack of parallel, ground-state D3
branes, the extremal solution preserves 16 of the
32 supersymmetries present in the type IIB theory.
Introducing $z = L^2/r$, one notes that the limiting
form of [9] as $r \to 0$ factorizes into the direct
product of two smooth spaces, the Poincaré wedge
[5] of $AdS_5$, and $S^5$, with equal radii of curvature L.
The 3-brane geometry may thus be viewed as a
semi-infinite throat of radius L which, for $r \gg L$,
opens up into flat (9 + 1)-dimensional space. Thus,
for L much larger than the string length scale, $\sqrt{\alpha'}$,
the entire 3-brane geometry has small curvatures
everywhere and is appropriately described by the
supergravity approximation to type IIB string
theory.
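
The factorization in the throat region can be made explicit in a few lines. The following worked steps are a sketch that assumes the extremal metric with $H(r) = 1 + L^4/r^4$ and keeps only the leading term for $r \ll L$:

```latex
\begin{align*}
H(r) &= 1 + \frac{L^4}{r^4} \;\approx\; \frac{L^4}{r^4} \qquad (r \ll L), \\
z &= \frac{L^2}{r} \;\Rightarrow\; dr = -\frac{L^2}{z^2}\,dz, \\
H^{-1/2}\left(-(dx^0)^2 + (dx^i)^2\right) &\approx \frac{L^2}{z^2}\left(-(dx^0)^2 + (dx^i)^2\right), \\
H^{1/2}\,dr^2 &\approx \frac{L^2}{r^2}\cdot\frac{L^4}{z^4}\,dz^2 = \frac{L^2}{z^2}\,dz^2, \\
H^{1/2}\,r^2\,d\Omega_5^2 &\approx L^2\,d\Omega_5^2, \\
ds^2 &\approx \frac{L^2}{z^2}\left(dz^2 - (dx^0)^2 + (dx^i)^2\right) + L^2\,d\Omega_5^2 .
\end{align*}
```

The first term is the Poincaré-wedge metric with d = 5 and the second is a round 5-sphere of radius L, so both factors share the same radius of curvature.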

The relation between L and $\sqrt{\alpha'}$ may be found by
equating the gravitational tension of the extremal
3-brane classical solution to N times the tension of a
single D3 brane:

$$\frac{2}{\kappa^2} L^4\,\mathrm{vol}(S^5) = N\frac{\sqrt{\pi}}{\kappa}$$ [10]

where $\mathrm{vol}(S^5) = \pi^3$ is the volume of a unit 5-sphere,
and $\kappa = \sqrt{8\pi G}$ is the ten-dimensional gravitational
constant. It follows that

$$L^4 = \frac{\kappa}{2\pi^{5/2}} N = g_{\rm YM}^2 N \alpha'^2$$ [11]

where we used the standard relations $\kappa = 8\pi^{7/2} g_s \alpha'^2$
and $g_{\rm YM}^2 = 4\pi g_s$. Thus, the size of the throat in
string units is $\lambda^{1/4}$, where $\lambda = g_{\rm YM}^2 N$ is the
't Hooft coupling. This remarkable emergence
of the 't Hooft coupling from gravitational considerations
is at the heart of the success of the AdS/CFT
correspondence. Moreover, the requirement
$L \gg \sqrt{\alpha'}$ translates into $\lambda \gg 1$: the gravitational
approach is valid when the 't Hooft coupling is very
strong and the perturbative field-theoretic methods
are not applicable.
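
For readers who want the intermediate algebra between [10] and [11], here is a short worked version. It uses only the relations quoted in the text ($\mathrm{vol}(S^5) = \pi^3$, $\kappa = 8\pi^{7/2} g_s \alpha'^2$, $g_{\rm YM}^2 = 4\pi g_s$); the symbol $\lambda = g_{\rm YM}^2 N$ denotes the 't Hooft coupling.

```latex
\frac{2}{\kappa^2}\,L^4\,\pi^3 = N\,\frac{\sqrt{\pi}}{\kappa}
\;\Longrightarrow\;
L^4 = \frac{\kappa N}{2\pi^{5/2}}
    = \frac{8\pi^{7/2}\,g_s\,\alpha'^2\,N}{2\pi^{5/2}}
    = 4\pi g_s N\,\alpha'^2
    = g_{\rm YM}^2 N\,\alpha'^2
    = \lambda\,\alpha'^2
\\[4pt]
\frac{L}{\sqrt{\alpha'}} = \lambda^{1/4}
```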


Example: Thermal Gauge Theory from 
Near-Extremal D3 Branes 


An important black hole observable is the Bekenstein-Hawking
(BH) entropy, which is proportional to the
area of the event horizon. For the 3-brane solution
[7], the horizon is located at $r = r_0$. For $r_0 > 0$ the
3-brane carries some excess energy E above its
extremal value, and the BH entropy is also nonvanishing.
The Hawking temperature is then defined
by $T^{-1} = \partial S_{\rm BH}/\partial E$.

Setting $r_0 \ll L$ in [7], we obtain a near-extremal
3-brane geometry, whose Hawking temperature is
found to be $T = r_0/(\pi L^2)$. The eight-dimensional
"area" of the horizon is

$$A_h = (r_0/L)^3 V_3 L^5\,\mathrm{vol}(S^5) = \pi^6 L^8 T^3 V_3$$ [12]

where $V_3$ is the spatial volume of the D3 brane (i.e.,
the volume of the $x^1, x^2, x^3$ coordinates). Therefore,
the BH entropy is

$$S_{\rm BH} = \frac{2\pi A_h}{\kappa^2} = \frac{\pi^2}{2} N^2 V_3 T^3$$ [13]

This gravitational entropy of a near-extremal
3-brane of Hawking temperature T is to be
identified with the entropy of $\mathcal{N} = 4$ supersymmetric
U(N) gauge theory (which lives on N
coincident D3 branes) heated up to the same
temperature.

The entropy of a free U(N) $\mathcal{N} = 4$ supermultiplet,
which consists of the gauge field, $6N^2$ massless
scalars, and $4N^2$ Weyl fermions, can be calculated
using the standard statistical mechanics of a
massless gas (the blackbody problem), and the
answer is

$$S_0 = \frac{2\pi^2}{3} N^2 V_3 T^3$$ [14]

It is remarkable that the 3-brane geometry captures
the $T^3$ scaling characteristic of a conformal field
theory (CFT) (in a CFT this scaling is guaranteed by
the extensivity of the entropy and the absence of
dimensionful parameters). Also, the $N^2$ scaling
indicates the presence of $O(N^2)$ unconfined degrees
of freedom, which is exactly what we expect in the
$\mathcal{N} = 4$ supersymmetric U(N) gauge theory. But what
is the explanation of the relative factor of 3/4
between $S_{\rm BH}$ and $S_0$? In fact, this factor is not a
contradiction but rather a prediction about the
strongly coupled $\mathcal{N} = 4$ SYM theory at finite
temperature. As we argued above, the supergravity
calculation of the BH entropy, [13], is relevant to
the $\lambda \to \infty$ limit of the $\mathcal{N} = 4$ SU(N) gauge theory,
while the free-field calculation, [14], applies to the
$\lambda \to 0$ limit. Thus, the relative factor of 3/4 is not a
discrepancy: it relates two different limits of the
theory. Indeed, on general field-theoretic grounds,
we expect that in the 't Hooft large-N limit, the
entropy is given by

$$S = \frac{2\pi^2}{3} N^2 f(\lambda) V_3 T^3$$ [15]

The function f is certainly not constant:
perturbative calculations valid for small $\lambda = g_{\rm YM}^2 N$
give

$$f(\lambda) = 1 - \frac{3}{2\pi^2}\lambda + \frac{3 + \sqrt{2}}{\pi^3}\lambda^{3/2} + \cdots$$ [16]

Thus, the BH entropy in supergravity, [13], is
translated into the prediction that

$$\lim_{\lambda \to \infty} f(\lambda) = \frac{3}{4}$$ [17]
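
The two entropies compared in this section can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the per-mode blackbody entropy density $(2\pi^2/45)T^3$ for a boson (with a factor 7/8 for a fermion) is the standard textbook value rather than something derived in this article. The gravity side simply chains [11], [12] and [13].

```python
import math
from fractions import Fraction


def free_entropy_coefficient():
    """Return c0 such that S_0 = c0 * pi^2 * N^2 * V_3 * T^3 for free N=4 SYM."""
    bosons = 2 + 6      # per adjoint index: 2 gauge-field polarizations + 6 real scalars
    fermions = 4 * 2    # 4 Weyl fermions with 2 helicity states each
    # A massless bosonic mode carries entropy density (2 pi^2 / 45) T^3;
    # a fermionic mode carries 7/8 of that (standard blackbody results).
    return Fraction(2, 45) * (bosons + Fraction(7, 8) * fermions)


def bh_entropy(N, T, V3, kappa):
    """Bekenstein-Hawking entropy of the near-extremal 3-brane via [11]-[13]."""
    L4 = kappa * N / (2 * math.pi ** 2.5)           # [11]: L^4 = kappa N / (2 pi^{5/2})
    area = math.pi ** 6 * L4 ** 2 * T ** 3 * V3     # [12]: A_h = pi^6 L^8 T^3 V_3
    return 2 * math.pi * area / kappa ** 2          # [13]: S_BH = 2 pi A_h / kappa^2


def free_entropy(N, T, V3):
    """Free-field entropy [14]."""
    return float(free_entropy_coefficient()) * math.pi ** 2 * N ** 2 * V3 * T ** 3


if __name__ == "__main__":
    N, T, V3, kappa = 5, 0.7, 2.0, 1.3   # arbitrary illustrative values
    print(free_entropy_coefficient())                           # coefficient in [14]
    print(bh_entropy(N, T, V3, kappa) / free_entropy(N, T, V3)) # strong/weak ratio
```

The dependence on $\kappa$ drops out of the ratio, which is the statement that the comparison between [13] and [14] is a pure number.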


The Essentials of the AdS/CFT 
Correspondence 


The AdS/CFT correspondence asserts a detailed map
between the physics of type IIB string theory in the
throat of the classical 3-brane geometry, that is, the
region $r \ll L$, and the gauge theory living on a stack
of D3 branes. As already noted, in this limit $r \ll L$,
the extremal D3 brane geometry factors into a direct
product of $AdS_5 \times S^5$. Moreover, the gauge theory
on this stack of D3 branes is the maximally
supersymmetric $\mathcal{N} = 4$ SYM.

Since the horizon of the near-extremal 3-brane lies
in the region $r \ll L$, the entropy calculation could
have been carried out directly in the throat limit,
where H(r) is replaced by $L^4/r^4$. Another way to
motivate the identification of the gauge theory with
the throat is to think about the absorption of
massless particles. In the D-brane description, a
particle incident from asymptotic infinity is converted
into an excitation of the stack of D-branes,
that is, into an excitation of the gauge theory on the
world volume. In the supergravity description, a
particle incident from the asymptotic (large r) region
tunnels into the $r \ll L$ region and produces an
excitation of the throat. The fact that the two
different descriptions of the absorption process give
identical cross sections supports the identification of
excitations of $AdS_5 \times S^5$ with the excited states of
the $\mathcal{N} = 4$ SYM theory.

Maldacena (1998) motivated this correspondence
by thinking about the low-energy ($\alpha' \to 0$) limit of
the string theory. On the D3 brane side, in this low-energy
limit, the interaction between the D3 branes
and the closed strings propagating in the bulk
vanishes, leaving a pure $\mathcal{N} = 4$ SYM theory on the
D3 branes decoupled from type IIB superstrings in
flat space. Around the classical 3-brane solutions,
there are two types of low-energy excitations. The
first type propagate in the bulk region, $r \gg L$, and
have a cross section for absorption by the throat
which vanishes as the cube of their energy. The
second type are localized in the throat, $r \ll L$, and
find it harder to tunnel into the asymptotically flat
region as their energy is taken smaller. Thus, both
the D3 branes and the classical 3-brane solution
have two decoupled components in the low-energy
limit, and in both cases, one of these components is
type IIB superstrings in flat space. Maldacena
conjectured an equivalence between the other two
components.

Immediate support for this identification comes
from symmetry considerations. The isometry group
of $AdS_5$ is SO(2, 4), and this is also the conformal
group in 3 + 1 dimensions. In addition, we have the
isometries of $S^5$ which form SU(4) ~ SO(6). This
group is identical to the R-symmetry of the $\mathcal{N} = 4$
SYM theory. After including the fermionic generators
required by supersymmetry, the full isometry
supergroup of the $AdS_5 \times S^5$ background is
SU(2, 2|4), which is identical to the $\mathcal{N} = 4$ superconformal
symmetry. We will see that, in theories
with reduced supersymmetry, the $S^5$ factor is
replaced by other compact Einstein spaces $Y_5$, but
$AdS_5$ is the "universal" factor present in the dual
description of any large-N CFT and makes the
SO(2, 4) conformal symmetry a geometric one.

The correspondence extends beyond the supergravity
limit, and we must think of $AdS_5 \times Y_5$ as a
background of string theory. Indeed, type IIB strings
are dual to the electric flux lines in the gauge theory,
providing a string-theoretic setup for calculating
correlation functions of Wilson loops. Furthermore,
if $N \to \infty$ while $g_{\rm YM}^2 N$ is held fixed and finite, then
there are string scale corrections to the supergravity
limit (Maldacena 1998, Gubser et al. 1998, Witten
1998) which proceed in powers of
$\alpha'/L^2 = (g_{\rm YM}^2 N)^{-1/2}$. For finite N, there are also
string loop corrections in powers of $\kappa^2/L^8 \sim N^{-2}$.
As expected, with $N \to \infty$ we can take the classical
limit of the string theory on $AdS_5 \times Y_5$. However, in
order to understand the large-N gauge theory with
finite 't Hooft coupling, we should think of $AdS_5 \times Y_5$
as the target space of a two-dimensional sigma
model describing the classical string physics.


Correlation Functions and the Bulk/Boundary 
Correspondence 


A basic premise of the AdS/CFT correspondence is
the existence of a one-to-one map between gauge-invariant
operators in the CFT and fields (or
extended objects) in AdS. Gubser et al. (1998) and
Witten (1998) formulated precise methods for
calculating correlation functions of various operators
in a CFT using its dual formulation. A physical
motivation for these methods comes from earlier
calculations of absorption by 3-branes. When a
wave is absorbed, it tunnels from asymptotic infinity
into the throat region, and then continues to
propagate toward smaller r. Let us separate the
3-brane geometry into two regions: $r \gtrsim L$ and $r \lesssim L$.
For $r \ll L$ the metric is approximately that of
$AdS_5 \times S^5$, while for $r \gg L$ it becomes very different
and eventually approaches the flat metric. Signals
coming in from large r (small $z = L^2/r$) may be
considered as disturbing the "boundary" of $AdS_5$ at
$r \sim L$, and then propagating into the bulk of $AdS_5$.
Discarding the $r \gtrsim L$ part of the 3-brane metric, the
gauge theory correlation functions are related to the
response of the string theory to boundary conditions
at $r \sim L$. It is therefore natural to identify the
generating functional of correlation functions in the
gauge theory with the string theory path integral
subject to the boundary conditions that
$\phi(x, z) = \phi_0(x)$ at z = L (at $z = \infty$ all fluctuations
are required to vanish). In calculating correlation
functions in a CFT, we will carry out the standard
Euclidean continuation; then on the string theory
side, we will work with $L_5$, which is the Euclidean
version of $AdS_5$.

More explicitly, we identify a gauge theory
quantity W with a string-theory quantity $Z_{\rm string}$:

$$W[\phi_0(x)] = Z_{\rm string}[\phi_0(x)]$$ [18]

W generates the connected Euclidean Green's functions
of a gauge-theory operator $\mathcal{O}$,

$$W[\phi_0(x)] = \left\langle \exp \int d^4x\, \phi_0 \mathcal{O} \right\rangle$$ [19]

$Z_{\rm string}$ is the string theory path integral calculated as
a functional of $\phi_0$, the boundary condition on the
field $\phi$ related to $\mathcal{O}$ by the AdS/CFT duality. In the
large-N limit, the string theory becomes classical
which implies

$$Z_{\rm string} \sim e^{-I[\phi_0(x)]}$$ [20]

where $I[\phi_0(x)]$ is the extremum of the classical string
action calculated as a functional of $\phi_0$. If we are
further interested in correlation functions at very
large 't Hooft coupling, then the problem of
extremizing the classical string action reduces to
solving the equations of motion in type IIB supergravity
whose form is known explicitly. A simple
example of such a calculation is presented in the
next subsection.

Our reasoning suggests that from the point of
view of the metric [5], the boundary conditions are
imposed not quite at z = 0, which is the true
boundary of $L_5$, but at some finite value $z = \epsilon$. It
does not matter which value it is since the metric [5]
is unchanged by an overall rescaling of the coordinates
(z, x); thus, such a rescaling can take z = L into
$z = \epsilon$ for any $\epsilon$. The physical meaning of this cutoff is
that it acts as a UV regulator in the gauge theory.
Indeed, the radial coordinate z is to be considered as
the effective energy scale of the gauge theory, and
decreasing z corresponds to increasing the energy. A
safe method for performing calculations of correlation
functions, therefore, is to keep the cutoff on the
z-coordinate at intermediate stages and remove it
only at the end.


Two-Point Functions and Operator Dimensions 


In the following, we present a brief discussion of
two-point functions of scalar operators in $CFT_d$.
The corresponding field in $L_{d+1}$ is a scalar field of
mass m whose Euclidean action is proportional to

$$\frac{1}{2}\int d^dx\,dz\, z^{-d+1}\left[(\partial_z\phi)^2 + \sum_{a=1}^{d}(\partial_a\phi)^2 + \frac{m^2L^2}{z^2}\phi^2\right]$$ [21]


In calculating correlation functions of vertex
operators from the AdS/CFT correspondence, the
first problem is to reconstruct an on-shell field in
$L_{d+1}$ from its boundary behavior. The near-boundary,
that is, small z, behavior of the classical
solution is

$$\phi(z, x) \to z^{d-\Delta}\left[\phi_0(x) + O(z^2)\right] + z^{\Delta}\left[A(x) + O(z^2)\right]$$ [22]

where $\Delta$ is one of the roots of

$$\Delta(\Delta - d) = m^2L^2$$ [23]


$\phi_0(x)$ is regarded as a "source" in [19] that couples
to the dual gauge-invariant operator $\mathcal{O}$ of dimension
$\Delta$, while A(x) is related to the expectation value,

$$A(x) = \frac{1}{2\Delta - d}\langle \mathcal{O}(x) \rangle$$ [24]


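It is possible to regularize the Euclidean action to
obtain the following value as a functional of the
source:

$$I[\phi_0(x)] = -\left(\Delta - \frac{d}{2}\right)\pi^{-d/2}\frac{\Gamma(\Delta)}{\Gamma(\Delta - d/2)}\int d^dx \int d^dx'\,\frac{\phi_0(x)\phi_0(x')}{|x - x'|^{2\Delta}}$$ [25]

Varying twice with respect to $\phi_0$, we find that the
two-point function of the corresponding operator is

$$\langle \mathcal{O}(x)\mathcal{O}(x') \rangle = \frac{(2\Delta - d)\Gamma(\Delta)}{\pi^{d/2}\Gamma(\Delta - d/2)}\frac{1}{|x - x'|^{2\Delta}}$$ [26]
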
Which of the two roots, $\Delta_+$ or $\Delta_-$, of [23]

$$\Delta_\pm = \frac{d}{2} \pm \sqrt{\frac{d^2}{4} + m^2L^2}$$ [27]

should we choose for the operator dimension? For
positive $m^2$, $\Delta_+$ is certainly the right choice: here the
other root, $\Delta_-$, is negative. However, it turns out
that for

$$-\frac{d^2}{4} < m^2L^2 < -\frac{d^2}{4} + 1$$ [28]

both roots of [23] may be chosen. Thus, there are
two possible CFTs corresponding to the same
classical AdS action: in one of them the corresponding
operator has dimension $\Delta_+$, while in the other
the dimension is $\Delta_-$. We note that $\Delta_-$ is bounded
from below by (d - 2)/2, which is precisely the
unitarity bound on dimensions of scalar operators in
d-dimensional field theory! Thus, the ability to
choose dimension $\Delta_-$ is crucial for consistency of
the AdS/CFT duality.
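
The mass-dimension relation [23], its two roots [27], and the window [28] are easy to tabulate. The following is a minimal sketch; the function names are invented for this illustration, and the default d = 4 is just the case relevant to $AdS_5$.

```python
import math


def operator_dimensions(m2L2, d=4):
    """Return (Delta_plus, Delta_minus), the two roots of Delta (Delta - d) = m^2 L^2."""
    disc = d * d / 4 + m2L2
    if disc < 0:
        raise ValueError("m^2 L^2 < -d^2/4: the roots of [23] are complex")
    root = math.sqrt(disc)
    return d / 2 + root, d / 2 - root


def allowed_dimensions(m2L2, d=4):
    """Dimensions permitted by the discussion around [28].

    Inside the window -d^2/4 < m^2 L^2 < -d^2/4 + 1 both roots are admissible;
    outside it only Delta_plus is kept.
    """
    plus, minus = operator_dimensions(m2L2, d)
    in_window = -d * d / 4 < m2L2 < -d * d / 4 + 1
    return [plus, minus] if in_window else [plus]


if __name__ == "__main__":
    for m2L2 in (5.0, 0.0, -3.75):
        print(m2L2, allowed_dimensions(m2L2))
```

For d = 4 and $m^2L^2 = -3.75$, which lies inside the window, the smaller root is 1.5, above the unitarity bound (d - 2)/2 = 1 quoted in the text.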

Whether string theory on $AdS_5 \times Y_5$ contains
fields with $m^2$ in the range [28] depends on $Y_5$.
The example discussed in the next section,
$Y_5 = T^{1,1}$, turns out to contain such fields, and the
possibility of having dimension $\Delta_-$, [27], is crucial
for consistency of the AdS/CFT duality in that case.
However, for $Y_5 = S^5$, which is dual to the $\mathcal{N} = 4$
large-N SYM theory, there are no such fields and all
scalar dimensions are given by $\Delta_+$ in [27].

The operators in the $\mathcal{N} = 4$ large-N SYM theory
naturally break up into two classes: those that
correspond to the Kaluza-Klein states of supergravity
and those that correspond to massive string
states. Since the radius of the $S^5$ is L, the masses of
the Kaluza-Klein states are proportional to 1/L.
Thus, the dimensions of the corresponding operators
are independent of L and therefore also of $\lambda$. On the
gauge-theory side, this independence is explained by
the fact that the supersymmetry protects the dimensions
of certain operators from being renormalized:
they are completely determined by the representation
under the superconformal symmetry. All
families of the Kaluza-Klein states, which correspond
to such protected operators, were classified
long ago. Correlation functions of such operators in
the strong 't Hooft coupling limit may be obtained
from the dependence of the supergravity action on
the boundary values of corresponding Kaluza-Klein
fields, as in [19]. A variety of explicit calculations
have been performed for two-, three-, and even four-point
functions. The four-point functions are particularly
interesting because their dependence on
operator positions is not determined by the conformal
invariance.

On the other hand, the masses of string excitations
are $m^2 = 4n/\alpha'$, where n is an integer. For the
corresponding operators the formula [27] predicts
that the dimensions do depend on the 't Hooft
coupling and, in fact, blow up for large $\lambda = g_{\rm YM}^2 N$ as
$2\lambda^{1/4}\sqrt{n}$.


Calculation of Wilson Loops 


The Wilson loop operator of a nonabelian gauge
theory

$$W(C) = \mathrm{tr}\left[P \exp\left(i\oint_C A\right)\right]$$ [29]

involves the path-ordered integral of the gauge
connection A along a contour C. For $\mathcal{N} = 4$ SYM,
one typically uses a generalization of this loop
operator which incorporates other fields in the
$\mathcal{N} = 4$ multiplet, the adjoint scalars and fermions.
Using a rectangular contour, we can calculate the
quark-antiquark potential from the expectation
value $\langle W(C) \rangle$. One thinks of the quarks located a
distance L apart for a time T, yielding

$$\langle W \rangle \sim e^{-TV(L)}$$ [30]

where V(L) is the potential.

According to Maldacena, and Rey and Yee, the
AdS/CFT correspondence relates the Wilson loop
expectation value to a sum over string world sheets
ending on the boundary of $L_5$ (z = 0) along the
contour C:

$$\langle W \rangle \sim \int e^{-S}$$ [31]

where S is the action functional of the string world
sheet. In the large 't Hooft coupling limit $\lambda \to \infty$,
this path integral may be evaluated using a saddle-point
approximation. The leading answer is $\sim e^{-S_0}$,
where $S_0$ is the action for the classical solution,
which is proportional to the minimal area of the
string world sheet in $L_5$ subject to the boundary
conditions. The area as currently defined is
actually divergent, and to regularize it one must
position the contour at $z = \epsilon$ (this is the same type
of regulator as used in the definition of correlation
functions).

Consider a circular Wilson loop of radius a. The
action of the corresponding classical string world
sheet is

$$S_0 = \sqrt{\lambda}\left(\frac{a}{\epsilon} - 1\right)$$ [32]

Subtracting the linearly divergent term, which is
proportional to the length of the contour, one finds

$$\ln\langle W \rangle = \sqrt{\lambda} + O(\ln\lambda)$$ [33]

a result which has been duplicated in field theory by
summing certain classes of rainbow Feynman diagrams
in $\mathcal{N} = 4$ SYM. From these sums, one finds

$$\langle W \rangle_{\rm rainbow} = \frac{2}{\sqrt{\lambda}} I_1(\sqrt{\lambda})$$ [34]

where $I_1$ is a modified Bessel function. This formula is one of
the few available proposals for extrapolation of an
observable from small to large coupling. At large $\lambda$,

$$\langle W \rangle_{\rm rainbow} \sim \sqrt{\frac{2}{\pi}}\,\frac{e^{\sqrt{\lambda}}}{\lambda^{3/4}}$$ [35]

in agreement with the geometric prediction.
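
The interpolation between weak and strong coupling in [34] and [35] can be explored numerically. The sketch below evaluates the modified Bessel function $I_1$ from its power series so that it needs only the standard library; the function names and the truncation at 200 terms are choices made for this illustration, adequate for the moderate values of $\lambda$ used here.

```python
import math


def bessel_i1(x, terms=200):
    """Modified Bessel function I_1(x) summed from its power series
    I_1(x) = sum_k (x/2)^(2k+1) / (k! (k+1)!)."""
    total = 0.0
    term = x / 2.0                                   # k = 0 term
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) * (k + 2))  # ratio of successive terms
    return total


def wilson_loop_rainbow(lam):
    """Circular Wilson loop from the rainbow-diagram sum, equation [34]."""
    s = math.sqrt(lam)
    return 2.0 / s * bessel_i1(s)


def wilson_loop_large_lambda(lam):
    """Leading large-lambda behaviour, equation [35]."""
    return math.sqrt(2.0 / math.pi) * math.exp(math.sqrt(lam)) / lam ** 0.75


if __name__ == "__main__":
    for lam in (0.01, 1.0, 25.0, 400.0):
        exact = wilson_loop_rainbow(lam)
        print(lam, exact, exact / wilson_loop_large_lambda(lam))
```

At small $\lambda$ the series reduces to $1 + \lambda/8 + \cdots$, while at large $\lambda$ the ratio to [35] approaches 1 with corrections of relative order $1/\sqrt{\lambda}$.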

The quark-antiquark potential is extracted from a
rectangular Wilson loop of width L and length T.
After regularizing the divergent contribution to the
energy, one finds the attractive potential

$$V(L) = -\frac{4\pi^2\sqrt{\lambda}}{\Gamma(1/4)^4\,L}$$ [36]

The Coulombic 1/L dependence is required by the
conformal invariance of the theory. The fact that the
potential scales as the square root of the 't Hooft
coupling indicates some screening of the charges at
large coupling.


Conformal Field Theories and Einstein 
Manifolds 


Interesting generalizations of the duality between
$AdS_5 \times S^5$ and $\mathcal{N} = 4$ SYM with less supersymmetry
and more complicated gauge groups can be
produced by placing D3 branes at the tip of a
Ricci-flat six-dimensional cone X (see Figure 1). The
cone metric may be cast in the form

$$ds_X^2 = dr^2 + r^2\,ds_Y^2$$ [37]

where Y is the level surface of X. In particular, Y is a
positively curved Einstein manifold, that is, one for
which $R_{ij} = 4g_{ij}$. In order to preserve the $\mathcal{N} = 1$
supersymmetry, X must be a Calabi-Yau space; then
Y is defined to be Sasaki-Einstein.

Figure 1 D3 branes placed at the tip of a Ricci-flat cone X.

The D3 branes appear as a point in X and span the
transverse Minkowski space $R^{3,1}$. The ten-dimensional
metric they produce assumes the form [9], but
with the sphere metric $d\Omega_5^2$ replaced by the metric on
Y, $ds_Y^2$. The equality of tensions [10] now requires that

$$L^4 = \frac{\sqrt{\pi}\,\kappa N}{2\,\mathrm{vol}(Y)} = 4\pi g_s N \alpha'^2\,\frac{\pi^3}{\mathrm{vol}(Y)}$$ [38]

In the near-horizon limit, $r \to 0$, the geometry factors
into $AdS_5 \times Y$. Because the D3 branes are located at a
singularity, the gauge theory becomes much more
complicated, typically involving a product of several
SU(N) factors coupled to matter in bifundamental
representations, often described using a quiver diagram
(see Figure 2 for an example).


Figure 2 The quiver for Y^?. Each node corresponds to an 
SU(N) gauge group and each arrow to a bifundamental chiral 
superfield. 


The simplest examples of X are orbifolds $C^3/\Gamma$,
where $\Gamma$ is a discrete subgroup of SO(6). Indeed, if
$\Gamma \subset SU(3)$, then $\mathcal{N} = 1$ supersymmetry is preserved.
The level surface of such an X is $Y = S^5/\Gamma$. In this
case, the product structure of the gauge theory can
be motivated by thinking about image stacks of D3
branes from the action of $\Gamma$.

The next simplest example of a Calabi-Yau cone
X is the conifold which may be described by the
following equation in four complex variables:

$$\sum_{a=1}^{4} z_a^2 = 0$$ [39]

Since this equation is symmetric under an overall
rescaling of the coordinates, this space is a cone. The
level surface Y of the conifold is a coset manifold
$T^{1,1} = (SU(2) \times SU(2))/U(1)$. This space has the
SO(4) ~ SU(2) × SU(2) symmetry which rotates the
z's, and also the U(1) R-symmetry under $z_a \to e^{i\alpha}z_a$.
The metric on $T^{1,1}$ is known explicitly; it assumes
the form of an $S^1$ bundle over $S^2 \times S^2$.

The supersymmetric field theory on the D3 branes
probing the conifold singularity is SU(N) × SU(N)
gauge theory coupled to two chiral superfields, $A_i$,
in the $(N, \bar{N})$ representation and two chiral superfields,
$B_j$, in the $(\bar{N}, N)$ representation. The A's
transform as a doublet under one of the global
SU(2)'s, while the B's transform as a doublet under
the other SU(2). Cancelation of the anomaly in the
U(1) R-symmetry requires that the A's and the B's
each have R-charge 1/2. For consistency of the
duality, it is necessary that we add an exactly
marginal superpotential which preserves the SU(2) ×
SU(2) × $U(1)_R$ symmetry of the theory. Since a
marginal superpotential has R-charge equal to 2 it
must be quartic, and the symmetries fix it uniquely
up to overall normalization:

$$W = \epsilon^{ij}\epsilon^{kl}\,\mathrm{tr}\,A_iB_kA_jB_l$$ [40]


There are in fact infinite families of Calabi-Yau 
cones X, but there are two problems one faces in 
studying these generalized AdS/CFT correspon- 
dences. The first is geometric: the cones X are not 
all well understood and only for relatively few do 
we have explicit metrics. However, it is often 
possible to calculate important quantities such as 
the vol(Y) without knowing the metric. The second 
problem is gauge theoretic: although many techni- 
ques exist, there is no completely general procedure 
for constructing the gauge theory on a stack of D- 
branes at an arbitrary singularity. 

Let us mention two important classes of Calabi-Yau
cones X. The first class consists of cones over
the so-called $Y^{p,q}$ Sasaki-Einstein spaces. Here, p
and q are integers with $p \geq q$. Gauntlett et al. (2004)
discovered metrics on all the $Y^{p,q}$, and the quiver
gauge theories that live on the D-branes probing the
singularity are now known. Making contact with
the simpler examples discussed above, the $Y^{p,0}$ are
orbifolds of $T^{1,1}$ while the $Y^{p,p}$ are orbifolds of $S^5$.

In the second class of cones X, a del Pezzo surface
shrinks to zero size at the tip of the cone. A
del Pezzo surface is an algebraic surface of complex
dimension 2 with positive first Chern class. One
simple del Pezzo surface is a complex projective
space of dimension 2, $P^2$, which gives rise to the
$\mathcal{N} = 1$ preserving $S^5/Z_3$ orbifold. Another simple
case is $P^1 \times P^1$, which leads to $T^{1,1}/Z_2$. The
remaining del Pezzo surfaces $B_k$ are $P^2$ blown up
at k points, $1 \leq k \leq 8$. The cone where $B_1$ shrinks to
zero size has level surface $Y^{2,1}$. Gauge theories for
all the del Pezzos have been constructed. Except for
the three del Pezzos just discussed, and possibly also
for $B_6$, metrics on the cones over these del Pezzos
are not known. Nevertheless, it is known that for
$3 \leq k \leq 8$, the volume of the Sasaki-Einstein manifold
Y associated with $B_k$ is $\pi^3(9 - k)/27$.


The Central Charge 


The central charge provides one of the most
amazing ways to check the generalized AdS/CFT
correspondences. The central charge c and conformal
anomaly a can be defined as coefficients of
certain curvature invariants in the trace of the stress
energy tensor of the conformal gauge theory:

$$\langle T^\alpha_\alpha \rangle = -aE_4 - cI_4$$ [41]

(The curvature invariants $E_4$ and $I_4$ are quadratic in
the Riemann tensor and vanish for Minkowski
space.) As discussed above, correlators such as $\langle T^\alpha_\alpha \rangle$
can be calculated from supergravity, and one finds

$$a = c = \frac{\pi^3 N^2}{4\,\mathrm{vol}(Y)}$$ [42]

On the gauge-theory side of the correspondence,
anomalies completely determine a and c:

$$a = \frac{3}{32}\left(3\,\mathrm{tr}R^3 - \mathrm{tr}R\right), \qquad c = \frac{1}{32}\left(9\,\mathrm{tr}R^3 - 5\,\mathrm{tr}R\right)$$ [43]

The trace notation implies a sum over the R-charges
of all of the fermions in the gauge theory. (From the
geometric knowledge that a = c, we can conclude
that tr R = 0.)

The R-charges can be determined using the
principle of a-maximization. For a superconformal
gauge theory, the R-charges of the fermions
maximize a subject to the constraints that the
Novikov-Shifman-Vainshtein-Zakharov (NSVZ)
beta function of each gauge group vanishes and
the R-charge of each superpotential term is 2.

For the $Y^{p,q}$ spaces mentioned above, one finds
that

$$\mathrm{vol}(Y^{p,q}) = \frac{q^2\left(2p + \sqrt{4p^2 - 3q^2}\right)}{3p^2\left(3q^2 - 2p^2 + p\sqrt{4p^2 - 3q^2}\right)}\,\pi^3$$ [44]


The gauge theory consists of p — q fields Z, p + q 
fields Y, 2p fields U, and 2q fields V. These fields all 
transform in the bifundamental representation of a 
pair of SU(N) gauge groups (the quiver diagram for 
Y*? is given in Figure 2). The NSVZ beta function 
and superpotential constraints determine the 
R-charges up to two free parameters x and y. Let x
be the R-charge of Z and y the R-charge of Y. Then
the U have R-charge $1 - \frac{1}{2}(x + y)$ and the V
have R-charge $1 + \frac{1}{2}(x - y)$.

The technique of a-maximization leads to the result

$$x = \frac{1}{3q^2}\left(-4p^2 - 2pq + 3q^2 + (2p + q)\sqrt{4p^2 - 3q^2}\right)$$
$$y = \frac{1}{3q^2}\left(-4p^2 + 2pq + 3q^2 + (2p - q)\sqrt{4p^2 - 3q^2}\right)$$

Thus, as calculated by Benvenuti et al. (2004) and
Bertolini et al. (2004)

$$a(Y^{p,q}) = \frac{\pi^3 N^2}{4\,\mathrm{vol}(Y^{p,q})}$$ [45]

in remarkable agreement with the prediction [42] of
the AdS/CFT duality.
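
The agreement between the field-theory value of a and the geometric prediction [42] can be checked numerically at leading order in N. The sketch below is a hedged illustration, not the original authors' calculation: the function names are invented, only the $N^2$ coefficient is kept (each bifundamental contributes $N^2$ fermions of R-charge R - 1, and the 2p gauge groups contribute gauginos of R-charge 1), and x and y are assigned to Z and Y in the way that satisfies the stated charges of U and V and reproduces [45].

```python
import math


def volume_over_pi3(p, q):
    """vol(Y^{p,q}) / pi^3 from equation [44]."""
    s = math.sqrt(4 * p * p - 3 * q * q)
    return q * q * (2 * p + s) / (3 * p * p * (3 * q * q - 2 * p * p + p * s))


def r_charges(p, q):
    """R-charges (x, y) of the fields Z and Y at the maximum of a."""
    s = math.sqrt(4 * p * p - 3 * q * q)
    x = (-4 * p * p - 2 * p * q + 3 * q * q + (2 * p + q) * s) / (3 * q * q)  # Z
    y = (-4 * p * p + 2 * p * q + 3 * q * q + (2 * p - q) * s) / (3 * q * q)  # Y
    return x, y


def a_over_N2(p, q, x, y):
    """Leading-N coefficient of a from [43] for trial R-charges x (Z) and y (Y)."""
    u = 1 - (x + y) / 2      # R-charge of the U fields
    v = 1 + (x - y) / 2      # R-charge of the V fields
    # (number of N^2 blocks, R-charge of the fermion in that block)
    fermions = [
        (2 * p, 1.0),        # gauginos of the 2p SU(N) factors
        (p - q, x - 1),      # Z
        (p + q, y - 1),      # Y
        (2 * p, u - 1),      # U
        (2 * q, v - 1),      # V
    ]
    tr_r = sum(n * r for n, r in fermions)
    tr_r3 = sum(n * r ** 3 for n, r in fermions)
    return 3.0 / 32.0 * (3 * tr_r3 - tr_r)


if __name__ == "__main__":
    for p, q in [(1, 1), (2, 1), (3, 2), (5, 3)]:
        x, y = r_charges(p, q)
        print((p, q), a_over_N2(p, q, x, y), 1 / (4 * volume_over_pi3(p, q)))
```

The two printed columns agree for every pair (p, q) tried, and moving x or y slightly away from the quoted values lowers a, as a-maximization requires.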


A Path to a Confining Theory 


There exists an interesting way of breaking the
conformal invariance for spaces Y whose topology
includes an $S^2$ factor (examples of such spaces
include $T^{1,1}$ and $Y^{p,q}$, which are topologically
$S^2 \times S^3$). At the tip of the cone over Y, one may
add M wrapped D5 branes to the N D3 branes. The
gauge theory on such a combined stack is no longer
conformal; it exhibits a novel pattern of quasiperiodic
renormalization group flow, called a duality cascade.

To date, the most extensive study of a theory of this
type has been carried out for the conifold, where one
finds an $\mathcal{N} = 1$ supersymmetric SU(N) × SU(N + M)
theory coupled to chiral superfields $A_1, A_2$ in the
$(N, \overline{N + M})$ representation, and $B_1, B_2$ in the
$(\bar{N}, N + M)$ representation. D5 branes source RR
3-form flux; hence, the supergravity dual of this
theory has to include M units of this flux. Klebanov
and Strassler (2000) found an exact nonsingular
supergravity solution incorporating the 3-form and
the 5-form RR field strengths, and their back-reaction
on the geometry. This back-reaction creates a "geometric
transition" to the deformed conifold

$$\sum_{a=1}^{4} z_a^2 = \epsilon^2$$ [46]

and introduces a "warp factor" so that the full ten-dimensional
geometry has the form

$$ds_{10}^2 = h^{-1/2}(\tau)\left(-(dx^0)^2 + (dx^i)^2\right) + h^{1/2}(\tau)\,d\tilde{s}_6^2$$ [47]

where $d\tilde{s}_6^2$ is the Calabi-Yau metric of the deformed
conifold, which is known explicitly.

The field-theoretic interpretation of this solution is
unconventional. After a finite amount of RG flow, the
SU(N + M) group undergoes a Seiberg duality transformation.
After this transformation, and
an interchange of the two gauge groups, the new
gauge theory is $SU(\tilde{N}) \times SU(\tilde{N} + M)$ with the same
matter and superpotential, and with $\tilde{N} = N - M$. The
self-similar structure of the gauge theory under the
Seiberg duality is the crucial fact that allows this
pattern to repeat many times. If N = (k + 1)M, where
k is an integer, then the duality cascade stops after k
steps, and we find SU(M) × SU(2M) gauge theory.
This IR gauge theory exhibits a multitude of interesting
effects visible in the dual supergravity background.
One of them is confinement, which follows from the
fact that the warp factor h is finite and nonvanishing at
the smallest radial coordinate, $\tau = 0$. The methods
presented in the section "Calculation of Wilson Loops"
then imply that the quark-antiquark potential grows
linearly at large distances. Other notable IR effects
are chiral symmetry breaking and the Goldstone
mechanism. Particularly interesting is the appearance
of an entire "baryonic branch" of the moduli space in
the gauge theory, whose existence has been demonstrated
also in the dual supergravity language.
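
The bookkeeping of the duality cascade described above can be written out as a short loop. This is only a sketch of the counting of gauge group ranks: the function name is invented for this illustration, and it is restricted to the case N = (k + 1)M treated in the text.

```python
def duality_cascade(N, M):
    """List the gauge groups SU(n) x SU(n + M) visited by the cascade.

    Each entry is the pair (n, n + M). One Seiberg duality step, followed by
    an interchange of the two factors, replaces n by n - M. Only the case
    where N is a positive multiple of M is handled.
    """
    if M <= 0 or N < M or N % M != 0:
        raise ValueError("this sketch assumes N = (k + 1) M with integer k >= 0")
    groups = [(N, N + M)]
    while N > M:
        N -= M                      # N -> N - M after each duality step
        groups.append((N, N + M))
    return groups


if __name__ == "__main__":
    steps = duality_cascade(12, 3)  # N = (k + 1) M with k = 3
    for n_small, n_large in steps:
        print(f"SU({n_small}) x SU({n_large})")
    print("number of duality steps:", len(steps) - 1)
```

For N = 12 and M = 3 the loop performs k = 3 steps and ends at SU(3) × SU(6), that is, SU(M) × SU(2M).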


Conclusions 


This article tries to present a logical path from 
studying gravitational properties of D-branes to the 
formulation of an exact duality between conformal 
field theories and string theory in anti-de Sitter 
backgrounds, and also sketches some methods for 
breaking the conformal symmetry. Due to space 
limitations, many aspects and applications of the 
AdS/CFT correspondence have been omitted. At 
the moment, practical applications of this duality 
are limited mainly to very strongly coupled, large-N 
gauge theories, where the dual string description is 
well approximated by classical supergravity. To 
understand the implications of the duality for more 
general parameters, it is necessary to find better 


methods for attacking the world sheet approach to 
string theories in anti-de Sitter backgrounds with RR 
background fields turned on. When such methods are 
found, it is likely that the material presented here will 
have turned out to be just a tiny tip of a monumental 
iceberg of dualities between fields and strings.

One can distinguish three classes of affine quantum
groups, each leading to a different dependence of the 
R-matrices on the spectral parameter u: Yangians 
lead to rational R-matrices, quantum affine algebras 
lead to trigonometric R-matrices, and elliptic quan- 
tum groups lead to elliptic R-matrices. We will mostly 
concentrate on the quantum affine algebras but many 
results hold similarly for the other classes. 

After giving mathematical details about quantum 
affine algebras and Yangians in the first two sections, 
we describe how these algebras arise in different 
areas of mathematical physics in the three following 
sections. We end with a description of boundary 
quantum groups which extend the formalism to the 
boundary Yang-Baxter (reflection) equation.

Quantum Affine Algebras
Definition 


A quantum affine algebra $U_q(\hat{g})$ is a quantization of
the enveloping algebra $U(\hat{g})$ of an affine Lie algebra
(Kac-Moody algebra) $\hat{g}$. So we start by introducing
affine Lie algebras and their enveloping algebras
before proceeding to give their quantizations.

Let g be a semisimple finite-dimensional Lie algebra
over C of rank r with Cartan matrix $(a_{ij})_{i,j=1,\ldots,r}$,
symmetrizable via positive integers $d_i$, so that $d_ia_{ij}$ is
symmetric. In terms of the simple roots $\alpha_i$, we have

$$a_{ij} = 2\,\frac{\alpha_i \cdot \alpha_j}{|\alpha_i|^2} \quad\text{and}\quad d_i = \frac{|\alpha_i|^2}{2}$$ [1]

We can introduce an $\alpha_0 = \sum_{i=1}^{r} n_i\alpha_i$ in such a way
that the extended Cartan matrix $(a_{ij})_{i,j=0,\ldots,r}$ is of
affine type, that is, it is positive semidefinite of
rank r. The integers $n_i$ are referred to as Kac indices.
Choosing $\alpha_0$ to be the highest root of g leads to an
untwisted affine Kac-Moody algebra while choosing
$\alpha_0$ to be the highest short root of g leads to a twisted
affine Kac-Moody algebra.

One defines the affine Lie algebra $\hat{g}$ corresponding
to this affine Cartan matrix as the Lie algebra
(over C) with generators $H_i$, $E_i^\pm$ for i = 0, 1, ..., r
and D with relations

$$[H_i, E_j^\pm] = \pm a_{ij}E_j^\pm, \qquad [H_i, H_j] = 0$$
$$[E_i^+, E_j^-] = \delta_{ij}H_i$$
$$[D, H_i] = 0, \qquad [D, E_i^\pm] = \pm\delta_{i,0}E_i^\pm$$
$$\sum_{k=0}^{1-a_{ij}} (-1)^k \binom{1-a_{ij}}{k} (E_i^\pm)^k E_j^\pm (E_i^\pm)^{1-a_{ij}-k} = 0, \qquad i \neq j$$ [2]

The $E_i^\pm$ are referred to as Chevalley generators and
the last set of relations are known as Serre relations.
The generator D is known as the canonical derivation.
We will denote the algebra obtained by
dropping the generator D by $\hat{g}'$.

In applications to physics, the affine Lie algebra $\hat{g}$
often occurs in an isomorphic form as the loop Lie
algebra $g[z, z^{-1}] \oplus \mathbb{C}\cdot c$ with Lie product (for
untwisted $\hat{g}$)

$$[Xz^k, Yz^l] = [X, Y]z^{k+l} + k\,\delta_{k,-l}(X, Y)c, \qquad \text{for } X, Y \in g,\; k, l \in \mathbb{Z}$$ [3]

and c being the central element.

The universal enveloping algebra $U(\hat{g})$ of $\hat{g}$ is the
unital algebra over C with generators $H_i$, $E_i^\pm$ for
i = 0, 1, ..., r and D and with relations given by [2]
where now [ , ] stands for the commutator instead of
the Lie product.

To define the quantization of $U(\hat{g})$, one can either
define $U_h(\hat{g})$ (Drinfeld 1985) as an algebra over the
ring C[[h]] of formal power series over an indeterminate
h or one can define $U_q(\hat{g})$ (Jimbo 1985) as an
algebra over the field Q(q) of rational functions of q
with coefficients in Q. We will present $U_h(\hat{g})$ first.

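The quantum affine algebra $U_h(\hat{g})$ is the unital algebra over $\mathbb{C}[[h]]$ topologically generated by $H_i, E_i^{\pm}$ for $i = 0, 1, \dots, r$ and $D$ with relations

$$[H_i, E_j^{\pm}] = \pm a_{ij} E_j^{\pm}, \qquad [H_i, H_j] = 0$$
$$[E_i^{+}, E_j^{-}] = \delta_{ij}\, \frac{q_i^{H_i} - q_i^{-H_i}}{q_i - q_i^{-1}}$$ [4]
$$[D, H_i] = 0, \qquad [D, E_i^{\pm}] = \pm \delta_{i,0} E_i^{\pm}$$
$$\sum_{k=0}^{1 - a_{ij}} (-1)^k \begin{bmatrix} 1 - a_{ij} \\ k \end{bmatrix}_{q_i} (E_i^{\pm})^k E_j^{\pm} (E_i^{\pm})^{1 - a_{ij} - k} = 0, \qquad i \neq j$$

where $q_i = q^{d_i}$ and $q = e^h$. The q-binomial coefficients are defined by

$$[n]_q = \frac{q^n - q^{-n}}{q - q^{-1}}$$ [5]
$$[n]_q! = [n]_q\, [n-1]_q \cdots [2]_q\, [1]_q$$ [6]
$$\begin{bmatrix} m \\ n \end{bmatrix}_q = \frac{[m]_q!}{[n]_q!\, [m-n]_q!}$$ [7]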
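The definitions [5]-[7] can be checked by hand for small arguments. The following is a minimal sketch, not part of the original article: the function names `q_number`, `q_factorial`, and `q_binomial` are illustrative choices, and $q$ is taken to be a rational number (different from $0$ and $\pm 1$) so that the arithmetic is exact.

```python
from fractions import Fraction

def q_number(n, q):
    """[n]_q = (q^n - q^-n) / (q - q^-1); q must not be 0 or +-1."""
    q = Fraction(q)
    return (q**n - q**(-n)) / (q - 1 / q)

def q_factorial(n, q):
    """[n]_q! = [n]_q [n-1]_q ... [1]_q, with the empty product [0]_q! = 1."""
    result = Fraction(1)
    for k in range(1, n + 1):
        result *= q_number(k, q)
    return result

def q_binomial(m, n, q):
    """q-binomial coefficient [m]_q! / ([n]_q! [m-n]_q!)."""
    return q_factorial(m, q) / (q_factorial(n, q) * q_factorial(m - n, q))

print(q_number(3, 2))        # equals 2^2 + 1 + 2^-2
print(q_binomial(4, 2, 2))
```

Because $[n]_q$ is invariant under $q \to q^{-1}$ and tends to $n$ as $q \to 1$, the q-binomial coefficients reduce to the ordinary binomial coefficients in the classical limit $h \to 0$.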


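The quantum affine algebra $U_h(\hat{g})$ is a Hopf algebra with coproduct

$$\Delta(D) = D \otimes 1 + 1 \otimes D$$
$$\Delta(H_i) = H_i \otimes 1 + 1 \otimes H_i$$ [8]
$$\Delta(E_i^{\pm}) = E_i^{\pm} \otimes q_i^{-H_i/2} + q_i^{H_i/2} \otimes E_i^{\pm}$$

antipode

$$S(D) = -D, \qquad S(H_i) = -H_i, \qquad S(E_i^{\pm}) = -q_i^{\mp 1} E_i^{\pm}$$ [9]

and co-unit

$$\epsilon(D) = \epsilon(H_i) = \epsilon(E_i^{\pm}) = 0$$ [10]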


It is easy to see that the classical enveloping algebra $U(\hat{g})$ can be obtained from the above by setting $h = 0$, or more formally,

$$U_h(\hat{g}) / h\, U_h(\hat{g}) = U(\hat{g})$$


We can also define the quantum affine algebra $U_q(\hat{g})$ as the algebra over $\mathbb{Q}(q)$ with generators $K_i, E_i^{\pm}, D$ for $i = 0, 1, \dots, r$ and relations that are obtained from the ones given above for $U_h(\hat{g})$ by setting


$$q_i^{H_i/2} = K_i, \qquad i = 0, \dots, r$$ [11]


One can go further to an algebraic formulation over $\mathbb{C}$ in which $q$ is a complex number (with some points including $q = 0$ not allowed). This has the advantage that it becomes possible to specialize, for example, to $q$ a root of unity, where special phenomena occur.


Representations 


For applications in physics, the finite-dimensional representations of $U_h(\hat{g}')$ are the most interesting. As will be explained in later sections, these occur, for example, as particle multiplets in 2D quantum field theory or as spin Hilbert spaces in quantum spin chains. In the next subsection, we will use them to derive matrix solutions to the Yang-Baxter equation.

While for a nonaffine quantum algebra $U_h(g)$ the ring of representations is isomorphic to that of the classical enveloping algebra $U(g)$ (because in fact the algebras are isomorphic, as Drinfeld has pointed out), the corresponding fact is no longer true for affine quantum groups, except in the case $\hat{g} = a_n^{(1)} = \widehat{sl}_{n+1}$.

For the classical enveloping algebras $U(\hat{g}')$, any finite-dimensional representation of $U(g)$ also carries a finite-dimensional representation of $U(\hat{g}')$. In the quantum case, however, in general, an irreducible representation of $U_h(\hat{g}')$ reduces to a sum of representations of $U_h(g)$.

To classify the finite-dimensional representations of $U_h(\hat{g}')$, it is necessary to use a different realization of $U_h(\hat{g}')$ that looks more like a quantization of the loop algebra realization [3] than the realization in terms of Chevalley generators. In terms of the generators in this alternative realization, which we do not give here because of its complexity, the finite-dimensional representations can be viewed as pseudo-highest-weight representations. There is a set of $r$ "fundamental" representations $V^a$, $a = 1, \dots, r$, each containing the corresponding $U_h(g)$ fundamental representation as a component, from the tensor products of which all the other finite-dimensional representations may be constructed. The details can be found in Chari and Pressley (1994).

Given some representation $\rho: U_h(\hat{g}') \to \mathrm{End}(V)$, we can introduce a parameter $\lambda$ with the help of the automorphism $\tau_\lambda$ of $U_h(\hat{g}')$ generated by $D$ and given by

$$\tau_\lambda(E_i^{\pm}) = \lambda^{\pm s_i} E_i^{\pm}, \qquad \tau_\lambda(H_i) = H_i$$ [12]

Different choices for the $s_i$ correspond to different gradations. Commonly used are the "homogeneous gradation," $s_0 = 1, s_1 = \cdots = s_r = 0$, and the "principal gradation," $s_0 = s_1 = \cdots = s_r = 1$. We shall also need the "spin gradation" $s_i = d_i^{-1}$. The representations

$$\rho_\lambda = \rho \circ \tau_\lambda$$

play an important role in applications to integrable models where $\lambda$ is referred to as the (multiplicative) spectral parameter. In applications to particle scattering introduced in a later section, it is related to the rapidity of the particle. The generator $D$ can be realized as an infinitesimal scaling operator on $\lambda$ and thus plays the role of the Lorentz boost generator.

The tensor product representations $\rho^a_\lambda \otimes \rho^b_\mu$ are irreducible generically but become reducible for certain values of $\lambda/\mu$, a fact which again is important in applications (fusion procedure, particle-bound states).


R-Matrices 


A Hopf algebra $A$ is said to be "almost cocommutative" if there exists an invertible element $R \in A \otimes A$ such that

$$R\, \Delta(x) = (\sigma \circ \Delta(x))\, R, \qquad \text{for all } x \in A$$ [13]

where $\sigma: x \otimes y \mapsto y \otimes x$ exchanges the two factors in the coproduct. In a quasitriangular Hopf algebra, this element $R$ satisfies

$$(\Delta \otimes \mathrm{id})(R) = R_{13} R_{23}$$
$$(\mathrm{id} \otimes \Delta)(R) = R_{13} R_{12}$$ [14]

and is known as the "universal R-matrix" (see Hopf Algebras and q-Deformation Quantum Groups). As a consequence of [13] and [14], it automatically satisfies the Yang-Baxter equation

$$R_{12} R_{13} R_{23} = R_{23} R_{13} R_{12}$$ [15]

For technical reasons, to do with the infinite number of root vectors of $\hat{g}$, the quantum affine algebra $U_h(\hat{g})$ does not possess a universal R-matrix that is an element of $U_h(\hat{g}) \otimes U_h(\hat{g})$. However, as pointed out by Drinfeld (1985), it possesses a pseudouniversal R-matrix $R(\lambda) \in (U_h(\hat{g}') \otimes U_h(\hat{g}'))((\lambda))$. The $\lambda$ is related to the automorphism $\tau_\lambda$ defined in [12]. When using the homogeneous gradation, $R(\lambda)$ is a formal power series in $\lambda$.

When the pseudouniversal R-matrix is evaluated in the tensor product of any two indecomposable finite-dimensional representations $\rho^1$ and $\rho^2$, one obtains a numerical R-matrix

$$R^{12}(\lambda) = (\rho^1 \otimes \rho^2) R(\lambda)$$ [16]

The entries of these numerical R-matrices are rational functions of the multiplicative spectral parameter $\lambda$ but when written in terms of the additive spectral parameter $u = \log(\lambda)$ they are trigonometric functions of $u$ and satisfy the Yang-Baxter equation in the form given in [1]. The matrix

$$\check{R}^{12}(\lambda) = \sigma \circ R^{12}(\lambda)$$

satisfies the intertwining relation

$$\check{R}^{12}(\lambda/\mu) \cdot (\rho^1_\lambda \otimes \rho^2_\mu)(\Delta(x)) = (\rho^2_\mu \otimes \rho^1_\lambda)(\Delta(x)) \cdot \check{R}^{12}(\lambda/\mu)$$ [17]


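for any $x \in U_h(\hat{g}')$. It follows from the irreducibility of the tensor product representations that these R-matrices satisfy the Yang-Baxter equations

$$(\mathrm{id} \otimes \check{R}^{12}(\lambda/\mu)) (\check{R}^{13}(\lambda/\nu) \otimes \mathrm{id}) (\mathrm{id} \otimes \check{R}^{23}(\mu/\nu)) = (\check{R}^{23}(\mu/\nu) \otimes \mathrm{id}) (\mathrm{id} \otimes \check{R}^{13}(\lambda/\nu)) (\check{R}^{12}(\lambda/\mu) \otimes \mathrm{id})$$ [18]

or, graphically,

[Figure: the two equivalent braid diagrams taking $V^1 \otimes V^2 \otimes V^3$ to $V^3 \otimes V^2 \otimes V^1$.]
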
Explicit formulas for the pseudouniversal R-matrices were found by Khoroshkin and Tolstoy. However, these are difficult to evaluate explicitly in specific representations so that in practice it is easiest to find the numerical R-matrices $R^{ab}(\lambda)$ by solving the intertwining relation [17]. It should be stressed that solving the intertwining relation, which is a linear equation for the R-matrix, is much easier than directly solving the Yang-Baxter equation, a cubic equation.


Yangians 


As remarked by Drinfeld (1986), for untwisted $\hat{g}$ the quantum affine algebra $U_h(\hat{g}')$ degenerates as $h \to 0$ into another quasipseudotriangular Hopf algebra, the "Yangian" $Y(g)$ (Drinfeld 1985). It is associated with R-matrices which are rational functions of the additive spectral parameter $u$. Its representation ring coincides with that of $U_h(\hat{g}')$.

Consider a general presentation of a Lie algebra $g$, with generators $I_a$ and structure constants $f_{abc}$, so that

$$[I_a, I_b] = f_{abc} I_c, \qquad \Delta(I_a) = I_a \otimes 1 + 1 \otimes I_a$$

(with summation over repeated indices). The Yangian $Y(g)$ is the algebra generated by these and a second set of generators $J_a$ satisfying

$$[I_a, J_b] = f_{abc} J_c$$
$$\Delta(J_a) = J_a \otimes 1 + 1 \otimes J_a + \tfrac{1}{2} f_{abc}\, I_c \otimes I_b$$


The requirement that $\Delta$ be a homomorphism imposes further relations:

$$[J_a, [J_b, I_c]] - [I_a, [J_b, J_c]] = \alpha_{abcdeg} \{I_d, I_e, I_g\}$$

and

$$[[J_a, J_b], [I_l, J_m]] + [[J_l, J_m], [I_a, J_b]] = (\alpha_{abcdeg} f_{lmc} + \alpha_{lmcdeg} f_{abc}) \{I_d, I_e, J_g\}$$

where

$$\alpha_{abcdeg} = \tfrac{1}{24} f_{adi} f_{bej} f_{cgk} f_{ijk}, \qquad \{x_1, x_2, x_3\} = \sum_{i \neq j \neq k} x_i x_j x_k$$

When $g = sl_2$ the first of these is trivial, while for $g \neq sl_2$ the first implies the second. The co-unit is $\epsilon(I_a) = \epsilon(J_a) = 0$; the antipode is $s(I_a) = -I_a$, $s(J_a) = -J_a + \tfrac{1}{2} f_{abc} I_c I_b$. The Yangian may be obtained from $U_h(\hat{g})$ by expanding in powers of $h$. For the precise relationship, see Drinfeld (1985) and MacKay (2005). In the spin gradation, the automorphism [12] generated by $D$ descends to $Y(g)$ as $I_a \mapsto I_a$, $J_a \mapsto J_a + u I_a$.

There are two other realizations of $Y(g)$. The first (see, for example, Molev 2003) defines $Y(gl_n)$ directly from

$$R(u - v)\, T_1(u)\, T_2(v) = T_2(v)\, T_1(u)\, R(u - v)$$

where $T_1(u) = T(u) \otimes \mathrm{id}$, $T_2(v) = \mathrm{id} \otimes T(v)$, and

$$T(u) = \sum_{i,j=1}^{n} t_{ij}(u) \otimes e_{ij}, \qquad t_{ij}(u) = \delta_{ij} + I_{ij} u^{-1} + J_{ij} u^{-2} + \cdots$$

where $e_{ij}$ are the standard matrix units for $gl_n$. The rational R-matrix for the $n$-dimensional representation of $gl_n$ is

$$R(u - v) = 1 - \frac{P}{u - v}, \qquad \text{where } P = \sum_{i,j=1}^{n} e_{ij} \otimes e_{ji}$$

is the transposition operator. $Y(gl_n)$ is then defined to be the algebra generated by $I_{ij}, J_{ij}$, and must be quotiented by the "quantum determinant" at its center to define $Y(sl_n)$. The coproduct takes a particularly simple form,

$$\Delta(t_{ij}(u)) = \sum_{k=1}^{n} t_{ik}(u) \otimes t_{kj}(u)$$
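That a rational R-matrix of this type solves the Yang-Baxter equation can be verified numerically in the smallest case. The sketch below is an illustration under stated assumptions, not the article's construction: it takes $n = 2$, uses the polynomial normalization $R(u) = u \cdot 1 - P$ (which differs from $1 - P/u$ only by the scalar factor $u$), and checks the additive form $R_{12}(u-v) R_{13}(u) R_{23}(v) = R_{23}(v) R_{13}(u) R_{12}(u-v)$ on $(\mathbb{C}^2)^{\otimes 3}$ with exact integer arithmetic. All function names are hypothetical.

```python
def swap_factors(s, i, j, n):
    """Basis index s of (C^2)^n with the bits of tensor factors i and j exchanged."""
    bi = (s >> (n - 1 - i)) & 1
    bj = (s >> (n - 1 - j)) & 1
    if bi != bj:
        s ^= (1 << (n - 1 - i)) | (1 << (n - 1 - j))
    return s

def r_matrix(u, i, j, n):
    """R_ij(u) = u*1 - P_ij acting on n two-dimensional factors (factor 0 is leftmost)."""
    dim = 2 ** n
    R = [[0] * dim for _ in range(dim)]
    for s in range(dim):
        R[s][s] += u                            # identity part
        R[swap_factors(s, i, j, n)][s] -= 1     # minus the transposition P_ij
    return R

def matmul(A, B):
    """Plain matrix product of two square lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def yang_baxter_holds(u, v):
    R12 = r_matrix(u - v, 0, 1, 3)
    R13 = r_matrix(u, 0, 2, 3)
    R23 = r_matrix(v, 1, 2, 3)
    lhs = matmul(matmul(R12, R13), R23)
    rhs = matmul(matmul(R23, R13), R12)
    return lhs == rhs

print(yang_baxter_holds(3, 5))
```

Working with integer spectral parameters keeps the comparison exact, so no numerical tolerance is needed.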
Here we do not give explicitly the third realization, namely Drinfeld's "new" realization of $Y(g)$ (Drinfeld 1988), but we remark that it was in this presentation that Drinfeld found a correspondence between certain sets of polynomials and finite-dimensional irreducible representations of $Y(g)$, thus classifying these (although not thereby deducing their dimension or constructing the action of $Y(g)$). The structure is the same as for quantum affine algebras in the earlier section on representations: $Y(g)$ representations are in general $g$-reducible, and there is a set of $r$ fundamental $Y(g)$-representations, containing the fundamental $g$-representations as components, from which all other representations can be constructed.


Origins in the Quantum 
Inverse-Scattering Method 


Quantum affine algebras for general $\hat{g}$ first appear in Drinfeld (1985, 1986) and Jimbo (1985, 1986), but they have their origin in the "quantum inverse-scattering method" (QISM) of the St. Petersburg school, and the essential features of $U_h(\widehat{sl}_2)$ first appear in Kulish and Reshetikhin (1983). In this section, we explain how the quantization of the Lax-pair description of affine Toda theory led to the discovery of the $U_h(\hat{g})$ coproduct, commutation relations, and R-matrix. We use the normalizations of Jimbo (1986), in which the $H_i$ are rescaled so that the Cartan matrix $a_{ij} = \alpha_i \cdot \alpha_j$ is symmetric.

We begin with the affine Toda field equations


$$\partial^\mu \partial_\mu \phi = -\frac{m^2}{\beta} \sum_{j=0}^{r} n_j \alpha_j\, e^{\beta \alpha_j \cdot \phi}$$


an integrable model in $\mathbb{R}^{1+1}$ of $r$ real scalar fields $\phi_i(x, t)$ with a mass parameter $m$ and coupling constant $\beta$. Equivalently, we may write $[\partial_x + L_x, \partial_t + L_t] = 0$ for the Lax pair


1, Hð += "ye Wad (E+ + E) 


2 £3 


"m (8/2)aojój * ON 
NM [2)anj$ (az Zo) 
-5 H;O, o; + M (4/2)ai¢i (E+ — E7) 


2 fi 
f j l 
+BY erie (sg - 15) 
j=1 


with arbitrary $\lambda \in \mathbb{C}$. The classical integrability of the system is seen in the existence of $r(\lambda, \lambda')$ such that


Ly(x, t) = 


$$\{T(\lambda) \overset{\otimes}{,} T(\lambda')\} = [\, r(\lambda, \lambda'),\ T(\lambda) \otimes T(\lambda')\, ]$$


where $T(\lambda) = T(-\infty, \infty; \lambda)$ and $T(x, y; \lambda) = P \exp\left( \int_x^y L(\xi; \lambda)\, d\xi \right)$. Taking the trace of this relation gives an infinity of charges in involution.

Quantization is problematic, owing to divergences in $T$. The QISM regularizes these by putting the model on a lattice of spacing $\Delta$, defining the lattice Lax operator to be

$$L_n(\lambda) = T((n - 1/2)\Delta, (n + 1/2)\Delta; \lambda) = P \exp\left( \int_{(n - 1/2)\Delta}^{(n + 1/2)\Delta} L(\xi; \lambda)\, d\xi \right)$$

The lattice monodromy matrix is then $T(\lambda) = \lim_{l \to -\infty,\, m \to \infty} T_l^m$ where $T_l^m = L_m L_{m-1} \cdots L_{l+1}$, and its trace again yields an infinity of commuting charges, provided that there exists a quantum R-matrix $R(\lambda_1, \lambda_2)$ such that

$$R(\lambda_1, \lambda_2)\, L_n^1(\lambda_1)\, L_n^2(\lambda_2) = L_n^2(\lambda_2)\, L_n^1(\lambda_1)\, R(\lambda_1, \lambda_2)$$ [19]

where $L_n^1(\lambda_1) = L_n(\lambda_1) \otimes \mathrm{id}$, $L_n^2(\lambda_2) = \mathrm{id} \otimes L_n(\lambda_2)$. That $R$ solves the Yang-Baxter equation follows from the equivalence of the two ways of intertwining $L_n(\lambda_1) \otimes L_n(\lambda_2) \otimes L_n(\lambda_3)$ with $L_n(\lambda_3) \otimes L_n(\lambda_2) \otimes L_n(\lambda_1)$.

To compute $L_n(\lambda)$, one uses the canonical, equal-time commutation relations for the $\phi_i$ and $\dot{\phi}_i$. In terms of the lattice fields


(m+(1/2))A 
Pin — fe 
-(1/2))A 


(n+(1/2))A 
m J. Rr (8/2)asti(x) dx 


(n—(1/2))A 


di(x) dx 


the only nontrivial relation is 
(158/2)65q; ,, and one finds 


L,(A) — exp E 2. Hit. + exp k >», 23 
; j 
x 3 p qi (Ej + E; ) 
$ Las (osi i xEo ) 
x exp R> hi. ) + O(A?) 
j 


the expression used by the St Petersburg school and 
by Jimbo. We now make the replacement 
Et q ™Etq™/4, where -q= exp(ib?/2), and 
compute the O(A) terms in [19], which reduce to 


[Din i.n] = 
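$$R(z)\, (H_i \otimes 1 + 1 \otimes H_i) = (H_i \otimes 1 + 1 \otimes H_i)\, R(z)$$

$$R(z)\, (E_i^{\pm} \otimes q^{-H_i/2} + q^{H_i/2} \otimes E_i^{\pm}) = (q^{-H_i/2} \otimes E_i^{\pm} + E_i^{\pm} \otimes q^{H_i/2})\, R(z)$$

$$R(z)\, (z^{\pm 1} E_0^{\pm} \otimes q^{-H_0/2} + q^{H_0/2} \otimes E_0^{\pm}) = (q^{-H_0/2} \otimes E_0^{\pm} + z^{\pm 1} E_0^{\pm} \otimes q^{H_0/2})\, R(z)$$

where $z = \lambda_1/\lambda_2$. We recognize in these the $U_h(\hat{g})$ coproduct and thus the intertwining relations, in the homogeneous gradation. These equations were solved for $R$ in defining representations of nonexceptional $g$ by Jimbo (1986).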

For $g = sl_2$, it was Kulish and Reshetikhin (1983) who first discovered that the requirement that the coproduct must be an algebra homomorphism forces the replacement of the commutation relations of $U(sl_2)$ by those of $U_h(sl_2)$; more generally it requires the replacement of $U(g)$ by $U_h(g)$.


Affine Quantum Group Symmetry 
and the Exact S-Matrix 


In the last section, we saw the origins of $U_h(\hat{g})$ in the "auxiliary" algebra introduced in the Lax pair. However, the quantum affine algebras also play a second role, as a symmetry algebra. An imaginary-coupled affine Toda field theory based on the affine algebra $\hat{g}^\vee$ possesses the quantum affine algebra $U_h(\hat{g})$ as a symmetry algebra, where $\hat{g}^\vee$ is the Langlands dual to $\hat{g}$ (the algebra obtained by replacing roots by coroots).

The solitonic particle states in affine Toda theories form multiplets which transform in the fundamental representations of the quantum affine algebra. Multiparticle states transform in tensor product representations $V^a \otimes V^b$. The scattering of two solitons of type $a$ and $b$ with relative rapidity $\theta$ is described by the S-matrix $S^{ab}(\theta): V^a \otimes V^b \to V^b \otimes V^a$, graphically represented in Figure 1a.

Figure 1 (a) Graphical representation of a two-particle scattering process described by the S-matrix $S^{ab}(\theta)$. (b) At special values $\theta^c_{ab}$ of the relative spectral parameter, the two particles of types $a$ and $b$ form a bound state of type $c$.

It then follows from the symmetry that the two-particle scattering matrix (S-matrix) for solitons must be proportional to the intertwiner for these tensor product representations, the R-matrix:

$$S^{ab}(\theta) = f^{ab}(\theta)\, \check{R}^{ab}(\theta)$$


with $\theta$ proportional to $u$, the additive spectral parameter. The scalar prefactor $f^{ab}(\theta)$ is not determined by the symmetry but is fixed by other requirements like unitarity, crossing symmetry, and the bootstrap principle.

It turns out that the axiomatic properties of the R-matrices are in perfect agreement with the axiomatic properties of the analytic S-matrix. For example, crossing symmetry of the S-matrix, graphically represented by

[Figure: graphical representation of crossing symmetry for the scattering of particles $a$ and $b$.]

is a consequence of the property of the universal R-matrix with respect to the action of the antipode $S$,

$$(S \otimes 1) R = R^{-1}$$


An S-matrix will have poles at certain imaginary rapidities $\theta^c_{ab}$ corresponding to the formation of virtual bound states. This is graphically represented in Figure 1b. The location of the pole is determined by the masses of the three particles involved,

$$m_c^2 = m_a^2 + m_b^2 + 2 m_a m_b \cos(i \theta^c_{ab})$$
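As a worked illustration of this mass formula, the sketch below solves it for the real fusion angle $u$, writing the imaginary rapidity as $\theta^c_{ab} = i u$ so that $\cos(i\theta^c_{ab}) = \cos u$. The function name `fusion_angle` and the sample masses are assumptions made for this example only.

```python
import math

def fusion_angle(m_a, m_b, m_c):
    """Real angle u solving m_c^2 = m_a^2 + m_b^2 + 2 m_a m_b cos(u)."""
    cos_u = (m_c**2 - m_a**2 - m_b**2) / (2.0 * m_a * m_b)
    if not -1.0 <= cos_u <= 1.0:
        # No real angle exists: the masses cannot form such a bound state.
        raise ValueError("masses are not compatible with a bound state")
    return math.acos(cos_u)

# Three particles of equal mass fuse at u = 2*pi/3.
print(fusion_angle(1.0, 1.0, 1.0))
```

The formula is the law of cosines for a triangle with sides $m_a$, $m_b$, $m_c$, which is why the allowed bound-state masses satisfy $|m_a - m_b| \le m_c \le m_a + m_b$.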


At the bound state pole, the S-matrix will project onto the multiplet $V^c$. Thus, the R-matrix has to have this projection property as well and indeed, this turns out to be the case. The bootstrap principle, whereby the S-matrix for a bound state is obtained from the S-matrices of the constituent particles,

[Figure: graphical representation of the bootstrap principle for the bound state $c$ of particles $a$ and $b$ scattering with a particle $d$.] [21]

is a consequence of the property [14] of the universal R-matrix with respect to the coproduct.

There is a famous no-go theorem due to Coleman and Mandula which states the "impossibility of combining space-time and internal symmetries in any but a trivial way." Affine quantum group symmetry circumvents this no-go theorem. In fact, the derivation $D$ is the infinitesimal two-dimensional Lorentz boost generator and the other symmetry charges transform nontrivially under these Lorentz transformations, see [2].

The noncocommutative coproduct [8] means that a $U_h(\hat{g})$ symmetry generator, when acting on a 2-soliton state, acts differently on the left soliton than on the right soliton. This is only possible because the generator is a nonlocal symmetry charge, that is, a charge which is obtained as the space integral of the time component of a current which itself is a nonlocal expression in terms of the fields of the theory.

Similarly, many nonlinear sigma models possess nonlocal charges which form $Y(g)$, and the construction proceeds similarly, now utilizing rational R-matrices, and with particle multiplets forming fundamental representations of $Y(g)$. In each case, the three-point couplings corresponding to the formation of bound states, and thus the analogs for $U_h(\hat{g})$ and $Y(g)$ of the Clebsch-Gordan couplings, obey a rather beautiful geometric rule originally deduced in simpler, purely elastic scattering models (Chari and Pressley 1996).

More details about this topic can be found in 
Delius (1995) and MacKay (2005). 


Integrable Quantum Spin Chains 


Affine quantum groups provide an unlimited supply of integrable quantum spin chains. From any R-matrix $R(\theta)$ for any tensor product of finite-dimensional representations $W \otimes V$, one can produce an integrable quantum system on the Hilbert space $V^{\otimes n}$. This Hilbert space can then be interpreted as the space of $n$ interacting spins. The space $W$ is an auxiliary space required in the construction but not playing a role in the physics.

Given an arbitrary R-matrix $R(\theta)$, one defines the monodromy matrix $T(\theta) \in \mathrm{End}(W \otimes V^{\otimes n})$ by

$$T(\theta) = R_{01}(\theta - \theta_1)\, R_{02}(\theta - \theta_2) \cdots R_{0n}(\theta - \theta_n)$$

where, as usual, $R_{ij}$ is the R-matrix acting on the $i$th and $j$th component of the tensor product space. The $\theta_i$ can be chosen arbitrarily for convenience. Graphically the monodromy matrix can be represented as

[Figure: the auxiliary line $W$ crossing the $n$ lines $V_1, V_2, V_3, \dots, V_{n-1}, V_n$ in turn.]


As a consequence of the Yang-Baxter equation satisfied by the R-matrices the monodromy matrix satisfies

$$R\, T\, T = T\, T\, R$$ [22]

or, graphically,

[Figure: graphical representation of the relation $RTT = TTR$ on the lines $V_1, V_2, \dots, V_n$.]

One defines the transfer matrix

$$\tau(\theta) = \mathrm{tr}_W\, T(\theta)$$

which is now an operator on $V^{\otimes n}$, the Hilbert space of the quantum spin chain. Due to [22], two transfer matrices commute,

$$[\tau(\theta), \tau(\theta')] = 0$$

and thus the $\tau(\theta)$ can be seen as a generating function of an infinite number of commuting charges, one of which will be chosen as the Hamiltonian. This Hamiltonian can then be diagonalized using the algebraic Bethe ansatz.
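The construction of the monodromy and transfer matrices can be made concrete in the simplest case. The following is a minimal sketch under stated assumptions rather than the general construction: it takes $W = V = \mathbb{C}^2$ with the rational R-matrix in the polynomial normalization $R(\theta) = \theta \cdot 1 - P$, a chain of three spins, and integer inhomogeneities $\theta_i$ so that all arithmetic is exact. The function names are hypothetical.

```python
def swap_factors(s, i, j, n):
    """Basis index s of (C^2)^n with the bits of tensor factors i and j exchanged."""
    bi = (s >> (n - 1 - i)) & 1
    bj = (s >> (n - 1 - j)) & 1
    if bi != bj:
        s ^= (1 << (n - 1 - i)) | (1 << (n - 1 - j))
    return s

def r_matrix(theta, i, j, n):
    """R_ij(theta) = theta*1 - P_ij on n two-dimensional factors (factor 0 is leftmost)."""
    dim = 2 ** n
    R = [[0] * dim for _ in range(dim)]
    for s in range(dim):
        R[s][s] += theta
        R[swap_factors(s, i, j, n)][s] -= 1
    return R

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transfer_matrix(theta, inhomogeneities):
    """tau(theta) = tr_W [ R_01(theta - theta_1) ... R_0n(theta - theta_n) ]."""
    n = len(inhomogeneities)               # number of spins; factor 0 is the auxiliary space W
    T = r_matrix(theta - inhomogeneities[0], 0, 1, n + 1)
    for k in range(1, n):
        T = matmul(T, r_matrix(theta - inhomogeneities[k], 0, k + 1, n + 1))
    half = 2 ** n                          # W is the leading bit, so tracing it sums two diagonal blocks
    return [[T[a][b] + T[half + a][half + b] for b in range(half)] for a in range(half)]

thetas = [0, 1, 4]
tau_1 = transfer_matrix(2, thetas)
tau_2 = transfer_matrix(5, thetas)
print(matmul(tau_1, tau_2) == matmul(tau_2, tau_1))
```

The commutation holds for any pair of spectral parameters with the same inhomogeneities, which is what allows $\tau(\theta)$ to serve as a generating function of commuting charges.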

One is usually interested in the thermodynamic 
limit where the number of spins goes to infinity. In 
this limit, it has been conjectured, the Hilbert space 
of the spin chain carries a certain infinite-dimensional 
representation of the quantum affine algebra and this 
has been used to solve the model algebraically, using 
vertex operators (Jimbo and Miwa 1995). 


Boundary Quantum Groups 


In applications to physical systems that have a boundary, the Yang-Baxter equation [1] appears in conjunction with the boundary Yang-Baxter equation, also known as the reflection equation,

$$R_{12}(u - v)\, K_1(u)\, R_{21}(u + v)\, K_2(v) = K_2(v)\, R_{12}(u + v)\, K_1(u)\, R_{21}(u - v)$$ [23]

The matrices $K$ are known as reflection matrices. This equation was originally introduced by Cherednik to describe the reflection of particles from a boundary in an integrable scattering theory and was used by Sklyanin to construct integrable spin chains and quantum field theories with boundaries.
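A simple solution of the reflection equation [23] can be checked directly. The sketch below is an illustrative example and not taken from the article: it assumes the rational R-matrix $R(u) = u \cdot 1 - P$ on $\mathbb{C}^2 \otimes \mathbb{C}^2$ (for which $R_{21} = R_{12}$) and the diagonal reflection matrix $K(u) = \mathrm{diag}(\xi + u, \xi - u)$ with a free boundary parameter $\xi$, a standard diagonal solution for the rational case. The names `R`, `K`, and `reflection_equation_holds` are hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def kron(A, B):
    """Kronecker product of two matrices given as lists of lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

I2 = [[1, 0], [0, 1]]
P = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]   # transposition on C^2 (x) C^2

def R(u):
    """Rational R-matrix u*1 - P; it is symmetric under exchange of the two factors."""
    return [[u * (i == j) - P[i][j] for j in range(4)] for i in range(4)]

def K(u, xi):
    """Diagonal reflection matrix diag(xi + u, xi - u)."""
    return [[xi + u, 0], [0, xi - u]]

def reflection_equation_holds(u, v, xi):
    K1 = kron(K(u, xi), I2)    # K acting on the first factor
    K2 = kron(I2, K(v, xi))    # K acting on the second factor
    lhs = matmul(matmul(matmul(R(u - v), K1), R(u + v)), K2)
    rhs = matmul(matmul(matmul(K2, R(u + v)), K1), R(u - v))
    return lhs == rhs

print(reflection_equation_holds(3, 5, 2))
```

Integer arguments keep the check exact; the identity holds for every value of $\xi$, which is the free parameter describing the boundary condition in this example.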

Boundary quantum groups are certain co-ideal 
subalgebras of affine quantum groups. They provide 
the algebraic structures underlying the solutions of the 
boundary Yang-Baxter equation in the same way in 
which affine quantum groups underlie the solutions of 
the ordinary Yang-Baxter equation. Both allow one 
to find solutions of the respective Yang-Baxter 
equation by solving a linear intertwining relation. In 
the case without spectral parameters these algebras 
appear in the theory of braided groups (see Hopf 
Algebras and q-Deformation Quantum Groups and 
Braided and Modular Tensor Categories). 
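For example, the subalgebra $B_\epsilon(\hat{g})$ of $U_h(\hat{g}')$ generated by

$$Q_i = q_i^{H_i/2} (E_i^{+} + E_i^{-}) + \epsilon_i (q_i^{H_i} - 1), \qquad i = 0, \dots, r$$ [24]

is a boundary quantum group for certain choices of the parameters $\epsilon_i \in \mathbb{C}[[h]]$. It is a left co-ideal subalgebra of $U_h(\hat{g}')$ because

$$\Delta(Q_i) = Q_i \otimes 1 + q_i^{H_i} \otimes Q_i \in U_h(\hat{g}') \otimes B_\epsilon(\hat{g})$$ [25]

Intertwiners $K(\lambda): V_{\eta\lambda} \to V_{\eta/\lambda}$ for some constant $\eta$ satisfying

$$K(\lambda)\, \rho_{\eta\lambda}(Q) = \rho_{\eta/\lambda}(Q)\, K(\lambda), \qquad \text{for all } Q \in B_\epsilon(\hat{g})$$ [26]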


provide solutions of the reflection equation in the 
form 


(id @ KA (i) R^ Qu) d @ K"(A))R2"(A/p) 
= R(A/p)(id & K' (A) 
x R' (Ap) (id @ K'(u)) [27] 


This can be extended to the case where the boundary itself carries a representation $W$ of $B_\epsilon(\hat{g})$. The boundary Yang-Baxter equation can be represented graphically as

Another example is provided by twisted Yangians where, when the $I_a$ and $J_a$ are constructed as nonlocal charges in sigma models, it is found that a boundary condition which preserves integrability leaves only the subset

$$I_i \qquad \text{and} \qquad \tilde{J}_p = J_p + \tfrac{1}{4} f_{piq} (I_i I_q + I_q I_i)$$

conserved, where $i$ labels the $\mathfrak{h}$-indices and $p, q$ the $\mathfrak{k}$-indices of a symmetric splitting $g = \mathfrak{h} + \mathfrak{k}$. The algebra $Y(g, \mathfrak{h})$ generated by the $I_i, \tilde{J}_p$ is, like $B_\epsilon(\hat{g})$, a co-ideal subalgebra, $\Delta(Y(g, \mathfrak{h})) \subset Y(g) \otimes Y(g, \mathfrak{h})$, and again yields an intertwining relation for K-matrices. For $g = sl_n$ and $\mathfrak{h} = so_n$ or $sp_n$, $Y(g, \mathfrak{h})$ is the "twisted Yangian" described in Molev (2003).

All the constructions in earlier sections of this 
review have analogs in the boundary setting. For 
more details see Delius and MacKay (2003) and 
MacKay (2005). 


See also: Bethe Ansatz; Boundary Conformal Field Theory; Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups; Hopf Algebras and q-Deformation Quantum Groups; Riemann-Hilbert Problem; Solitons and Kac-Moody Lie Algebras; Yang-Baxter Equations.
Introduction


In classical electrodynamics, the interaction of charged particles with the electromagnetic field is local, through the pointlike coupling of the electric charge of the particles with the electric and magnetic fields, $\mathbf{E}$ and $\mathbf{B}$, respectively. This is mathematically expressed by the Lorentz-force law. The scalar and vector potentials, $\varphi$ and $\mathbf{A}$, which are the time and space components of the relativistic 4-potential $A_\mu$, are considered auxiliary quantities in terms of which the field strengths $\mathbf{E}$ and $\mathbf{B}$, the observables, are expressed in a gauge-invariant manner. The homogeneous or first pair of Maxwell equations are a direct consequence of the definition of the field strengths in terms of $A_\mu$. The inhomogeneous or second pair of Maxwell equations, which involve the charges and currents present in the problem, are also usually written in terms of $\mathbf{E}$ and $\mathbf{B}$; however, when writing them in terms of $A_\mu$, the number of degrees of freedom of the electromagnetic field is explicitly reduced from six to four; and finally, with two additional gauge transformations, one ends with the two physical degrees of freedom of the electromagnetic field.

In quantum mechanics, however, both the Schrödinger equation and the path-integral approaches for scalar and unpolarized charged particles in the presence of electromagnetic fields are written in terms of the potential and not of the field strengths. Even in the case of the Schrödinger-Pauli equation for spin-1/2 electrons with magnetic moment $\boldsymbol{\mu}$ interacting with a magnetic field $\mathbf{B}$, one knows that the coupling $-\boldsymbol{\mu} \cdot \mathbf{B}$ is the nonrelativistic limit of the Dirac equation, which depends on $A_\mu$ but not on $\mathbf{E}$ and $\mathbf{B}$. Since gauge invariance also holds in the quantum domain, it was thought that $\mathbf{A}$ and $\varphi$ were mere auxiliary quantities, like in the classical case.

Aharonov and Bohm, in 1959, predicted a quantum interference effect due to the motion of charged particles in regions where $\mathbf{B}$ ($\mathbf{E}$) vanishes, but not $\mathbf{A}$ ($\varphi$), leading to a nonlocal gauge-invariant effect depending on the flux of the magnetic field in the inaccessible region, in the magnetic case, and on the difference of the integrals over time of time-varying potentials, in the electric case. (The magnetic effect was already noticed 10 years before by Ehrenberg and Siday in a paper on the refractive index of electrons.)
In the context of the Schrödinger equation, one can show that due to gauge invariance, if $\psi_0$ is a solution to the equation in the absence of an electromagnetic potential, then the product of $\psi_0(x)$ times the phase factor built from the integral of $A_\mu$ over a path joining an arbitrary reference point $x_0$ to $x$ is a solution in the presence of the potential, if the integral is path independent. However, it is the path integral of Feynman which, in the formulas for propagators of charged particles in the presence of electromagnetic fields, clearly shows that the action of these fields on charged particles is nonlocal, and it is given by the celebrated nonintegrable (path-dependent) phase factor of Wu and Yang (1975). Moreover, this fact provides an additional proof of the nonlocal character of quantum mechanics: to surround fluxes, or to develop a potential difference, the particle has to travel simultaneously at least through two paths.

Thus, the fact that the Aharonov-Bohm (A-B) effect was verified experimentally, by Chambers and others, demonstrates the necessity of introducing the (gauge-dependent) potential $A_\mu$ in describing the electromagnetic interactions of the quantum particle. This is widely regarded as the single most important piece of evidence for electromagnetism being a gauge theory. Moreover, it shows, to paraphrase Yang, that the field underdescribes the physical theory, while the potential overdescribes it, and it is the phase factor which describes it exactly.

The content of this article is essentially twofold. The first four sections are mainly physical, where we describe the magnetic A-B effect using the Schrödinger equation and the Feynman path integral. The fifth section is geometrical and is the longest of the article. We describe the effect in the context of fiber bundles and connections, namely as a result of the coupling of the wave function (section of an associated bundle) to a nontrivial flat connection (non-pure gauge vector potential with zero magnetic field) in a trivial bundle (the A-B bundle) with topologically nontrivial (non-simply-connected) base space. We discuss the moduli space of flat connections and the holonomy groups giving the phase shifts of the interference patterns. Finally, in the last section, we briefly comment on the nonabelian A-B effect.


Electromagnetic Fields in Classical Physics 


In classical physics, the motion of charged particles in the presence of electromagnetic fields is governed by the equation

$$\frac{d\mathbf{p}}{dt} = q \left( \mathbf{E} + \frac{\mathbf{v}}{c} \times \mathbf{B} \right)$$ [1]

where

$$\mathbf{p} = \frac{m \mathbf{v}}{\sqrt{1 - v^2/c^2}}$$

is the mechanical momentum of the particle with electric charge $q$, mass $m$, and velocity $\mathbf{v} = \dot{\mathbf{x}}$ ($c$ is the velocity of light in vacuum, and for $|\mathbf{v}| \ll c$ the left-hand side (LHS) of [1] is approximately $m \dot{\mathbf{v}}$); the right-hand side (RHS) is the Lorentz force, where $\mathbf{E}$ and $\mathbf{B}$ are, respectively, the electric and magnetic fields at the spacetime point $(t, \mathbf{x})$ where the particle is located. Equation [1] is easily derived from the Euler-Lagrange equation

$$\frac{d}{dt} \left( \frac{\partial L}{\partial \mathbf{v}} \right) - \frac{\partial L}{\partial \mathbf{x}} = 0$$ [2]

with the Lagrangian $L$ given by the sum of the free Lagrangian for the particle,

$$L_0 = -m c^2 \sqrt{1 - \frac{v^2}{c^2}}$$ [3]

and the Lagrangian describing the particle-field interaction,

$$L_{int} = \frac{q}{c} \mathbf{A} \cdot \mathbf{v} - q \varphi$$ [4]

In [4], $\mathbf{A}$ and $\varphi$ are, respectively, the vector potential and the scalar potential, which together form the 4-potential $A_\mu = (A_0, -\mathbf{A}) = (\varphi, -A^i)$, $i = 1, 2, 3$, in terms of which the electric and magnetic field strengths are given by

$$\mathbf{E} = -\nabla \varphi - \frac{1}{c} \frac{\partial \mathbf{A}}{\partial t}$$ [5a]
$$\mathbf{B} = \nabla \times \mathbf{A}$$ [5b]

The classical action corresponding to a given path of the particle is

$$S = \int_{t_1}^{t_2} dt\, L = \int_{t_1}^{t_2} dt\, (L_0 + L_{int}) = \int_{t_1}^{t_2} dt\, L_0 + \int_{t_1}^{t_2} dt\, L_{int} = S_0 + S_{int}$$ [6]

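$\mathbf{E}$, $\mathbf{B}$, and $S$ are invariant under the gauge transformation

$$\mathbf{A} \to \mathbf{A}' = \mathbf{A} - \nabla \Lambda$$ [7a]
$$\varphi \to \varphi' = \varphi + \frac{1}{c} \frac{\partial \Lambda}{\partial t}$$ [7b]

where $\Lambda$ is a real-valued differentiable scalar function (at least of class $C^2$) on spacetime. That is, if $\mathbf{E}'$, $\mathbf{B}'$, and $S'_{int}$ are defined in terms of $\mathbf{A}'$ and $\varphi'$ as $\mathbf{E}$, $\mathbf{B}$, and $S_{int}$ are defined in terms of $\mathbf{A}$ and $\varphi$, then $\mathbf{E}' = \mathbf{E}$, $\mathbf{B}' = \mathbf{B}$, and $S'_{int} = S_{int}$ (up to a boundary term that does not affect the equations of motion). This fact leads to the concept that, classically, the observables $\mathbf{E}$ and $\mathbf{B}$ are the physical quantities, while $A_\mu$ is only an auxiliary quantity. Also, and most important in the present context, eqn [1] states that the motion of the particles is determined by the values or state of the field strengths in an infinitesimal neighborhood of the particles, that is, classically, $\mathbf{E}$ and $\mathbf{B}$ act locally. If one defines the differential 1-form $A = A_\mu dx^\mu$ (with $dx^0 = c\, dt$), then the components of the differential 2-form $F = dA = (1/2)(\partial_\mu A_\nu - \partial_\nu A_\mu)\, dx^\mu \wedge dx^\nu = (1/2) F_{\mu\nu}\, dx^\mu \wedge dx^\nu$ are precisely the electric and magnetic fields:

$$F_{\mu\nu} = \begin{pmatrix} 0 & E^1 & E^2 & E^3 \\ -E^1 & 0 & -B^3 & B^2 \\ -E^2 & B^3 & 0 & -B^1 \\ -E^3 & -B^2 & B^1 & 0 \end{pmatrix}$$ [8]

At the level of $A$,

$$dF = d^2 A = 0$$ [9]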


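is an identity, but at the level of $\mathbf{E}$ and $\mathbf{B}$, [9] amounts to the homogeneous (or first pair of) Maxwell equations obeyed by the field strengths:

$$\nabla \cdot \mathbf{B} = 0$$ [10a]
$$\nabla \times \mathbf{E} + \frac{1}{c} \frac{\partial \mathbf{B}}{\partial t} = 0$$ [10b]

Therefore, these equations have a geometrical origin. The second pair of Maxwell equations is dynamical, and is obtained from the field action (in the Heaviside system of units)

$$S_{field} = -\frac{1}{4} \int d^4x\, F_{\mu\nu} F^{\mu\nu}$$ [11]

which leads to

$$\nabla \cdot \mathbf{E} = \rho$$ [12a]
$$\nabla \times \mathbf{B} - \frac{1}{c} \frac{\partial \mathbf{E}}{\partial t} = \frac{1}{c}\, \mathbf{j}$$ [12b]

where $(\rho, -\mathbf{j}) = (j^0, -\mathbf{j})$ is the 4-current satisfying, as a consequence of [12a] and [12b], the conservation law

$$\partial_\mu j^\mu = 0$$ [13]

For a pointlike particle, $\rho(t, \mathbf{x}) = q\, \delta^3(\mathbf{x} - \mathbf{x}(t))$ and $\mathbf{j} = \rho \mathbf{v}$.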


Electromagnetic Fields in Quantum 
Physics 


In quantum physics, the motion of charged particles in external electromagnetic fields is governed by the Schrödinger equation or, equivalently, by the Feynman path integral. In both cases, however, it is the 4-potential $A_\mu$ which appears in the equations, and not the field strengths. For simplicity, we consider here scalar (spinless) charged particles or unpolarized electrons (spin-1/2 particles), both of which, in the nonrelativistic approximation, can be described quantum mechanically by a complex wave function $\psi(t, \mathbf{x})$.

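To derive the Schrödinger equation, one starts from the classical Hamiltonian

$$H = \mathbf{P} \cdot \mathbf{v} - L - m c^2 = \frac{1}{2m} \left( \mathbf{P} - \frac{q}{c} \mathbf{A} \right)^2 + q \varphi$$ [14]

where

$$\mathbf{P} = \frac{\partial L}{\partial \mathbf{v}} = \mathbf{p} + \frac{q}{c} \mathbf{A}$$

is the canonical momentum of the particle, and we have subtracted its rest energy. The replacements $\mathbf{P} \to -i\hbar \nabla$ and $H \to i\hbar\, \partial/\partial t$ lead to

$$i\hbar \frac{\partial \psi}{\partial t} = \left( \frac{1}{2m} \left( i\hbar \nabla + \frac{q}{c} \mathbf{A} \right)^2 + q \varphi \right) \psi = \left( -\frac{\hbar^2}{2m} \nabla^2 + \frac{i\hbar q}{mc}\, \mathbf{A} \cdot \nabla + \frac{i\hbar q}{2mc}\, (\nabla \cdot \mathbf{A}) + \frac{q^2}{2mc^2}\, \mathbf{A}^2 + q \varphi \right) \psi$$ [15]

The gauge transformation [7a] and [7b] is a symmetry of this equation, if simultaneously to the change of the 4-potential, the wave function transforms as follows:

$$\psi(t, \mathbf{x}) \to \psi'(t, \mathbf{x}) = e^{-(iq/\hbar c) \Lambda}\, \psi(t, \mathbf{x})$$ [7c]

So, $A'_\mu$ and $\psi'$ obey [15]. At each $(t, \mathbf{x})$, $e^{-(iq/\hbar c) \Lambda}$ belongs to $U(1)$, the unit circle in the complex plane.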
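In the path-integral approach, the kernel $K(t', \mathbf{x}'; t, \mathbf{x})$, which gives the probability amplitude for the propagation of the particle from the spacetime point $(t, \mathbf{x})$ to the spacetime point $(t', \mathbf{x}')$ ($t < t'$), is given by

$$K(t', \mathbf{x}'; t, \mathbf{x}) = \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left( \frac{i}{\hbar} (S_0 + S_{int}) \right)$$
$$= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left( \frac{i}{\hbar} \int_t^{t'} d\tau \left( \frac{1}{2} m \dot{\mathbf{x}}^2 + \frac{q}{c} \mathbf{A} \cdot \mathbf{v} - q \varphi \right) \right)$$
$$= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left( \frac{i}{\hbar} \int_t^{t'} d\tau\, \frac{1}{2} m \dot{\mathbf{x}}^2 \right) \exp\left( \frac{iq}{\hbar c} \int (\mathbf{A} \cdot d\mathbf{x} - c \varphi\, d\tau) \right)$$
$$= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left( \frac{i}{\hbar} \int_t^{t'} d\tau\, \frac{1}{2} m \dot{\mathbf{x}}^2 \right) \exp\left( -\frac{iq}{\hbar c} \int dx^\mu A_\mu \right)$$ [16]

where the integral $\int \mathcal{D}\mathbf{x}(\tau) \ldots$ is over all continuous spacetime paths $(\tau, \mathbf{x}(\tau))$ which join $(t, \mathbf{x})$ with $(t', \mathbf{x}')$. If one knows the wave function at $(t, \mathbf{x})$, then the wave function at $(t', \mathbf{x}')$ is given by

$$\psi(t', \mathbf{x}') = \int d^3\mathbf{x}\, K(t', \mathbf{x}'; t, \mathbf{x})\, \psi(t, \mathbf{x})$$ [17]

An important point is the natural appearance in the integrand of the functional integral of the factor

$$e^{-(iq/\hbar c) \int_\gamma A}$$

for each path $\gamma$ joining $(t, \mathbf{x})$ with $(t', \mathbf{x}')$.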


A Solution to the Schrödinger Equation


In what follows, we shall restrict ourselves to static
magnetic fields; then in the previous formulas, we
set φ = 0 and A(t, x) = A(x). It is then easy to
show that if x_0 is an arbitrary reference point and
the integral ∫_{x_0}^{x} A(x')·dx' is independent of the
integration path from x_0 to x, that is, it is a
well-defined function f of x, and if ψ_0 is a solution of
the free Schrödinger equation, that is,

\[
i\hbar\frac{\partial\psi_0}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\psi_0 \tag{18}
\]

then

\[
\psi(t,\mathbf{x}) = \exp\left(\frac{iq}{\hbar c}\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{A}(\mathbf{x}')\cdot d\mathbf{x}'\right)\psi_0(t,\mathbf{x}) \tag{19}
\]


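is a solution of [15]. In fact, replacing [19] in [15],
the LHS gives

\[
i\hbar\frac{\partial\psi}{\partial t} = \exp\left(\frac{iq}{\hbar c}\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{A}(\mathbf{x}')\cdot d\mathbf{x}'\right) i\hbar\frac{\partial\psi_0}{\partial t}
\]

while, since (iħ∇ + (q/c)A)ψ equals the same exponential
factor times iħ∇ψ_0, the RHS gives

\[
\frac{1}{2m}\left(i\hbar\nabla + \frac{q}{c}\mathbf{A}\right)^2\psi = \exp\left(\frac{iq}{\hbar c}\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{A}(\mathbf{x}')\cdot d\mathbf{x}'\right)\left(-\frac{\hbar^2}{2m}\nabla^2\psi_0\right)
\]

The cancelation of the exponential factors shows
that, under the condition of path independence,
there is no effect of the potential on the charged
particles. Another way to see this is by making a
gauge transformation [7a]-[7c] with Λ(x) = f(x),
which changes ψ → ψ_0 and
A → A' = A − ∇∫_{x_0}^{x} A(x')·dx' = A − A = 0.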

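The condition of path independence amounts,
however, to the condition that no magnetic field is
present since, if ∫_γ A depends on γ, then for some
pair of paths γ and γ' from (t, x) to (t', x'),

\[
0 \neq \int_\gamma \mathbf{A} - \int_{\gamma'} \mathbf{A} = \int_\gamma \mathbf{A} + \int_{-\gamma'} \mathbf{A} = \oint_{\gamma\cup(-\gamma')} \mathbf{A} = \int_\Sigma d\boldsymbol{\sigma}\cdot(\nabla\times\mathbf{A})
\]

where in the last equality we applied Stokes' theorem
(Σ is any surface with boundary γ ∪ (−γ')), which
shows that B = ∇ × A must not vanish everywhere
and has a nonzero flux through Σ given by

\[
\Phi = \int_\Sigma d\boldsymbol{\sigma}\cdot\mathbf{B} \tag{20}
\]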


The conclusion of this section is that the ansatz [19] for 
solving [15] can only be applied in simply connected 
regions with no magnetic field strength present. 
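
As a rough numerical illustration of the ansatz [19], the following sketch checks it in one space dimension, where every potential is a gradient and the path-independence condition holds automatically. The units ħ = c = q = 1, the choice f(x) = sin x, the plane-wave ψ_0 and all function names are assumptions made only for this example; the check is restricted to the spatial operator, that is, it verifies that (−i d/dx − A)² ψ = K² ψ when −ψ_0'' = K² ψ_0.

```python
import cmath
import math

# Illustrative units: hbar = c = q = 1 (an assumption of this sketch).
K = 1.3  # wave number of the free solution psi0(x) = exp(i K x)


def f(x):
    """Path-independent integral of A from the reference point 0 to x."""
    return math.sin(x)


def A(x):
    """Pure-gauge potential A = df/dx."""
    return math.cos(x)


def psi0(x):
    """Free solution: -psi0'' = K^2 psi0."""
    return cmath.exp(1j * K * x)


def psi(x):
    """Ansatz [19]: the free solution times the phase factor exp(i f)."""
    return cmath.exp(1j * f(x)) * psi0(x)


def kinetic_momentum(func, h=1e-4):
    """Return the function (-i d/dx - A) func, using a central difference."""
    def result(x):
        derivative = (func(x + h) - func(x - h)) / (2 * h)
        return -1j * derivative - A(x) * func(x)
    return result


def residual(x):
    """|(-i d/dx - A)^2 psi - K^2 psi| at the point x."""
    lhs = kinetic_momentum(kinetic_momentum(psi))(x)
    return abs(lhs - K**2 * psi(x))
```

The residual vanishes up to finite-difference error, which reflects the cancelation of the exponential factors: the potential drops out of the spectrum whenever it is a pure gradient.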


Aharonov-Bohm Proposal 


In 1959, Aharonov and Bohm proposed an experiment
to test, in quantum mechanics, the coupling of
electric charges to electromagnetic field strengths
through a local interaction with the electromagnetic
potential A_μ, but not with the field strengths
themselves. However, as we saw before, no physical
effect exists, that is, A_μ can be gauged away, unless
magnetic and/or electric fields exist somewhere,
although not necessarily overlapping the wave
function of the particles.

Consider the usual two-slit experiment as depicted
in Figure 1, with the additional presence, behind the
slits, of a long and narrow solenoid enclosing a
nonvanishing magnetic flux Φ due to a constant and
homogeneous magnetic field B normal to the plane
of the figure (in direction z); outside of the solenoid,
the magnetic field is zero.

Figure 1 Magnetic Aharonov-Bohm effect.

If the radius of the solenoid is R, a vector potential A
that produces such field strength is given by

\[
\mathbf{A}(\mathbf{x}) =
\begin{cases}
(|\mathbf{B}|\,r/2)\,\hat{\boldsymbol{\varphi}}, & r \le R \\
(\Phi/2\pi r)\,\hat{\boldsymbol{\varphi}}, & r > R
\end{cases}
\tag{21}
\]

where Φ = πR²|B| and φ̂ is a unit vector in the
azimuthal direction. In fact,

\[
\mathbf{B} = \nabla\times\mathbf{A}(\mathbf{x}) =
\begin{cases}
|\mathbf{B}|\,\hat{\mathbf{z}}, & r < R \\
0, & r > R
\end{cases}
\tag{22}
\]

Notice that at r = R, A is continuous but not
continuously differentiable. Also, the ideal limit of
an infinitely long solenoid makes the problem
two-dimensional, that is, in the x-y plane.
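
The statement that the potential [21] carries the flux Φ outside the solenoid, although B vanishes there, can be checked numerically. The sketch below integrates A·dx around circles in the x-y plane; the values of R and |B|, and the function names, are illustrative assumptions, and the midpoint rule is just one convenient quadrature.

```python
import math

R = 1.0                      # solenoid radius (illustrative value)
B = 2.0                      # field strength |B| inside the solenoid (illustrative)
PHI = math.pi * R**2 * B     # enclosed flux, Phi = pi R^2 |B|


def vector_potential(x, y):
    """Cartesian components of A from [21]; the azimuthal unit vector is (-y, x)/r."""
    r2 = x * x + y * y
    if r2 <= R * R:
        factor = B / 2.0                        # (|B| r / 2) / r
    else:
        factor = PHI / (2.0 * math.pi * r2)     # (Phi / (2 pi r)) / r
    return -y * factor, x * factor


def circulation(cx, cy, radius, steps=20000):
    """Midpoint-rule approximation of the loop integral of A . dx around a
    counterclockwise circle with centre (cx, cy)."""
    total = 0.0
    dt = 2.0 * math.pi / steps
    for k in range(steps):
        t = (k + 0.5) * dt
        x = cx + radius * math.cos(t)
        y = cy + radius * math.sin(t)
        ax, ay = vector_potential(x, y)
        # A dotted with the tangent vector of the circle, times dt
        total += (-ax * math.sin(t) + ay * math.cos(t)) * radius * dt
    return total
```

A loop that encircles the solenoid returns Φ whatever its shape or position, a loop that does not encircle it returns zero, and a centred loop of radius r < R returns πr²|B|, in agreement with Stokes' theorem.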

The probability amplitude for an electron emitted
at the source S to arrive at the point P on the screen
Π is given by the sum of two probability amplitudes,
namely those corresponding to passing
through the slits 1 and 2. The solenoid is assumed
to be impenetrable to the electrons; mathematically,
this corresponds to a motion in a non-simply-connected
region. In the approximation for the
path integral [16], in which one considers the
contribution of only two classes of paths, that is,
the class {γ} represented by path I, and the class
{γ'} represented by path II, if the wave function at
the source is ψ_S, then the wave function at P is
given by


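\[
\begin{aligned}
\psi_P
&= \left(\int_{\{\gamma\}}\mathcal{D}\mathbf{x}\;e^{(i/\hbar)S_0(\gamma)}\,e^{-(i|e|/\hbar c)\int_\gamma A}
 + \int_{\{\gamma'\}}\mathcal{D}\mathbf{x}\;e^{(i/\hbar)S_0(\gamma')}\,e^{-(i|e|/\hbar c)\int_{\gamma'} A}\right)\psi_S \\
&= e^{-(i|e|/\hbar c)\int_{\mathrm{I}} A}\int_{\{\gamma\}}\mathcal{D}\mathbf{x}\;e^{(i/\hbar)S_0(\gamma)}\,\psi_S
 + e^{-(i|e|/\hbar c)\int_{\mathrm{II}} A}\int_{\{\gamma'\}}\mathcal{D}\mathbf{x}\;e^{(i/\hbar)S_0(\gamma')}\,\psi_S \\
&= e^{-(i|e|/\hbar c)\int_{\mathrm{I}} A}\left(\psi_P^0(\mathrm{I}) + e^{(i|e|/\hbar c)\oint_{\mathrm{I}\cup(-\mathrm{II})} A}\,\psi_P^0(\mathrm{II})\right) \\
&= e^{-(i|e|/\hbar c)\int_{\mathrm{I}} A}\left(\psi_P^0(\mathrm{I}) + e^{2\pi i\,\Phi/\Phi_0}\,\psi_P^0(\mathrm{II})\right)
\end{aligned}
\tag{23}
\]
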
where, in the second line, we used the path
independence of the integral of A within each class
of paths;

\[
\psi_P^0(\mathrm{I}) = \int_{\{\gamma\}}\mathcal{D}\mathbf{x}\;e^{(i/\hbar)S_0(\gamma)}\,\psi_S
\]

and

\[
\psi_P^0(\mathrm{II}) = \int_{\{\gamma'\}}\mathcal{D}\mathbf{x}\;e^{(i/\hbar)S_0(\gamma')}\,\psi_S
\]

and, in the last equality, we applied the extended
version of Stokes' theorem (by Craven), to allow for
noncontinuously differentiable vector potentials;
and the quantum of magnetic flux associated with
the charge |e| is defined by

\[
\Phi_0 = \frac{2\pi\hbar c}{|e|} \cong 4.135\times 10^{-7}\ \mathrm{G\,cm^2} \tag{24}
\]

(= 2π/|e| = √(π/α) ≅ √(137π) in the natural system
of units (n.s.u.) ħ = c = 1; α is the fine structure
constant). Then the probability of finding the
electron at P is proportional to


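\[
|\psi_P|^2 = |\psi_P^0(\mathrm{I})|^2 + |\psi_P^0(\mathrm{II})|^2
 + 2\,\mathrm{Re}\left(e^{2\pi i\,\Phi/\Phi_0}\,\psi_P^0(\mathrm{I})^{*}\,\psi_P^0(\mathrm{II})\right) \tag{25}
\]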


which exhibits an interference pattern shifted with
respect to that without the magnetic field: as B and
therefore Φ change, dark and bright interference
fringes alternate periodically at the screen Π, with
period Φ_0. This is the magnetic A-B effect, which has
been quantitatively verified in many experiments, the
first one in 1960 by Chambers. The effect is:


1. gauge invariant, since B and therefore Φ are
gauge invariant;

2. nonlocal, since it depends on the magnetic field 
inside the solenoid, where the electrons never 
enter; 

3. quantum mechanical, since classically the charges 
do not feel any force and therefore no effect 
would be expected in this limit; and 

4. topological, since the electrons necessarily move 
in a non-simply-connected space. 
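
The periodicity of the fringe pattern in the flux can be made concrete with a short sketch of the flux-dependent part of [25]. The two complex amplitudes passed to the function stand for the field-free amplitudes of the two classes of paths; their values, like the function name, are illustrative assumptions and are not derived from any actual slit geometry.

```python
import cmath
import math


def intensity(flux_ratio, amp_1, amp_2):
    """|amp_1 + exp(2 pi i Phi/Phi_0) amp_2|^2, as in [25].

    flux_ratio is Phi/Phi_0; amp_1 and amp_2 play the role of the
    field-free amplitudes for paths of class I and class II.
    """
    phase = cmath.exp(2j * math.pi * flux_ratio)
    return abs(amp_1 + phase * amp_2) ** 2


def intensity_expanded(flux_ratio, amp_1, amp_2):
    """The same quantity written as the three terms on the RHS of [25]."""
    phase = cmath.exp(2j * math.pi * flux_ratio)
    cross = 2.0 * (phase * amp_1.conjugate() * amp_2).real
    return abs(amp_1) ** 2 + abs(amp_2) ** 2 + cross
```

With equal amplitudes, a bright fringe at Φ = 0 turns dark at Φ = Φ_0/2 and is bright again at Φ = Φ_0; for arbitrary amplitudes the intensity is unchanged under Φ → Φ + Φ_0.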


But perhaps the most important implication of the
A-B effect is a dramatic additional confirmation of
the nonlocal character of quantum mechanics: the
electron has to "travel" along the two paths (I and
II) simultaneously; otherwise, no flux would
be enclosed and no shift of the (then
nonexistent) interference fringes would be observed
at the screen Π.

Calculations in the path-integral approach includ- 
ing the whole set of homotopy classes of paths 
around the solenoid, indexed by an integer m, have 
been performed by several authors, leading to a 
formula of the type 


\[
\psi_P = \sum_{m=-\infty}^{\infty} e^{im\delta}\,\psi_P^0(m) \tag{26}
\]

with

\[
\delta = 2\pi\frac{\Phi}{\Phi_0} \tag{27}
\]

(Schulman 1971, Kobe 1979). As in [23],

\[
\psi_P(\Phi + k\Phi_0) = \psi_P(\Phi), \qquad k \in \mathbb{Z} \tag{28}
\]


There is a close relation between the A-B effect
and the Dirac quantization condition (DQC) in the
presence of electric and magnetic charges: according
to [25] (or [26]) the A-B effect disappears when the
flux Φ equals nΦ_0 = 2πn(ħc/|e|), n ∈ Z, that is,
when the condition

\[
|e|\Phi = nhc \tag{29}
\]

holds. But this is the DQC (Dirac 1931) when Φ is
the flux associated with a magnetic charge g:
Φ(g) = (g/4πr²) × 4πr² = g, leading to |e|g = nhc
(2πn in the n.s.u.). This is precisely the condition for
the Dirac string to be unobservable in quantum
mechanics: to give no A-B effect.
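
The numerical value of the flux quantum and the disappearance of the effect at Φ = nΦ_0 can be reproduced with a few lines. The constants below are CODATA values converted to Gaussian units and are an input of this sketch, not data taken from the text; the function names are likewise illustrative.

```python
import cmath
import math

# CODATA values expressed in Gaussian (cgs) units; inputs of this sketch.
HBAR = 1.054571817e-27      # erg s
C = 2.99792458e10           # cm / s
E_CHARGE = 4.80320471e-10   # esu (statcoulomb)


def flux_quantum():
    """Phi_0 = 2 pi hbar c / |e| in G cm^2, as in [24]."""
    return 2.0 * math.pi * HBAR * C / E_CHARGE


def fine_structure_constant():
    """alpha = e^2 / (hbar c) in Gaussian units."""
    return E_CHARGE**2 / (HBAR * C)


def ab_phase_factor(flux):
    """exp(2 pi i Phi / Phi_0) for a flux given in G cm^2."""
    return cmath.exp(2j * math.pi * flux / flux_quantum())
```

For Φ = nΦ_0 the phase factor equals 1, which is the content of [29]: the relative phase between the two classes of paths is then unobservable.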


Geometry of the A-B Effect 


In this section we study the space of gauge classes of
flat potentials outside the solenoid, which determine
the A-B effect; the topological structure of the A-B
bundle; and the holonomy groups of the connections,
which precisely give the phase shifts of the
wave functions. We use the n.s.u. system; in particular,
if [L] is the unit of length, then [A_μ] = [L]⁻¹,
[|e|] = [L]⁰, and Φ_0 = 2π/|e| = √(π/α) ≅ √(137π), where
α is the fine structure constant.

To synthesize, one can say that the abelian A-B 
effect is a nonlocal gauge-invariant quantum effect 
due to the coupling of the wave function (section of 
an associated bundle) to a nontrivial (non-exact) flat 
(closed) connection in a trivial principal bundle with 
a non-simply-connected base space. In the following 
subsections, we will give a detailed explanation of 
these statements. 


The A-B Bundle 


The gauge group of electromagnetism is the abelian
Lie group U(1) with Lie algebra (the tangent space at
the identity) u(1) = iℝ. In the limit of an infinitely
long and infinitesimally thin solenoid carrying the
magnetic flux Φ, the space available to the electrons
is the plane minus a point, that is, ℝ²*, which is of
the same homotopy type as the circle S¹. Then the
set of isomorphism classes of U(1) bundles over ℝ²*
is in one-to-one correspondence with the set of
homotopy classes of maps from S⁰ to S¹ (Steenrod
1951), which consists of only one point: if
f, g: S⁰ → S¹ are given by f(1) = e^{iθ_1}, f(−1) = e^{iθ_2},
g(1) = e^{iθ'_1}, and g(−1) = e^{iθ'_2}, then
H: S⁰ × [0, 1] → S¹ given by H(1, t) = e^{i((1−t)θ_1 + tθ'_1)}
and H(−1, t) = e^{i((1−t)θ_2 + tθ'_2)} is a homotopy
between f and g. Then, up to equivalence, the relevant
bundle for the A-B effect is the product bundle

\[
\xi_{A\text{-}B}:\; U(1) \to \mathbb{R}^{2*}\times U(1) \to \mathbb{R}^{2*} \tag{30a}
\]

Since ℝ²* is homeomorphic to an open disk minus a
point, (D²_0)*, the total space of the bundle is
homeomorphic to an open solid 2-torus minus a
circle, since (T²_0)* ≅ (D²_0)* × S¹. Then the A-B
bundle has the topological structure

\[
\xi_{A\text{-}B}:\; S^1 \to (T_0^2)^{*} \to (D_0^2)^{*} \tag{30b}
\]


The Gauge Group and the Moduli Space of Flat 
Connections 


The gauge group of the bundle ξ_{A-B} is the set of
smooth functions from the base space to the
structure group, that is, 𝒢 = C^∞(ℝ²*, U(1)). Since
𝒢 ⊂ C⁰(ℝ²*, U(1)) = {continuous functions ℝ²* → U(1)}
and [ℝ²*, U(1)] = {homotopy classes of continuous
functions ℝ²* → U(1)} ≅ [S¹, S¹] ≅ π_1(S¹) ≅ Z,
given f ∈ 𝒢 there exists a unique n ∈ Z such
that f is homotopic to f_n (f ~ f_n), where
f_n: ℝ²* → U(1) is given by f_n(re^{iφ}) = e^{inφ}, φ ∈ [0, 2π).

𝒢 acts on the space of flat connections on ξ_{A-B},
given by the closed u(1)-valued differential 1-forms
on ℝ²*:

\[
\mathcal{C}_0 = \{\mathcal{A} \in \Omega^1(\mathbb{R}^{2*}; u(1)),\; d\mathcal{A} = 0\} \tag{31}
\]

through

\[
\mathcal{C}_0 \times \mathcal{G} \to \mathcal{C}_0, \qquad (\mathcal{A}, f) \mapsto \mathcal{A} + f^{-1}df \tag{32}
\]

where f⁻¹(x, y) = (f(x, y))⁻¹. The moduli space

\[
\mathcal{M}_0 = \frac{\mathcal{C}_0}{\mathcal{G}}
= \{\text{gauge equivalence classes of flat connections on } \xi_{A\text{-}B}\}
= \{[\mathcal{A}] = \{\mathcal{A} + f^{-1}df,\; f \in \mathcal{G}\},\; \mathcal{A} \in \mathcal{C}_0\} \tag{33}
\]

is isomorphic to the circle S¹ with length 1. This can
be seen as follows: the de Rham cohomology of ℝ²*
with coefficients in iℝ in dimension 1 is

\[
H^1_{DR}(\mathbb{R}^{2*}; i\mathbb{R}) = \{\lambda[\mathcal{A}_0]_{DR},\; \lambda \in \mathbb{R}\}
\cong H^1_{DR}(S^1; i\mathbb{R}) \cong \mathbb{R} \tag{34}
\]

where

\[
\mathcal{A}_0 = i\,\frac{x\,dy - y\,dx}{x^2 + y^2} \in \mathcal{C}_0 \tag{35}
\]

is the connection that, once multiplied by −|e|⁻¹ (see
below), generates the flux −Φ_0 and therefore no
A-B effect: 𝒜_0 is closed (d𝒜_0 = 0) but not
exact ((x dy − y dx)/(x² + y²) = dφ only for
φ ∈ (0, 2π), φ = 0 is excluded); [𝒜_0]_{DR} = {𝒜_0 + dβ} with
β ∈ Ω⁰(ℝ²*; iℝ). β gives an element of 𝒢 through the
composite exp ∘ β: ℝ²* → U(1), (x, y) ↦ e^{β(x,y)}. The
A-B effect with flux Φ = −λΦ_0 is produced by the
connection 𝒜 = λ𝒜_0. To determine ℳ_0, one finds
the smallest σ ∈ ℝ such that (λ + σ)𝒜_0 ~ λ𝒜_0, that is,
(λ + σ)𝒜_0 ∈ [λ𝒜_0], which means, from [33], that
(λ + σ)𝒜_0 = λ𝒜_0 + f⁻¹df or σ𝒜_0 = f⁻¹df. For φ ≠ 0,
𝒜_0 = i dφ and f_1⁻¹df_1 = i dφ, then σ = 1, and
therefore (λ + 1)𝒜_0 ~ λ𝒜_0, in particular 𝒜_0 ~ 0.
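
The fact that 𝒜_0 is closed but not exact, and that λ𝒜_0 and (λ + 1)𝒜_0 are gauge equivalent, can be illustrated numerically. The sketch below integrates the real 1-form (x dy − y dx)/(x² + y²) along closed polygons and forms the phase exp(−∮ λ𝒜_0); the polygons and function names are illustrative assumptions. Along a straight segment that avoids the origin, the integral of dφ is simply the signed angle between the two position vectors, which is what `math.atan2` returns here.

```python
import cmath
import math


def loop_integral(points):
    """Integral of (x dy - y dx)/(x^2 + y^2) along the closed polygon through
    `points`. Each edge must avoid the origin, so it sweeps an angle in (-pi, pi)."""
    total = 0.0
    n = len(points)
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        # signed angle from (x0, y0) to (x1, y1): atan2(cross, dot)
        total += math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    return total


def holonomy(lam, points):
    """exp(-loop integral of lam * A_0), with A_0 = i dphi."""
    return cmath.exp(-1j * lam * loop_integral(points))


# A counterclockwise square around the origin and one that misses it.
AROUND = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
AWAY = [(2.0, 1.0), (3.0, 1.0), (3.0, 2.0), (2.0, 2.0)]
```

The integral is 2π for the loop around the origin and 0 for the other, so no single-valued function φ on ℝ²* can have 𝒜_0/i as its differential. The holonomies of λ𝒜_0 and (λ + 1)𝒜_0 coincide on every loop, which is why only λ modulo 1 labels a point of the moduli space.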

A remark concerning the gauge group 𝒢 is the
following. In classical electrodynamics, according to
[7a] and [7b], the symmetry group could be taken to
be the additive group (ℝ, +) instead of the
multiplicative group U(1). Since ℝ is contractible,
the gauge group would then be 𝒢_cl = C^∞(ℝ²*, ℝ) with
[ℝ²*, ℝ] = 0, so that the homomorphism Ψ: 𝒢_cl → 𝒢,
Ψ(f)(x) = e^{if(x)}, would not exhaust 𝒢, since Ψ(f) ∈ [1]
for any f ∈ 𝒢_cl: in fact, H: ℝ²* × [0, 1] → U(1)
given by H(x, t) = e^{i(1−t)f(x)} is a homotopy between
Ψ(f) and 1. However, the quantization of electric
charges implies that in fact the gauge group is U(1)
and not ℝ. This is equivalent mathematically to the
possible existence of magnetic monopoles, which
require nontrivial bundles for their description.


Covariant Derivative, Parallel Transport, 
and Holonomy 


Let G be a matrix Lie group with Lie algebra g, B a
differentiable manifold, ξ: G → P → B (with projection π)
a principal bundle, V a vector space, G × V → V an action,
and ξ_V: V → P ×_G V → B (with projection π_V) the
corresponding associated vector bundle (ξ_V is trivial if
ξ is trivial). Call Γ(ξ_V) the sections of ξ_V,
Γ(TB) (Γ(TP)) the sections of the tangent bundle of B (P),
and Γ_eq(P, V) the set of functions γ: P → V satisfying
γ(pg) = g⁻¹γ(p) (equivariant functions from P to V).
s ∈ Γ(ξ_V) induces γ_s ∈ Γ_eq(P, V) with γ_s(p) = v, where
s(π(p)) = [p, v], and γ ∈ Γ_eq(P, V) induces s_γ ∈ Γ(ξ_V)
with s_γ(b) = [p, γ(p)], where p ∈ π⁻¹({b}). If H is a
connection on ξ, that is, a smooth assignment of a
(horizontal) vector subspace H_p of T_pP at each p of
P, algebraically determined by a smooth g-valued
1-form ω on P through H_p = ker(ω_p), s ∈ Γ(ξ_V),
X ∈ Γ(TB), and X↑ ∈ Γ(TP) the horizontal lifting of
X by ω, then X↑(γ_s) ∈ Γ_eq(P, V), and the covariant
derivative of s with respect to ω in the direction of X is
defined by

\[
\nabla^{\omega}_{X} s = s_{X^{\uparrow}(\gamma_s)} \tag{36a}
\]

If φ: π⁻¹(U) → U × G is a local trivialization of ξ,
x^μ, μ = 1, ..., dim B, are local coordinates on U, and
e_i, i = 1, ..., dim V, is a basis of the local sections in
π_V⁻¹(U), then the local expression of [36a] is

\[
\nabla^{\omega}_{\partial/\partial x^{\mu}}(s^i e_i) = \left(\delta^i_j\frac{\partial}{\partial x^{\mu}} + A^i_{\mu j}\right)s^j e_i \tag{36b}
\]

where

\[
A_U{}^i{}_j = A^i_{\mu j}\,dx^{\mu} = (\sigma^{*}\omega_U)^i{}_j \tag{36c}
\]

is the geometrical gauge potential in U, given by the
pullback of ω_U, the restriction of ω to π⁻¹(U), by the
local section σ: U → π⁻¹(U), σ(b) = φ⁻¹(b, 1). (A^i_{μj}
is defined through ∇^ω_{∂/∂x^μ} e_j = A^i_{μj} e_i.) The operator

\[
D^i_{\mu j} = \delta^i_j\frac{\partial}{\partial x^{\mu}} + A^i_{\mu j} \tag{36d}
\]

is the usual local covariant derivative. In an
overlapping trivialization, [36b] is replaced by

\[
\nabla^{\omega}_{\partial/\partial x^{\mu}}(s'^i e'_i) = \left(\delta^i_j\frac{\partial}{\partial x^{\mu}} + A'^i_{\mu j}\right)s'^j e'_i \tag{36e}
\]

with e'_i = g_i^j e_j and s'^i = (g⁻¹)_j^i s^j on U ∩ U'; then the
local potential transforms as

\[
A'^i_{\mu j} = g_j{}^k A^l_{\mu k}\,(g^{-1})_l{}^i + (\partial_{\mu} g_j{}^k)\,(g^{-1})_k{}^i \tag{36f}
\]

which for G abelian has the form [32].

For each smooth path c: [0, 1] → B joining the
points b and b', and each p ∈ P_b = π⁻¹({b}), there
exists a unique path c↑ in P through p whose tangent
vector at c↑(t) lies in H_{c↑(t)} for all t ∈ [0, 1]. c↑ is
the horizontal lifting of c by ω through p. Thus, for
each connection and path there exists a diffeomorphism
P^ω_c: P_b → P_{b'} called parallel transport. If c is a
loop at b, then P^ω_c ∈ Diff(P_b) is called the holonomy
of ω at b along c. To the loop space of B at b, Ω(B; b),
corresponds a subgroup Hol^ω_b of Diff(P_b) called the
holonomy of ω at b. If c ∈ Ω(B; b) and c̃ is a lifting of c
through q ∈ P_b, then there exists a unique path
g: [0, 1] → G such that c↑(t) = c̃(t)g(t) with
c↑(0) = qg(0) = p; g satisfies the differential equation

\[
\frac{d}{dt}g(t) + \omega_{\tilde{c}(t)}\big(\dot{\tilde{c}}(t)\big)\,g(t) = 0 \tag{37}
\]

whose solution is the time-ordered exponential

\[
\begin{aligned}
g(t)\,g(0)^{-1} &= T\exp\left(-\int_0^t d\tau\,\omega_{\tilde{c}(\tau)}\big(\dot{\tilde{c}}(\tau)\big)\right) \\
&= \sum_{m=0}^{\infty}(-1)^m\int_0^t d\tau_1\int_0^{\tau_1} d\tau_2\cdots\int_0^{\tau_{m-1}} d\tau_m\;
\omega_{\tilde{c}(\tau_1)}\big(\dot{\tilde{c}}(\tau_1)\big)\cdots\omega_{\tilde{c}(\tau_m)}\big(\dot{\tilde{c}}(\tau_m)\big)
\end{aligned}
\tag{38}
\]



If q = p then g(0) = 1. For each p ∈ P, the set of
elements g ∈ G such that c↑(1) = pg for
c ∈ Ω(B; π(p)) is a subgroup of G, Hol^ω_p, called the
holonomy of ω at p. (For each p, there exists a
group isomorphism Hol^ω_p → Hol^ω_{π(p)}, and if p and p'
are connected by a horizontal curve, then
Hol^ω_p = Hol^ω_{p'}; if all p's in P are horizontally
connected, then Hol^ω_p = G for all p ∈ P.) If (U, φ) is a
local trivialization of ξ, c ⊂ U, and c̃(t) = σ(c(t)), then
one has the local formula

\[
c^{\uparrow}(t) = \varphi^{-1}(c(t), 1)\left(T\exp\left(-\int_{c(0)}^{c(t)} A_U\right)\right)g(0) \tag{39}
\]

In particular, if ξ is a product bundle, then φ is the
identity, and choosing g(0) = 1 gives

\[
c^{\uparrow}(t) = \left(c(t),\; T\exp\left(-\int_{c(0)}^{c(t)} A\right)\right) \tag{40}
\]


In our case, V = ℂ, ξ is a product bundle, s = ψ,
the wave function, is a global section of the
associated bundle

\[
\xi_{\mathbb{C}}:\; \mathbb{C} \to \mathbb{R}^{2*}\times\mathbb{C} \to \mathbb{R}^{2*} \tag{41}
\]

G = U(1) with g = iℝ and an action U(1) × ℂ → ℂ,
(e^{iφ}, z) ↦ e^{iφ}z; therefore, 𝒜_μ = ia_μ with a_μ
real valued, and the covariant derivative is

\[
D_{\mu}\psi = \frac{\partial\psi}{\partial x^{\mu}} + i a_{\mu}\psi
\]

If ψ carries the electric charge q, we define the
physical gauge potential A_μ through

\[
a_{\mu} = qA_{\mu} \tag{42}
\]

and, for the covariant derivative, after multiplying
by i, we obtain the operator appearing in eqn [15],
iD_μψ = (i(∂/∂x^μ) − qA_μ)ψ: in fact, for the spatial
part the coupling is (i∇ + qA)ψ, and for the
temporal part one has (i∂/∂t − qφ)ψ. For the
electron, q = −|e| and a_μ = −|e|A_μ = −(2π/Φ_0)A_μ.
For c ∈ Ω(ℝ²*; (x_0, y_0)), which turns m times
around the solenoid at (0, 0), eqn [40] gives

\[
c^{\uparrow}(1) = \big((x_0,y_0),\, e^{-\oint_c \mathcal{A}}\big)
= \big((x_0,y_0),\, e^{-i\oint_c a}\big)
= \big((x_0,y_0),\, e^{i|e|\oint_c A}\big)
= \big((x_0,y_0),\, e^{2\pi i m\Phi/\Phi_0}\big)
\]

and therefore, for Φ/Φ_0 = λ ∈ [0, 1) we have the
holonomy groups

\[
\mathrm{Hol}^{\omega}_{((x_0,y_0),1)} \cong
\begin{cases}
\mathbb{Z}_q, & \lambda = p/q,\; p, q \in \mathbb{Z},\; (p,q) = 1 \\
\mathbb{Z}, & \lambda \notin \mathbb{Q}
\end{cases}
\tag{43}
\]

In the second case, Hol^ω_{((x_0,y_0),1)} is dense in U(1): in fact,
suppose that for n_1, n_2 ∈ Z, n_1 ≠ n_2, e^{2πin_1λ} = e^{2πin_2λ},
then e^{2πi(n_1−n_2)λ} = 1 and so (n_1 − n_2)λ = m for some
m ∈ Z; therefore, λ ∈ Q, which is a contradiction.
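
For rational λ the first case of [43] can be verified by direct enumeration. The sketch below lists the distinct values of mλ modulo 1, which label the holonomies exp(2πimλ); exact rational arithmetic is used so that equality of phases is unambiguous. The function name is an illustrative assumption, and the irrational case is not covered, since an irrational λ cannot be represented exactly.

```python
from fractions import Fraction


def holonomy_phases(lam):
    """Distinct values of (m * lam) mod 1 for m = 0, 1, 2, ...

    `lam` must be rational (anything accepted by Fraction); each returned
    value t labels the group element exp(2 pi i t).
    """
    lam = Fraction(lam)
    phases = set()
    m = 0
    while True:
        phase = (m * lam) % 1
        if phase in phases:
            break          # the sequence has closed up: the group is finite
        phases.add(phase)
        m += 1
    return sorted(phases)
```

For λ = p/q in lowest terms the function returns the q values 0, 1/q, ..., (q − 1)/q, that is, the cyclic group of order q; in particular λ = 0 gives the trivial group.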

Finally, we should mention that the A-B effect
can be understood as a geometric phase à la Berry,
though not necessarily through an adiabatic change
of the parameters on which the Hamiltonian
depends. The Berry potential a_B turns out to be
proportional to the real magnetic vector potential A:
in the n.s.u., and for electrons,

\[
\mathbf{a}_B = -|e|\mathbf{A} \tag{44}
\]


Nonabelian and Gravitational A-B Effects 


Since the fundamental group Π_1(ℝ²*; (x_0, y_0)) ≅ Z,
eqn [43] shows that there is a homomorphism
ρ(ω): Π_1(ℝ²*; (x_0, y_0)) → U(1), ρ(ω)(n) = e^{2πinλ}, with
ρ(ω)(Π_1(ℝ²*)) = Hol^ω_{((x_0,y_0),1)}, which characterizes
the A-B effect in that case. In general, an A-B
effect in a G-bundle with a connection ω is
characterized by a group homomorphism from the
fundamental group of the base space B onto the
holonomy group of the connection, which is a
subgroup of the structure group. The A-B effect is
nonabelian if the holonomy group is nonabelian,
which requires both G and Π_1(B, x) to be
nonabelian. Examples with Yang-Mills and
gravitational fields are considered in the literature.
Introduction


Quantum field theory may be understood as the
incorporation of the principle of locality, which is at
the basis of classical field theory, into quantum
physics. There are, however, severe obstacles against
a straightforward translation of concepts of classical
field theory into quantum theory, among them the
notorious divergences of quantum field theory and
the intrinsic nonlocality of quantum physics.
Therefore, the concept of locality is somewhat obscured in
the formalism of quantum field theory as it is
typically presented in textbooks. Nonlocal concepts
such as the vacuum, the notion of particles, or the
S-matrix play a fundamental role, and neither the
relation to classical field theory nor the influence of
background fields can be properly treated.

Algebraic quantum field theory (AQFT; synony- 
mously, local quantum physics), on the contrary, 
aims at emphasizing the concept of locality at every 
instance. As the nonlocal features of quantum 
physics occur at the level of states ("entanglement"),
not at the level of observables, it is better
not to base the theory on the Hilbert space of states 
but on the algebra of observables. Subsystems of a 
given system then simply correspond to subalgebras 
of a given algebra. The locality concept is abstractly 
encoded in a notion of independence of subsystems; 
two subsystems are independent if the algebra of 
observables which they generate is isomorphic 
to the tensor product of the algebras of the 
subsystems. 

Spacetime can then, in the spirit of Leibniz, be
considered as an ordering device for systems. So, one 
associates with regions of spacetime the algebras of 
observables which can be measured in the pertinent 
region, with the condition that the algebras of 
subregions of a given region can be identified with 
subalgebras of the algebra of the region. 

Problems arise if one aims at a generally covariant 
approach in the spirit of general relativity. Then, in 
order to avoid pitfalls like in the "hole problem,"
systems corresponding to isometric regions must be 
isomorphic. Since isomorphic regions may be 
embedded into different spacetimes, this amounts 
to a simultaneous treatment of all spacetimes of a 
suitable class. We will see that category theory 
furnishes such a description, where the objects are 
the systems and the morphisms the embeddings of a 
system as a subsystem of other systems. 

States arise as secondary objects via Hilbert space 
representations, or directly as linear functionals on 
the algebras of observables which can be interpreted 
as expectation values and are, therefore, positive 
and normalized. It is crucial that inequivalent 
representations ("sectors") can occur, and the
analysis of the structure of the sectors is one of 
the big successes of AQFT. One can also study the 
particle interpretation of certain states as well as 
(equilibrium and nonequilibrium) thermodynamical 
properties. 

The mathematical methods in AQFT are mainly 
taken from the theory of operator algebras, a field of 
mathematics which developed in close contact to 
mathematical physics, in particular to AQFT. 
Unfortunately, the most important field theories
from the point of view of elementary particle
physics, such as quantum electrodynamics or the standard
model, could not yet be constructed beyond formal
perturbation theory, with the annoying consequence
that it seemed that the concepts of AQFT could not
be applied to them. However, it has recently been
shown that formal perturbation theory can be 
reshaped in the spirit of AQFT such that the algebras 
of observables of these models can be constructed as 
algebras of formal power series of Hilbert space 
operators. The price to pay is that the deep 
mathematics of operator algebras cannot be applied, 
but the crucial features of the algebraic approach can 
be used. 

AQFT was originally proposed by Haag as a 
concept by which scattering of particles can be 
understood as a consequence of the principle of 
locality. It was then put into a mathematically 
precise form by Araki, Haag, and Kastler. After the 
analysis of particle scattering by Haag and Ruelle 
and the clarification of the relation to the Lehmann- 
Symanzik-Zimmermann (LSZ) formalism by Hepp, 
the structure of superselection sectors was studied 
first by Borchers and then in a fundamental series of 
papers by Doplicher, Haag, and Roberts (DHR) 
(see, e.g., Doplicher et al. (1971, 1974)) (soon after 
Buchholz and Fredenhagen established the relation 
to particles), and finally Doplicher and Roberts 
uncovered the structure of superselection sectors as 
the dual of a compact group thereby generalizing the 
Tannaka-Krein theorem of characterization of 
group duals. 

With the advent of two-dimensional conformal 
field theory, new models were constructed and it was 
shown that the DHR analysis can be generalized to 
these models. Directly related to conformal theories is 
the algebraic approach to holography in anti-de Sitter 
(AdS) spacetime by Rehren. 

The general framework of AQFT may be described 
as a covariant functor between two categories. The 
first one contains the information on local relations 
and is crucial for the interpretation. Its objects are 
topological spaces with additional structures (typi- 
cally globally hyperbolic Lorentzian spaces, possibly 
spin bundles with connections, etc.), its morphisms 
being the structure-preserving embeddings. In the 
case of globally hyperbolic Lorentzian spacetimes, 
one requires that the embeddings are isometric and 
preserve the causal structure. The second category 
describes the algebraic structure of observables. In 
quantum physics the standard assumption is that one 
deals with the category of C*-algebras where the 
morphisms are unital embeddings. In classical phys- 
ics, one looks instead at Poisson algebras, and in 
perturbative quantum field theory one admits alge- 
bras which possess nontrivial representations as 
formal power series of Hilbert space operators. It is 
the leading principle of AQFT that the functor 𝒜
contains all physical information. In particular, two 
theories are equivalent if the corresponding functors 
are naturally equivalent. 
In the analysis of the functor 𝒜, a crucial role is
played by natural transformations from other 
functors on the locality category. For instance, a 
field A may be defined as a natural transformation 
from the category of test function spaces to the 
category of observable algebras via their functors 
related to the locality category. 


Quantum Field Theories as Covariant 
Functors 


The rigorous implementation of the generally covariant 
locality principle uses the language of category theory. 
The following two categories are used: 


Loc: The class of objects obj(Loc) is formed by all
(smooth) d-dimensional (d ≥ 2 is held fixed),
globally hyperbolic Lorentzian spacetimes M
which are oriented and time oriented. Given any
two such objects M_1 and M_2, the morphisms
ψ ∈ hom_Loc(M_1, M_2) are taken to be the isometric
embeddings ψ: M_1 → M_2 of M_1 into M_2 but with
the following constraints:

(i) if γ: [a, b] → M_2 is any causal curve and
γ(a), γ(b) ∈ ψ(M_1), then the whole curve must
be in the image ψ(M_1), that is, γ(t) ∈ ψ(M_1) for
all t ∈ [a, b];

(ii) any morphism preserves orientation and
time orientation of the embedded spacetime.

The composition is defined as the composition
of maps; the unit element in hom_Loc(M, M) is
given by the identical embedding id_M: M → M
for any M ∈ obj(Loc).


Obs: The class of objects obj(Obs) is formed by all 
C*-algebras possessing unit elements, and the 
morphisms are faithful (injective) unit-preserving 
*-homomorphisms. The composition is again 
defined as the composition of maps, the unit 
element in hom_Obs(A, A) is, for any A ∈ obj(Obs),
given by the identical map id_A: a ↦ a, a ∈ A.


The categories are chosen for definiteness. One
may envisage changes according to particular needs,
as, for instance, in perturbation theory where instead
of C*-algebras general topological *-algebras are
better suited. Or one may use von Neumann
algebras, in case particular states are selected. On 
the other hand, one might consider for Loc bundles 
over spacetimes, or (in conformally invariant the- 
ories) admit conformal embeddings as morphisms. In 
case one is interested in spacetimes which are not 
globally hyperbolic, one could look at the globally 
hyperbolic subregions (where one needs to be careful 
about the causal convexity condition (i) above). 


The concept of locally covariant quantum field 
theory is defined as follows. 


Definition 1

(i) A locally covariant quantum field theory is a
covariant functor 𝒜 from Loc to Obs (writing
α_ψ for 𝒜(ψ)) with the covariance properties

\[
\alpha_{\psi'}\circ\alpha_{\psi} = \alpha_{\psi'\circ\psi}, \qquad \alpha_{\mathrm{id}_M} = \mathrm{id}_{\mathscr{A}(M)}
\]

for all morphisms ψ ∈ hom_Loc(M_1, M_2), all
morphisms ψ' ∈ hom_Loc(M_2, M_3), and all
M ∈ obj(Loc).

(ii) A locally covariant quantum field theory
described by a covariant functor 𝒜 is called
"causal" if the following holds: whenever there
are morphisms ψ_j ∈ hom_Loc(M_j, M), j = 1, 2,
so that the sets ψ_1(M_1) and ψ_2(M_2) are causally
separated in M, then one has

\[
\big[\alpha_{\psi_1}(\mathscr{A}(M_1)),\, \alpha_{\psi_2}(\mathscr{A}(M_2))\big] = \{0\}
\]

where the element-wise commutation makes
sense in 𝒜(M).

(iii) One says that a locally covariant quantum field
theory given by the functor 𝒜 obeys the
"time-slice axiom" if

\[
\alpha_{\psi}(\mathscr{A}(M)) = \mathscr{A}(M')
\]

holds for all ψ ∈ hom_Loc(M, M') such that ψ(M)
contains a Cauchy surface for M'.


Thus, a quantum field theory is an assignment of 
C*-algebras to (all) globally hyperbolic spacetimes 
so that the algebras are identifiable when the 
spacetimes are isometric, in the indicated way. This 
is a precise description of the generally covariant 
locality principle. 


The Traditional Approach 


The traditional framework of AQFT, in the
Araki-Haag-Kastler sense, on a fixed globally hyperbolic
spacetime can be recovered from a locally covariant
quantum field theory, that is, from a covariant
functor 𝒜 with the properties listed above.

Indeed, let M be an object in obj(Loc). K(M)
denotes the set of all open subsets in M which are
relatively compact and also contain, with each pair
of points x and y, all g-causal curves in M
connecting x and y (cf. condition (i) in the definition
of Loc). O ∈ K(M), endowed with the metric of M
restricted to O and with the induced orientation and
time orientation, is a member of obj(Loc), and the
injection map ι_{M,O}: O → M, that is, the identical
map restricted to O, is an element in hom_Loc(O, M).


With this notation, it is easy to prove the following 
assertion: 


Theorem 1 Let 𝒜 be a covariant functor with
the above-stated properties, and define a map
K(M) ∋ O ↦ A(O) ⊂ 𝒜(M) by setting

\[
A(O) := \alpha_{M,O}(\mathscr{A}(O))
\]

Then the following statements hold:

(i) The map fulfills isotony, that is,

\[
O_1 \subset O_2 \;\Rightarrow\; A(O_1) \subset A(O_2) \quad \text{for all } O_1, O_2 \in K(M)
\]

(ii) If there exists a group G of isometric
diffeomorphisms κ: M → M (so that κ_*g = g) preserving
orientation and time orientation, then there
is a representation G ∋ κ ↦ α̃_κ of G by
C*-algebra automorphisms α̃_κ: 𝒜(M) → 𝒜(M)
such that

\[
\tilde{\alpha}_{\kappa}(A(O)) = A(\kappa(O)), \qquad O \in K(M)
\]

(iii) If the theory given by 𝒜 is additionally causal,
then it holds that

\[
[A(O_1), A(O_2)] = \{0\}
\]

for all O_1, O_2 ∈ K(M) with O_1 causally
separated from O_2.


These properties are just the basic assumptions of 
the Araki-Haag-Kastler framework. 


The Achievements of the Traditional 
Approach 


In the Araki-Haag-Kastler approach in Minkowski 
spacetime M, many results have been obtained in 
the last 40 years, some of them also becoming a 
source of inspiration to mathematics. A description 
of the achievements can be organized in terms of a 
length-scale basis, from the small to the large. We 
assume in this section that the algebra 𝒜(M) is
faithfully and irreducibly represented on a Hilbert
space H, that the Poincaré transformations are
unitarily implemented with positive energy, and
that the subspace of Poincaré invariant vectors is
one dimensional (uniqueness of the vacuum).
Moreover, algebras corresponding to regions which
are spacelike to a nonempty open region are
assumed to be weakly closed (i.e., von Neumann
algebras on H), and the condition of weak
additivity is fulfilled, that is, for all O ∈ K(M)
the algebra generated from the algebras
A(O + x), x ∈ M, is weakly dense in 𝒜(M).
Ultraviolet Structure and Idealized Localizations


This section deals with the problem of inspecting the 
theory at very small scales. In the limiting case, one 
is interested in idealized localizations, eventually the 
points of spacetimes. But the observable algebras are
trivial at any point x ∈ M, namely

\[
\bigcap_{O \ni x} A(O) = \mathbb{C}\,1, \qquad O \in K(M)
\]


Hence, pointlike localized observables are neces- 
sarily singular. Actually, the Wightman formulation 
of quantum field theory is based on the use of 
distributions on spacetime with values in the algebra 
of observables (as a topological *-algebra). In spite
of technical complications whose physical significance
is unclear, this formalism is well suited for a
discussion of the connection with the Euclidean
theory, which allows, in fortunate cases, a treatment 
by path integrals; it is more directly related to 
models and admits, via the operator-product expan- 
sion, a study of the short-distance behavior. It is, 
therefore, an important question how the algebraic 
approach is related to the Wightman formalism. The 
reader is referred to the literature for exploring the 
results on this relation. 

Whereas these results point to an essential equiva- 
lence of both formalisms, one needs in addition a 
criterion for the existence of sufficiently many Wight- 
man fields associated with a given local net. Such a 
criterion can be given in terms of a compactness 
condition to be discussed in the next subsection. As a 
benefit, one derives an operator-product expansion 
which has to be assumed in the Wightman approach. 

In the purely algebraic approach, the ultraviolet 
structure has been investigated by Buchholz and 
Verch. Small-scale properties of theories are studied 
with the help of the so-called scaling algebras whose 
elements can be described as orbits of observables 
under all possible renormalization group motions. 
There results a classification of theories in the scaling 
limit which can be grouped into three broad classes: 
theories for which the scaling limit is purely classical 
(commutative algebras), those for which the limit is 
essentially unique (stable ultraviolet fixed point) and 
not classical, and those for which this is not the case 
(unstable ultraviolet fixed point). This classification 
does not rely on perturbation expansions. It allows 
an intrinsic definition of confinement in terms of the 
so-called ultraparticles, that is, particles which are 
visible only in the scaling limit. 


Phase-Space Analysis 


As far as finite distances are concerned, there are 
two apparently competing principles, those of
nuclearity and modularity. The first one suggests
that locally, after a cutoff in energy, one has a 
situation similar to that of old quantum mechanics, 
namely a finite number of states in a finite volume 
of phase space. Aiming at a precise formulation, 
Haag and Swieca introduced their notion of com- 
pactness, which Buchholz and Wichmann sharpened 
into that of nuclearity. The latter authors proposed 
that the set generated from the vacuum vector $\Omega$,


\[ \{ e^{-\beta H} A \Omega \mid A \in \mathfrak{A}(O),\ \|A\| < 1 \} \]


$H$ denoting the generator of time translations
(Hamiltonian), is nuclear for any $\beta > 0$, roughly
stating that it is contained in the image of the unit
ball under a trace class operator. The nuclear size
$Z(\beta, O)$ of the set plays the role of the partition
function of the model and has to satisfy certain
bounds in the parameter $\beta$. The consequence of this
constraint is the existence of product states, namely
those normal states for which observables localized in
two given spacelike separated regions are uncorre-
lated. A further consequence is the existence of
thermal equilibrium states (KMS states) for all $\beta > 0$.

The second principle concerns the fact that, even 
locally, quantum field theory has infinitely many 
degrees of freedom. This becomes visible in the 
Reeh-Schlieder theorem, which states that every 
vector $\Phi$ which is in the range of $e^{-\beta H}$ for some
$\beta > 0$ (in particular, the vacuum $\Omega$) is cyclic and
separating for the algebras $\mathfrak{A}(O)$, $O \in \mathcal{K}(M)$, that is,
$\mathfrak{A}(O)\Phi$ is dense in $\mathcal{H}$ ($\Phi$ is cyclic) and $A\Phi = 0$,
$A \in \mathfrak{A}(O)$ implies $A = 0$ ($\Phi$ is separating). The pair
$(\mathfrak{A}(O), \Omega)$ is then a von Neumann algebra in the
so-called standard form. On such a pair, the 
Tomita-Takesaki theory can be applied, namely 
the densely defined operator 


\[ S A \Omega = A^* \Omega, \quad A \in \mathfrak{A}(O) \]


is closable, and the polar decomposition of its
closure $\bar{S} = J\Delta^{1/2}$ delivers an antiunitary involution
$J$ (the modular conjugation) and a positive self-
adjoint operator $\Delta$ (the modular operator) asso-
ciated with the standard pair $(\mathfrak{A}(O), \Omega)$. These
operators have the properties


\[ J \mathfrak{A}(O) J = \mathfrak{A}(O)' \]
where the prime denotes the commutant, and
\[ \Delta^{it} \mathfrak{A}(O) \Delta^{-it} = \mathfrak{A}(O), \quad t \in \mathbb{R} \]
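
The modular objects can be made concrete in a finite-dimensional toy model. The sketch below is only an analogue, not the field-theoretic situation: it assumes the algebra of all n x n matrices acting by left multiplication on the space of n x n matrices with the Hilbert-Schmidt inner product, and takes as cyclic and separating vector the square root of an invertible density matrix rho. Under these assumptions the modular operator acts as X -> rho X rho^{-1} and the modular conjugation as X -> X*. All function names are illustrative.

```python
import numpy as np

def hermitian_power(rho, z):
    """Return rho**z for a positive-definite Hermitian matrix rho (z may be complex)."""
    w, v = np.linalg.eigh(rho)
    return (v * w.astype(complex) ** z) @ v.conj().T

def random_faithful_state(n, rng):
    """Random invertible density matrix, i.e. a faithful state on the n x n matrices."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = g @ g.conj().T + 0.1 * np.eye(n)
    return rho / np.trace(rho).real

def tomita_S(x, omega):
    """S(A Omega) = A* Omega, with A recovered from the vector x = A Omega."""
    a = x @ np.linalg.inv(omega)
    return a.conj().T @ omega

def modular_delta(x, rho, z=1.0):
    """Delta**z acting on the vector x: x -> rho**z x rho**(-z)."""
    return hermitian_power(rho, z) @ x @ hermitian_power(rho, -z)

def modular_J(x):
    """Modular conjugation on the Hilbert-Schmidt space: x -> x*."""
    return x.conj().T

rng = np.random.default_rng(0)
n = 3
rho = random_faithful_state(n, rng)
omega = hermitian_power(rho, 0.5)
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
b = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
x = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# Polar decomposition S = J Delta^{1/2}.
polar_ok = np.allclose(tomita_S(x, omega), modular_J(modular_delta(x, rho, 0.5)))

# Modular flow: Delta^{it} A Delta^{-it} is again a left multiplication,
# namely by rho^{it} A rho^{-it}, so it stays inside the algebra.
t = 0.7
lhs = modular_delta(a @ modular_delta(x, rho, -1j * t), rho, 1j * t)
rhs = (hermitian_power(rho, 1j * t) @ a @ hermitian_power(rho, -1j * t)) @ x
flow_ok = np.allclose(lhs, rhs)

# J A J acts as right multiplication by A*, which commutes with every
# left multiplication, i.e. it lies in the commutant.
def jaj(y):
    return modular_J(a @ modular_J(y))

commutant_ok = np.allclose(jaj(b @ x), b @ jaj(x))
print(polar_ok, flow_ok, commutant_ok)
```

Local algebras in quantum field theory admit no density matrix, so the toy model illustrates only the algebraic relations, not the geometric content discussed next.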


The importance of this structure is based on the 
fact disclosed by Bisognano and Wichmann using 
Poincaré-covariant Wightman fields and local alge- 
bras generated by them, that for specific regions in 
Minkowski spacetime the modular operators have a
geometrical meaning. Indeed, these authors showed
for the pair $(\mathfrak{A}(W), \Omega)$, where $W$ denotes the wedge
region $W = \{x \in M \mid |x^0| < x^1\}$, that the associated
modular unitary $\Delta^{it}$ is the Lorentz boost with velocity
$\tanh(2\pi t)$ in the direction 1 and that the modular
conjugation $J$ is the CPT symmetry operator with
parity $P$, the reflection with respect to the $x^1 = 0$
plane. Later, Borchers discovered that already on the
purely algebraic level a corresponding structure exists.
He proved that, given any standard pair $(\mathfrak{A}, \Phi)$ and a
one-parameter group of unitaries $\tau \mapsto U(\tau)$ acting on
the Hilbert space $\mathcal{H}$ with a positive generator and
such that $\Phi$ is invariant and $U(\tau)\mathfrak{A}U(\tau)^* \subset \mathfrak{A}$, $\tau > 0$,
then the associated modular operators $\Delta$ and $J$ fulfill
the commutation relations


\[ \Delta^{it} U(\tau) \Delta^{-it} = U(e^{-2\pi t}\tau) \]
\[ J U(\tau) J = U(-\tau) \]


which are just the commutation relations between 
boosts and lightlike translations. 
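
That these relations have the form of the boost and lightlike-translation relations can be checked at the level of the Poincaré group in the $(x^0, x^1)$ plane. The sketch below represents group elements as affine 3 x 3 matrices. It assumes the sign convention that the modular unitary corresponds to the boost with rapidity $-2\pi t$ and that the translations run along the light ray direction (1, 1); other conventions flip the sign of the exponent. The helper names are illustrative.

```python
import numpy as np

def boost(s):
    """Lorentz boost with rapidity s in the (x0, x1) plane, as an affine 3x3 matrix."""
    m = np.eye(3)
    m[:2, :2] = [[np.cosh(s), np.sinh(s)], [np.sinh(s), np.cosh(s)]]
    return m

def lightlike_translation(tau):
    """Translation by tau * (1, 1), i.e. along the light ray x0 = x1."""
    m = np.eye(3)
    m[:2, 2] = [tau, tau]
    return m

def reflection():
    """Reflection (x0, x1) -> (-x0, -x1), the geometric action attributed to J."""
    return np.diag([-1.0, -1.0, 1.0])

t, tau = 0.13, 0.8
delta_it = boost(-2 * np.pi * t)  # assumed sign convention for the modular boost

# Boost * translation * inverse boost rescales the translation parameter.
lhs = delta_it @ lightlike_translation(tau) @ np.linalg.inv(delta_it)
rhs = lightlike_translation(np.exp(-2 * np.pi * t) * tau)
boost_relation_ok = np.allclose(lhs, rhs)

# The reflection is its own inverse and reverses the translation.
j = reflection()
reflection_relation_ok = np.allclose(
    j @ lightlike_translation(tau) @ j, lightlike_translation(-tau)
)
print(boost_relation_ok, reflection_relation_ok)
```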

Surprisingly, there is a direct connection between 
the two concepts of nuclearity and modularity. 
Indeed, in the nuclearity condition, it is possible to 
replace the Hamiltonian operator by a specific 
function of the modular operator associated with a 
slightly larger region. Furthermore, under mild 
conditions, nuclearity and modularity together 
determine the structure of local algebras completely; 
they are isomorphic to the unique hyperfinite type
$\mathrm{III}_1$ von Neumann algebra.


Sectors, Symmetries, Statistics, and Particles 


Large scales are appropriate for discussing global 
issues like superselection sectors, statistics and 
symmetries as far as large spacelike distances are 
concerned, and scattering theory, with the resulting 
notions of particles and infraparticles, as far as large 
timelike distances are concerned. 

In purely massive theories, where the vacuum
sector has a mass gap and the mass shells of the
particles are isolated, a very satisfactory description
of the multiparticle structure at large times can be
given. Using the concept of almost local particle
generators,


\[ \Psi = A(t)\Omega \]


where $\Psi$ is a single-particle state (i.e., an eigenstate
of the mass operator) and $A(t)$ is a family of almost
local operators essentially localized in the kinema-
tical region accessible from a given point by a
motion with the velocities contained in the spectrum
of $\Psi$, one obtains the multiparticle states as limits of
products $A_1(t) \cdots A_n(t)\Omega$ for disjoint velocity sup-
ports. The corresponding closed subspaces are
invariant under Poincaré transformations and are
unitarily equivalent to the Fock spaces of noninter-
acting particles.

For massless particles, no almost-local particle 
generators can be expected to exist. In even 
dimensions, however, one can exploit Huygens'
principle to construct asymptotic particle generators
which are in the commutant of the algebra of the 
forward or backward lightcone, respectively. Again, 
their products can be determined and multiparticle 
states obtained. 

Much less well understood is the case of massive 
particles in a theory which also possesses massless 
particles. Here, in general, the corresponding states 
are not eigenstates of the mass operator. Since 
quantum electrodynamics (QED) as well as the 
standard model of elementary particles have this 
problem, the correct treatment of scattering in these 
models is still under discussion. One attempt at a
correct treatment is based on the concept of the so- 
called particle weights, that is, unbounded positive 
functionals on a suitable algebra. This algebra is 
generated by positive almost-local operators annihi- 
lating the vacuum and interpreted as counters. 

The structure at large spacelike scales may be 
analyzed by the theory of superselection sectors. The 
best-understood case is that of locally generated 
sectors which are the objects of the DHR theory. 
Starting from a distinguished representation $\pi_0$
(vacuum representation) which is assumed to fulfill
the Haag duality,


\[ \pi_0(\mathfrak{A}(O))'' = \pi_0(\mathfrak{A}(O'))' \]

for all double cones O, one may look at all 
representations which are equivalent to the vacuum 
representation if restricted to the observables loca- 
lized in double cones in the spacelike complement of 
a given double cone. Such representations give rise 
to endomorphisms of the algebra of observables, 
and the product of endomorphisms can be inter- 
preted as a product of sectors (“fusion”). In general, 
these representations violate the Haag duality, but 
there is a subclass of the so-called finite statistics 
sectors where the violation of Haag duality is small, 
in the sense that the nontrivial inclusion 


\[ \pi(\mathfrak{A}(O)) \subset \pi(\mathfrak{A}(O'))' \]


has a finite Jones index. These sectors form (in at least 
three spacetime dimensions) a symmetric tensor 
category with some further properties which can be 
identified, in a generalization of the Tannaka—Krein 
theorem, as the dual of a unique compact group. This 
group plays the role of a global gauge group. The 
symmetry of the category is expressed in terms of a 


representation of the symmetric group. One may then 
enlarge the algebra of observables and obtain an 
algebra of operators which transform covariantly 
under the global gauge group and satisfy Bose or 
Fermi commutation relations for spacelike separation. 

In two spacetime dimensions, one obtains instead 
braided tensor categories. They have been classified 
under additional conditions (conformal symmetry, 
central charge $c < 1$) in a remarkable work by
Kawahigashi and Longo. Moreover, in their paper, 
one finds that by using completely new methods (Q- 
systems) a new model is unveiled, apparently 
inaccessible by methods used by others. To some 
extent, these categories can be interpreted as duals 
of generalized quantum groups. 

The question arises whether all representations 
describing elementary particles are, in the massive 
case, DHR representations. One can show that in the 
case of a representation with an isolated mass shell 
there is an associated vacuum representation which 
becomes equivalent to the particle representation after 
restriction to observables localized spacelike to a given 
infinitely extended spacelike cone. This property is
weaker than the DHR condition but allows, in four 
spacetime dimensions, the same construction of a 
global gauge group and of covariant fields with Bose 
and Fermi commutation relations, respectively, as the 
DHR condition. In three spacetime dimensions, however,
one finds a braided tensor category, which has
properties similar to those known from topological field
theories in three dimensions.

The sector structure in massless theories is not 
well understood, due to the infrared problem. This is 
in particular true for QED. 


Fields as Natural Transformations 


In order to be able to interpret the theory in terms of 
measurements, one has to be able to compare 
observables associated with different regions of 
spacetime, or even with different spacetimes. In the
absence of nontrivial isometries, such a comparison 
can be made in terms of locally covariant fields. By 
definition, these are natural transformations from 
the functor of quantum field theory to another 
functor on the category of spacetimes Loc. 

The standard case is the functor $\mathcal{D}$ which associates
with every spacetime $M$ its space $\mathcal{D}(M)$ of smooth
compactly supported test functions. There, the
morphisms are the pushforwards $\mathcal{D}\psi = \psi_*$.


Definition 2 A locally covariant quantum field $\Phi$ is
a natural transformation between the functors $\mathcal{D}$
and $\mathfrak{A}$, that is, for any object $M$ in obj(Loc) there
exists a morphism $\Phi_M : \mathcal{D}(M) \to \mathfrak{A}(M)$ such that for
any pair of objects $M_1$ and $M_2$ and any morphism $\psi$
between them, the following diagram commutes:


\[
\begin{array}{ccc}
\mathcal{D}(M_1) & \xrightarrow{\ \Phi_{M_1}\ } & \mathfrak{A}(M_1) \\
\psi_* \downarrow & & \downarrow \alpha_\psi \\
\mathcal{D}(M_2) & \xrightarrow{\ \Phi_{M_2}\ } & \mathfrak{A}(M_2)
\end{array}
\]


The commutativity of the diagram means, expli- 
citly, that 


\[ \alpha_\psi \circ \Phi_{M_1} = \Phi_{M_2} \circ \psi_*, \]


which is the requirement sought for the covariance 
of fields. It contains, in particular, the standard 
covariance condition for spacetime isometries. 

Fields in the above sense are not necessarily linear. 
Examples of fields which are also linear are the scalar
massive free Klein-Gordon field on all globally
hyperbolic spacetimes and its locally covariant Wick
polynomials. In particular, the energy-momentum 
tensors can be constructed as locally covariant fields, 
and they provide a crucial tool for discussing the back- 
reaction problem for matter fields. 

An example of the more general notion of a field
is given by the local S-matrices in the Stückelberg-Bogolubov-
Epstein-Glaser sense. These are unitaries $S_M(\lambda)$ with
$M \in$ obj(Loc) and $\lambda \in \mathcal{D}(M)$ which satisfy the
conditions


\[ S_M(0) = 1 \]
\[ S_M(\lambda + \mu + \nu) = S_M(\lambda + \mu)\, S_M(\mu)^{-1}\, S_M(\mu + \nu) \]


for $\lambda, \mu, \nu \in \mathcal{D}(M)$ such that the supports of $\lambda$ and $\nu$
can be separated by a Cauchy surface of $M$ with
supp $\lambda$ in the future of the surface.

The importance of these S-matrices rests on the
fact that they can be used to define a new quantum 
field theory. The new theory is locally covariant if the 
original theory is and if the local S-matrices satisfy 
the condition of the locally covariant field above. A 
perturbative construction of interacting quantum 
field theory on globally hyperbolic spacetimes was 
completed in this way by Hollands and Wald, based 
on previous work by Brunetti and Fredenhagen.


Synopsis


Anomalies are the breaking of classical symmetries by 
quantum mechanical radiative corrections, which arise 
when the regularizations needed to evaluate small 
fermion loop Feynman diagrams conflict with a 
classical symmetry of the theory. They have important 
implications for a wide range of issues in quantum 
field theory, mathematical physics, and string theory. 


Chiral Anomalies, Abelian 
and Nonabelian 


Consider quantum electrodynamics, with the fer- 
mionic Lagrangian density 


\[ \mathcal{L} = \bar\psi (i\gamma^\mu \partial_\mu - e_0 \gamma^\mu B_\mu - m_0)\psi \qquad [1a] \]


where $\bar\psi = \psi^\dagger \gamma^0$, $e_0$ and $m_0$ are the bare charge and
mass, and $B_\mu$ is the electromagnetic gauge potential.
(We reserve the notation $A$ for axial-vector quan-
tities.) Under a chiral transformation


\[ \psi \to e^{i\lambda\gamma_5}\psi \qquad [1b] \]


with constant $\lambda$, the kinetic term in eqn [1a] is
invariant (because $\gamma_5$ commutes with $\gamma^0\gamma^\mu$), whereas
the mass term is not invariant. Therefore, naive
application of Noether’s theorem would lead one to
expect that the axial-vector current


\[ j^5_\mu = \bar\psi \gamma_\mu \gamma_5 \psi \qquad [1c] \]


obtained from the Lagrangian density by applying a
chiral transformation with spatially varying $\lambda$, should
have a divergence given by the change under chiral
transformation of the mass term in eqn [1a]. Up to
tree approximation, this is indeed true, but when one
computes the AVV Feynman diagram with one axial-
vector and two vector vertices (see Figure 1), and
insists on conservation of the vector current
$j_\mu = \bar\psi\gamma_\mu\psi$, one finds that to order $e_0^2$, the classical
Noether theorem is modified to read


\[ \partial^\mu j^5_\mu(x) = 2 i m_0 j^5(x) + \frac{e_0^2}{16\pi^2} F^{\xi\sigma}(x) F^{\tau\rho}(x) \epsilon_{\xi\sigma\tau\rho} \qquad [2] \]


Figure 1 The AVV triangle diagram responsible for the abelian
chiral anomaly.


with $F^{\xi\sigma}(x) = \partial^\sigma B^\xi(x) - \partial^\xi B^\sigma(x)$ the electromagnetic
field strength tensor. The second term in eqn [2], 
which would be unexpected from the application of 
the classical Noether theorem, is the abelian axial- 
vector anomaly (often called the Adler-Bell-Jackiw 
(or ABJ) anomaly after the seminal papers on the 
subject). Since vector current conservation, together 
with the axial-vector current anomaly, implies that 
the left- and right-handed chiral currents $j_\mu \pm j^5_\mu$ are
also anomalous, the axial-vector anomaly is fre- 
quently called the “chiral anomaly,” and we shall 
use the terms interchangeably in this article. 

There are a number of different ways to understand 
why the extra term in eqn [2] appears. (1) Working 
through the formal Feynman diagrammatic Ward 
identity proof of the Noether theorem, one finds that 
there is a step where the closed fermion loop contribu- 
tions are eliminated by a shift of the loop-integration 
variable. For Feynman diagrams that are convergent, 
this is not a problem, but the AVV diagram is linearly 
divergent. The linear divergence vanishes under sym- 
metric integration, but the shift then produces a finite 
residue, which gives the anomaly. (2) If one defines the 
AVV diagram by Pauli-Villars regularization with 
regulator mass $M_0$ that is allowed to approach infinity
at the end of the calculation, one finds a classical
Noether theorem in the regulated theory,

\[ \partial^\mu j^5_\mu|_{m_0} - \partial^\mu j^5_\mu|_{M_0} = 2 i m_0 j^5|_{m_0} - 2 i M_0 j^5|_{M_0}, \qquad j^5 = \bar\psi\gamma_5\psi \qquad [3a] \]


with the subscripts $m_0$ and $M_0$ indicating that
fermion loops are to be calculated with fermion
mass $m_0$ and $M_0$, respectively. Taking the vacuum
to two-photon matrix element of eqn [3a], one finds
that the matrix element $\langle 0 | j^5|_{M_0} | \gamma\gamma \rangle$, which is
unambiguously computable after imposing vector-
current conservation, falls off only as $M_0^{-1}$ as the
regulator mass approaches infinity. Thus, the
product of $2 i M_0$ with this matrix element has a
finite limit, which gives the anomaly. (3) If the
gauge-invariant axial-vector current is defined by
point-splitting

\[ j^5_\mu(x) = \bar\psi(x + \epsilon/2)\gamma_\mu\gamma_5\psi(x - \epsilon/2)\, e^{-i e_0 \epsilon^\nu B_\nu(x)} \qquad [3b] \]
with $\epsilon \to 0$ at the end of the calculation, one
observes that the divergence of eqn [3b] contains
an extra term with a factor of $\epsilon$. On careful
evaluation, one finds that the coefficient of this
factor is an expression that behaves as $\epsilon^{-1}$, which
gives the anomaly in the limit of vanishing $\epsilon$. (4)
Finally, if the field theory is defined by a functional 
integral over the classical action, the standard 
Noether analysis shows that the classical action is 
invariant under the chiral transformation of eqn 
[1b], apart from the contribution of the mass term, 
which gives the naive axial-vector divergence. How- 
ever, as pointed out by Fujikawa, the chiral 
transformation must also be applied to the func- 
tional integration measure, and since the measure is 
an infinite product, it must be regularized to be well 
defined. Careful calculation shows that the regular- 
ized measure is not chiral invariant, but contributes 
an extra term to the axial-vector Ward identity that 
is precisely the chiral anomaly. 

A key feature of the anomaly is that it is 
irreducible: a local polynomial counter term cannot 
be added to the AVV diagram that preserves 
vector-current conservation and eliminates the 
anomaly. More generally, one can show that there 
is no way of modifying quantum electrodynamics 
so as to eliminate the chiral anomaly, without 
spoiling either vector-current conservation (i.e., 
electromagnetic gauge invariance), renormalizabil- 
ity, or unitarity. Thus, the chiral anomaly is a new 
physical effect in renormalizable quantum field 
theory, which is not present in the prequantization 
classical theory. 

The abelian chiral anomaly is the simplest case of 
the anomaly phenomenon. It was extended to 
nonabelian gauge theories by Bardeen using a 
point-splitting method to compute the divergence, 
followed by adding polynomial counter terms to 
remove as many of the residual terms as possible. 
The resulting irreducible divergence is the nonabe- 
lian chiral anomaly, which in terms of Yang-Mills 
field strengths for vector and axial-vector gauge 
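potentials $V^\mu$ and $A^\mu$,


\[ F_V^{\mu\nu}(x) = \partial^\mu V^\nu(x) - \partial^\nu V^\mu(x) - i[V^\mu(x), V^\nu(x)] - i[A^\mu(x), A^\nu(x)] \]
\[ F_A^{\mu\nu}(x) = \partial^\mu A^\nu(x) - \partial^\nu A^\mu(x) - i[V^\mu(x), A^\nu(x)] - i[A^\mu(x), V^\nu(x)] \qquad [4a] \]
is given by


\[
\begin{aligned}
\partial^\mu j^a_{5\mu}(x) = {} & \text{normal divergence term} \\
& + (1/4\pi^2)\,\epsilon_{\mu\nu\sigma\tau}\, \mathrm{tr}\, \lambda_A^a \big[ (1/4) F_V^{\mu\nu}(x) F_V^{\sigma\tau}(x) \\
& + (1/12) F_A^{\mu\nu}(x) F_A^{\sigma\tau}(x) \\
& + (2/3) i A^\mu(x) A^\nu(x) F_V^{\sigma\tau}(x) \\
& + (2/3) i F_V^{\mu\nu}(x) A^\sigma(x) A^\tau(x) \\
& + (2/3) i A^\mu(x) F_V^{\nu\sigma}(x) A^\tau(x) \\
& - (8/3) A^\mu(x) A^\nu(x) A^\sigma(x) A^\tau(x) \big]
\end{aligned}
\qquad [4b]
\]
In eqn [4b], “tr” denotes a trace over internal
degrees of freedom, and $\lambda_A^a$ is the internal symmetry
matrix associated with the axial-vector external
field. In the abelian case, where there is no internal
symmetry structure, the terms involving two or four
factors of $A^\mu, A^\nu, \ldots$ vanish by antisymmetry of
$\epsilon_{\mu\nu\sigma\tau}$, and one recovers the AVV triangle anomaly,
as well as a kinematically related anomaly in the
AAA triangle diagram. In the nonabelian case, with
nontrivial internal symmetry structure, there are also
box- and pentagon-diagram anomalies.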

In addition to coupling to spin-1 gauge fields, 
fermions can also couple to spin-2 gauge fields, 
associated with the graviton. When the coupling of 
fermions to gravitation is taken into account, the 
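axial-vector current $\bar\psi T \gamma_\mu \gamma_5 \psi$, with $T$ an internal
symmetry matrix, has an additional anomalous
contribution to its divergence proportional to


\[ \mathrm{tr}\, T\, \epsilon^{\xi\sigma\tau\rho} R_{\xi\sigma\alpha\beta} R_{\tau\rho}{}^{\alpha\beta} \qquad [4c] \]


where $R_{\xi\sigma\alpha\beta}$ is the Riemann curvature tensor of the
gravitational field.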


Chiral Anomaly Nonrenormalization 


A salient feature of the chiral anomaly is the fact 
that it is not renormalized by higher-order radia- 
tive corrections. In other words, the one-loop 
expressions of eqns [2] and [4b] give the exact 
anomaly coefficient without modification in higher 
orders of perturbation theory. In gauge theories 
such as quantum electrodynamics and quantum 
chromodynamics, this result (the Adler-Bardeen 
theorem) can be understood heuristically as fol- 
lows. Write down a modified Lagrangian, in 
which regulators are included for all gauge-boson 
fields. Since the gauge-boson regulators do not 
influence the chiral-symmetry properties of the 
theory, the divergences of the chiral currents are 
not affected by their inclusion, and so the only 
sources of anomalies in the regularized theory are 
small single-fermion loops, giving the anomaly 
expressions of eqns [2] and [4b]. Since the
renormalized theory is obtained as the limit of 


the regularized theory as the regulator masses 
approach infinity, this result applies to the 
renormalized theory as well. 

The above argument can be made precise, and 
extends to nongauge theories such as the $\sigma$-model as
well. For both gauge theories and the $\sigma$-model,
cancellation of radiative corrections to the anomaly
coefficient has been explicitly demonstrated in
fourth-order calculations. Nonperturbative demon-
strations of anomaly nonrenormalization have also been
given using the Callan-Symanzik equations. For
example, in quantum electrodynamics, Zee, and
Lowenstein and Schroer, showed that a factor $f$
that gives the ratio of the true anomaly to its one-
loop value obeys the differential equation


\[ \left( m \frac{\partial}{\partial m} + \alpha\beta(\alpha) \frac{\partial}{\partial\alpha} \right) f = 0 \qquad [5] \]


Since $f$ is dimensionless, it can have no dependence
on the mass $m$, and since $\beta(\alpha)$ is nonzero this implies
$\partial f/\partial\alpha = 0$. Thus, $f$ has no dependence on $\alpha$, and so


\[ f = 1. \]
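
The chain of implications leading from eqn [5] to the value of $f$ can be written out in one line. The only ingredient not stated explicitly above is the normalization $f(0) = 1$, which is assumed here because $f$ is defined as the ratio of the anomaly to its one-loop value and must therefore reduce to unity at lowest order in $\alpha$.

```latex
\[
m\,\frac{\partial f}{\partial m} = 0
\;\Longrightarrow\;
\alpha\,\beta(\alpha)\,\frac{\partial f}{\partial \alpha} = 0
\;\Longrightarrow\;
\frac{\partial f}{\partial \alpha} = 0
\;\Longrightarrow\;
f(\alpha) = f(0) = 1 .
\]
```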


Applications of Chiral Anomalies 


Chiral anomalies have numerous applications in the 
standard model of particle physics and its exten- 
sions, and we describe here a few of the most 
important ones. 


Neutral Pion Decay $\pi^0 \to \gamma\gamma$


As a result of the abelian chiral anomaly, the 
partially conserved axial-vector current (PCAC) 
equation relevant to neutral pion decay is modified 
to read 


\[ \partial^\mu \mathcal{F}^5_{3\mu}(x) = (f_\pi \mu_\pi^2/\sqrt{2})\,\phi_\pi(x) + S\,\frac{\alpha_0}{4\pi}\, F^{\xi\sigma}(x) F^{\tau\rho}(x) \epsilon_{\xi\sigma\tau\rho} \qquad [6a] \]


with $\mu_\pi$ the pion mass, $f_\pi \simeq 131$ MeV the charged-
pion decay constant, and $S$ a constant determined
by the constituent fermion charges and axial-vector
couplings. Taking the matrix element of eqn [6a]
between the vacuum state and a two-photon state,
and using the fact that the left-hand side has a
kinematic zero (the Sutherland-Veltman theorem),
one sees that the $\pi^0 \to \gamma\gamma$ amplitude $F$ is comple-
tely determined by the anomaly term, giving the
formula


\[ F = -(\alpha/\pi)\, 2S\sqrt{2}/f_\pi \qquad [6b] \]


For a single set of fractionally charged quarks, the 
amplitude F is a factor of three too small to agree 
with experiment; for three sets of fractionally charged
quarks (or an equivalent Han-Nambu triplet), eqn
[6b] gives the correct neutral pion decay rate. This 
calculation was one of the first pieces of evidence for 
the color degree of freedom of quarks. 
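
The size of the effect can be estimated numerically from eqn [6b]. The sketch below rests on assumptions that are not stated in the text: the two-photon width is taken as $\Gamma = \mu_\pi^3 F^2/(64\pi)$, the constant $S$ is taken as $N_c/6$ for up and down quarks of charge 2/3 and -1/3 in $N_c$ colors, and the numerical inputs for the fine-structure constant and the neutral pion mass are standard values supplied here. Function names are illustrative.

```python
import math

ALPHA = 1 / 137.036  # fine-structure constant (assumed input)
M_PI0 = 134.98       # neutral pion mass in MeV (assumed input)
F_PI = 131.0         # charged-pion decay constant in MeV, as quoted in the text

def anomaly_constant(n_colors):
    """S for u and d quarks with charges 2/3 and -1/3 (assumed normalization S = N_c/6)."""
    q_u, q_d = 2 / 3, -1 / 3
    return n_colors * (q_u**2 - q_d**2) / 2

def amplitude(s):
    """Eqn [6b]: F = -(alpha/pi) * 2 S * sqrt(2) / f_pi, in MeV^-1."""
    return -(ALPHA / math.pi) * 2 * s * math.sqrt(2) / F_PI

def width_ev(s):
    """Assumed width formula Gamma = mu^3 F^2 / (64 pi), converted from MeV to eV."""
    return M_PI0**3 * amplitude(s) ** 2 / (64 * math.pi) * 1e6

width_one_color = width_ev(anomaly_constant(1))
width_three_colors = width_ev(anomaly_constant(3))
print(f"one color:    {width_one_color:.2f} eV")
print(f"three colors: {width_three_colors:.2f} eV")
```

Under these assumptions the three-color width comes out between 7 and 8 eV, close to the measured value of about 7.8 eV, while the single-color amplitude is smaller by a factor of three and the corresponding width by a factor of nine.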


Anomaly Cancellation in Gauge Theories 


In quantum electrodynamics, the gauge particle (the 
photon) couples to the vector current, and so the 
anomalous conservation properties of the axial- 
vector current have no effect. The same statement 
holds for the gauge gluons in quantum chromody- 
namics, when treated in isolation from the other 
interactions. However, in the electroweak theory 
that embeds quantum electrodynamics in a theory of 
the weak force, the gauge particles (the $W^\pm$ and $Z$
intermediate bosons) couple to chiral currents, 
which are left- or right-handed linear combinations 
of the vector and axial-vector currents. In this case, 
the chiral anomaly leads to problems with the 
renormalizability of the theory, unless the anomalies 
cancel between different fermion species. Writing all 
fermions as left-handed, the condition for anomaly 
cancellation is 


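\[ \mathrm{tr}\{T_\alpha, T_\beta\}T_\gamma = \mathrm{tr}(T_\alpha T_\beta + T_\beta T_\alpha)T_\gamma = 0 \quad \text{for all } \alpha, \beta, \gamma \qquad [7] \]


with $T_\alpha$ the coupling matrices of gauge bosons to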
left-handed fermions. These conditions are obeyed 
in the standard model, by virtue of three nontrivial 
sum rules on the fermion gauge couplings being 
satisfied (four sum rules, if one includes the 
gravitational contribution to the chiral anomaly 
given in eqn [4c], which also cancels in the standard 
model). Note that anomaly cancellation in the 
locally gauged currents of the standard model does 
not imply anomaly cancellation in global-flavor 
currents. Thus, the flavor axial-vector current 
anomaly that gives the $\pi^0 \to \gamma\gamma$ matrix element
remains anomalous in the full electroweak theory. 
Anomaly cancellation imposes important constraints 
on the construction of grand unified models that 
combine the electroweak theory with quantum 
chromodynamics. For instance, in SU(5) the fer-
mions are put into a $\bar{5}$ and a 10 representation, which
together, but not individually, are anomaly free. The
larger unification groups SO(10) and $E_6$ satisfy eqn
[7] for all representations, and so are automatically 
anomaly free. 
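
The cancellation can be checked with exact arithmetic for one generation of standard-model fermions. The sketch below is an illustration under stated assumptions: all fermions are written as left-handed fields, hypercharge is normalized so that the electric charge is $T_3 + Y$, and the four sums computed are the ones usually quoted for the mixed and cubic hypercharge anomalies (they are not spelled out in the text). The second function restricts the cubic hypercharge sum to each SU(5) multiplet, which is proportional to that multiplet's anomaly because hypercharge is an SU(5) generator. All names are illustrative.

```python
from fractions import Fraction as Fr

# One generation of left-handed Weyl fermions:
# (name, SU(3) dimension, SU(2) dimension, hypercharge Y, SU(5) multiplet)
FERMIONS = [
    ("Q",   3, 2, Fr(1, 6),  "10"),
    ("u^c", 3, 1, Fr(-2, 3), "10"),
    ("e^c", 1, 1, Fr(1),     "10"),
    ("d^c", 3, 1, Fr(1, 3),  "5bar"),
    ("L",   1, 2, Fr(-1, 2), "5bar"),
]

def anomaly_sums(fermions):
    """Hypercharge sums whose vanishing expresses anomaly cancellation."""
    return {
        # Cubic hypercharge anomaly: every component counts.
        "U(1)^3": sum(c * w * y**3 for _n, c, w, y, _r in fermions),
        # Only SU(2) doublets contribute; color gives the multiplicity.
        "SU(2)^2 U(1)": sum(c * y for _n, c, w, y, _r in fermions if w == 2),
        # Only color (anti)triplets contribute; weak isospin gives the multiplicity.
        "SU(3)^2 U(1)": sum(w * y for _n, c, w, y, _r in fermions if c == 3),
        # Mixed gravitational anomaly: plain sum of hypercharges.
        "grav^2 U(1)": sum(c * w * y for _n, c, w, y, _r in fermions),
    }

def cubic_sum_by_multiplet(fermions):
    """Cubic hypercharge sum restricted to each SU(5) multiplet."""
    totals = {}
    for _n, c, w, y, rep in fermions:
        totals[rep] = totals.get(rep, Fr(0)) + c * w * y**3
    return totals

sums = anomaly_sums(FERMIONS)
by_multiplet = cubic_sum_by_multiplet(FERMIONS)
for label, value in sums.items():
    print(label, value)
for rep, value in by_multiplet.items():
    print(rep, value)
```

With these assignments all four sums vanish, while the 10 and the $\bar{5}$ carry equal and opposite cubic sums, in line with the statement that they are anomaly free only together.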


Instanton Physics and the Theta Vacuum 


The theory of anomalies is intimately tied to the
physics associated with instanton classical Yang-
Mills theory solutions. Since the instanton field
strength is self-dual, the nonvanishing instanton
Euclidean action


\[ S_E = \int d^4x\, \tfrac{1}{4} F_{\mu\nu} F^{\mu\nu} = 8\pi^2 \qquad [8a] \]


implies that the integral of the pseudoscalar density
$F_{\mu\nu} F_{\lambda\sigma} \epsilon^{\mu\nu\lambda\sigma}$ over the instanton is also nonzero,


\[ \int d^4x\, F_{\mu\nu} F_{\lambda\sigma} \epsilon^{\mu\nu\lambda\sigma} = 64\pi^2 \qquad [8b] \]


Referring back to eqn [4b], this means that the 
integral of the nonabelian chiral anomaly for 
fermions in the background field of an instanton is 
an integer, which in the Minkowski space continua- 
tion has the interpretation of a topological winding 
number change produced by the instanton tunneling 
solution. This fact has a number of profound 
consequences. Since a vacuum with a definite wind-
ing number $|\nu\rangle$ is unstable under instanton tunnel-
ing, careful analysis shows that the nonabelian
vacuum that has correct clustering properties is a
Fourier superposition


\[ |\theta\rangle = \sum_\nu e^{i\theta\nu} |\nu\rangle \qquad [8c] \]


giving rise to the $\theta$-vacuum of quantum chromody-
namics, and a host of issues associated with (the lack
of) strong CP violation, the Peccei-Quinn mecha-
nism, and axion physics. Also, the fact that the
integral of eqn [8b] is nonzero means that the U(1)
chiral symmetry of quantum chromodynamics is
broken by instantons, which as shown by ’t Hooft
resolves the longstanding “U(1) problem” of strong
interactions, that of explaining why the flavor
singlet pseudoscalar meson $\eta'$ is not light, unlike its
flavor octet partners.
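
Two steps of this subsection can be written out explicitly: how the value in eqn [8b] follows from the action in eqn [8a], and why the Fourier superposition of eqn [8c] is stable under tunneling. The sketch assumes Euclidean signature (so that upper and lower indices need not be distinguished), the dual field strength $\tilde F_{\mu\nu} = \tfrac{1}{2}\epsilon_{\mu\nu\lambda\sigma}F_{\lambda\sigma}$, and an operator $T$ that represents a single tunneling event by raising the winding number by one. The symbol $T$ is introduced here for illustration and does not appear in the text.

```latex
% Self-duality, F_{\mu\nu} = \tilde F_{\mu\nu}, relates [8b] to [8a]:
\[
\int d^4x\, F_{\mu\nu}F_{\lambda\sigma}\epsilon^{\mu\nu\lambda\sigma}
  = 2\int d^4x\, F_{\mu\nu}\tilde F_{\mu\nu}
  = 2\int d^4x\, F_{\mu\nu}F_{\mu\nu}
  = 8\,S_E = 64\pi^2 .
\]

% Tunneling shifts the winding number, T|\nu\rangle = |\nu+1\rangle, hence
\[
T|\theta\rangle
  = \sum_\nu e^{i\theta\nu}\,|\nu+1\rangle
  = e^{-i\theta}\sum_\nu e^{i\theta(\nu+1)}\,|\nu+1\rangle
  = e^{-i\theta}\,|\theta\rangle .
\]
```

The superposition therefore changes only by a phase under tunneling, whereas a state of definite winding number is mapped to a different state.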


Anomaly Matching Conditions 


The anomaly structure of a theory, as shown by ’t 
Hooft, leads to important constraints on the forma- 
tion of massless composite bound states. Consider a 
theory with a set of left-handed fermions $\psi_{if}$, with $i$ a
“color” index acted on by a nonabelian gauge force,
and $f$ an ungauged family or “flavor” index. Suppose
that the family multiplet structure is such that the
global chiral symmetries associated with the flavor
index have nonvanishing anomalies $\mathrm{tr}\{T_\alpha, T_\beta\}T_\gamma$.
Then the ’t Hooft condition asserts that if the color
forces result in the formation of composite massless
bound states of the original completely confined
fermions, and if there is no spontaneous breaking of
the original global flavor symmetries, then these
bound states must contain left-handed spin-1/2
composites with a representation structure $S$ that
has the same anomaly coefficient as that in the
underlying theory. In other words, we must have


\[ \mathrm{tr}\{S_\alpha, S_\beta\}S_\gamma = \mathrm{tr}\{T_\alpha, T_\beta\}T_\gamma \qquad [9] \]


To prove this, one adjoins to the theory a set of 
right-handed spectator fermions $\psi_f$ with the same
flavor structure as the original set, but which are not 
acted on by the color force. These right-handed 
fermions cancel the original anomaly, making the 
underlying theory anomaly free at zero color 
coupling; since dynamics cannot spontaneously 
generate anomalies, the theory, when the color 
dynamics is turned on, must also have no global 
chiral anomalies. This implies that the bound-state
spectrum must conspire to cancel the anomalies 
associated with the right-handed spectators; in other 
words, the bound-state anomaly structure must 
match that of the original fermions. This anomaly 
matching condition has found applications in the 
study of the possible compositeness of quarks and 
leptons. It has also been applied to the derivation of 
nonperturbative dynamical results in whole classes 
of supersymmetric theories, where the combined 
tools of holomorphicity, instanton physics, and 
anomaly matching have given incisive results. 


Global Structure of Anomalies 


We noted earlier that chiral anomalies are irreduci- 
ble, in that they cannot be eliminated by adding a 
local polynomial counter-term to the action. How- 
ever, anomalies can be described by a nonlocal 
effective action, obtained by integrating out the 
fermion field dynamics, and this point of view proves 
very useful in the nonabelian case. Starting with the 
abelian case for orientation, we note that if $A^\mu$ is an
external axial-vector field, and we write an effective
action $\Gamma[A]$, then the axial-vector current $j^5_\mu$ asso-
ciated with $A^\mu$ is given (up to an overall constant) by
the variational derivative expression


\[ j^5_\mu(x) = \frac{\delta\Gamma[A]}{\delta A^\mu(x)} \qquad [10a] \]


and the abelian anomaly appears as the fact that the
expression


\[ \partial^\mu \frac{\delta\Gamma[A]}{\delta A^\mu(x)} = X\Gamma[A] = G \neq 0, \qquad X = \partial^\mu \frac{\delta}{\delta A^\mu(x)} \qquad [10b] \]


is nonvanishing even when the theory is classically 
chiral invariant. Turning now to the nonabelian 
case, the variational derivative appearing in eqns 
[10a] and [10b] must be replaced by an appropriate 


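covariant derivative. In terms of the internal-
symmetry component fields $A^a_\mu$ and $V^a_\mu$ of the
Yang-Mills potentials of eqn [4a], one introduces
operators


\[ -X^a(x) = \partial^\mu \frac{\delta}{\delta A^{a\mu}(x)} + f_{abc} V^b_\mu \frac{\delta}{\delta A^c_\mu(x)} + f_{abc} A^b_\mu \frac{\delta}{\delta V^c_\mu(x)} \]
\[ -Y^a(x) = \partial^\mu \frac{\delta}{\delta V^{a\mu}(x)} + f_{abc} V^b_\mu \frac{\delta}{\delta V^c_\mu(x)} + f_{abc} A^b_\mu \frac{\delta}{\delta A^c_\mu(x)} \qquad [11a] \]


with $f_{abc}$ the antisymmetric nonabelian group struc-
ture constants. The operators $X^a$ and $Y^a$ are easily
seen to obey the commutation relations


\[ [X^a(x), X^b(y)] = f_{abc}\,\delta(x - y)\, Y^c(x) \]
\[ [X^a(x), Y^b(y)] = f_{abc}\,\delta(x - y)\, X^c(x) \]
\[ [Y^a(x), Y^b(y)] = f_{abc}\,\delta(x - y)\, Y^c(x) \qquad [11b] \]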


Let $\Gamma[V, A]$ be the effective action as a functional of
the fields $V^\mu$, $A^\mu$, constructed so that the vector
currents are covariantly conserved, as expressed
formally by


\[ Y^a \Gamma[V, A] = 0 \qquad [12a] \]
Then the nonabelian axial-vector current anomaly is
given by


\[ X^a \Gamma[V, A] = G^a \qquad [12b] \]


From eqns [12a] and [12b] and the first line of
eqn [11b], we have
\[ X^b G^a - X^a G^b = (X^b X^a - X^a X^b)\,\Gamma[V, A] \propto f_{abc} Y^c\, \Gamma[V, A] = 0 \qquad [12c] \]


which is the Wess-Zumino consistency condition on
the structure of the anomaly $G^a$. It can be shown
that this condition uniquely fixes the form of the
nonabelian anomaly to be that of eqn [4b], up to an
overall constant, which can be determined by
comparison with the simplest anomalous AVV
triangle graph. A physical consequence of the
consistency condition is that the $\pi^0 \to \gamma\gamma$ decay
amplitude determines uniquely certain other anom-
alous amplitudes, such as $2\gamma \to 3\pi$, $\gamma \to 3\pi$, and a
five pseudoscalar vertex.

Although the action $\Gamma[V, A]$ is necessarily non-
local, Wess and Zumino were able to write down a
local action, involving an auxiliary pseudoscalar
field, that obeys the anomalous Ward identities and
the consistency conditions. Subsequently, Witten
gave a new construction of this local action, in
terms of the integral of a fifth-rank antisymmetric
tensor over a five-dimensional disk which has
four-dimensional space as its boundary. He also
showed that demanding that $e^{i\Gamma}$ be independent of the
choice of the spanning disk requires, in analogy with
Dirac’s quantization condition for monopole charge,
that the overall coefficient in the
nonabelian anomaly be quantized in integer multi-
ples. Comparison with the lowest-order triangle
diagram shows that in the case of SU($N_c$) gauge
theory, this integer is just the number of colors $N_c$.
Thus, global considerations tightly constrain the 
nonabelian chiral anomaly structure, and dictate 
that up to an integer-proportionality constant, it 
must have the form given in eqns [4a] and [4b]. 


Trace Anomalies 


The discovery of chiral anomalies inspired the search
for other examples of anomalous behavior. First
indications of a perturbative trace anomaly obtained
in a study of broken scale invariance by Coleman and
Jackiw were shown by Crewther, and by Chanowitz
and Ellis, to correspond to an anomaly in the three-point
function $\theta^\mu_\mu V_\mu V_\nu$, where $\theta^{\mu\nu}$ is the
energy-momentum tensor. Letting $\Delta_{\mu\nu}(p)$ be the momentum
space expression for this three-point function, and $\Pi_{\mu\nu}$
the corresponding $V_\mu V_\nu$ two-point function, the trace
anomaly equation in quantum electrodynamics reads

$$\Delta_{\mu\nu}(p) = \left(2 - p_\sigma \frac{\partial}{\partial p_\sigma}\right) \Pi_{\mu\nu}(p) - \frac{R}{6\pi^2}\,(p_\mu p_\nu - \eta_{\mu\nu} p^2)$$ [13a]

with the first term on the right-hand side the naive
divergence, and the second term the trace anomaly,
with anomaly coefficient R given by

$$R = \sum_{i,\ \mathrm{spin}\ 1/2} Q_i^2 + \frac{1}{4} \sum_{i,\ \mathrm{spin}\ 0} Q_i^2$$ [13b]


The fact that there should be a trace anomaly can
readily be inferred from a trace analog of the Pauli-Villars
regulator argument for the chiral anomaly
given in eqn [3a]. Letting $j = \bar\psi\psi$ be the scalar
current in abelian electrodynamics, one has

$$\theta^\mu_\mu\big|_{\mathrm{regularized}} = m_0\, j\big|_{m_0} - M_0\, j\big|_{M_0}$$ [13c]

Taking the vacuum to two-photon matrix element
of this equation, and imposing vector-current conservation,
one finds that the matrix element
$\langle 0 | j|_{M_0} | \gamma\gamma \rangle$ is proportional to
$M_0^{-1} \langle 0 | F_{\lambda\sigma} F^{\lambda\sigma} | \gamma\gamma \rangle$
for a large regulator mass, and so makes a
nonvanishing contribution to the right-hand side of
eqn [13c], giving the lowest-order trace anomaly.
Unlike the chiral anomaly, the trace anomaly is
renormalized in higher orders of perturbation
theory; heuristically, the reason is that whereas
boson field regulators do not affect the chiral
symmetry properties of a gauge theory (which are
determined just by the fermionic terms in the
Lagrangian), they do alter the energy-momentum
tensor, since gravitation couples to all fields, including
regulator fields. An analysis using the Callan-Symanzik
equations shows, however, that the trace
anomaly is computable to all orders in terms of
various renormalization group functions of the
coupling. For example, in abelian electrodynamics,
defining $\beta(\alpha)$ and $\delta(\alpha)$ by $\beta(\alpha) = (m/\alpha)\,\partial\alpha/\partial m$ and
$1 + \delta(\alpha) = (m/m_0)\,\partial m_0/\partial m$, the trace of the
energy-momentum tensor is given to all orders by

$$\theta^\mu_\mu = [1 + \delta(\alpha)]\, m_0 \bar\psi\psi + \tfrac{1}{4} \beta(\alpha)\, N[F_{\lambda\sigma} F^{\lambda\sigma}] + \cdots$$ [14]

with N[ ] specifying conditions that make the division
into two terms in eqn [14] unique, and with the
ellipsis $\cdots$ indicating terms that vanish by the equations
of motion. A similar relation holds in the
nonabelian case, again with the $\beta$ function appearing
as the coefficient of the anomalous $\mathrm{tr}\, N[F_{\lambda\sigma} F^{\lambda\sigma}]$ term.

Just as in the chiral anomaly case, when spin-0,
spin-1/2, or spin-1 fields propagate on a background
spacetime, there are curvature-dependent contributions
to the trace anomaly, in other words, gravitational
anomalies. These typically take the form of
complicated linear combinations of terms of the
form $R^2$, $R_{\mu\nu}R^{\mu\nu}$, $R_{\mu\nu\lambda\sigma}R^{\mu\nu\lambda\sigma}$, and $R_{;\mu}{}^{;\mu}$, with coefficients
depending on the matter fields involved.

In supersymmetric theories, the axial-vector current
and the energy-momentum tensor are both
components of the supercurrent, and so their anomalies
imply the existence of corresponding supercurrent
anomalies. The issue of how the nonrenormalization
of chiral anomalies (which have a supercurrent
generalization given by the Konishi anomaly), and
the renormalization of trace anomalies, can coexist in
supersymmetric theories originally engendered considerable
confusion. This apparent puzzle is now
understood in the context of a perturbatively exact
expression for the $\beta$ function in supersymmetric field
theories (the so-called NSVZ, for Novikov, Shifman,
Vainshtein, and Zakharov, $\beta$ function). Supersymmetry
anomalies can be used to infer the structure of
effective actions in supersymmetric theories, and these
in turn have important implications for possibilities
for dynamical supersymmetry breaking. Anomalies
may also play a role, through anomaly mediation, in
communicating supersymmetry breaking in “hidden
sectors” of a theory, which do not contain the physical
fields that we directly observe, to the “physical sector”
containing the observed fields.


Further Anomaly Topics 


The above discussion has focused on some of the 
principal features and applications of anomalies. 
There are further topics of interest in the physics and 
mathematics of anomalies that are discussed in 
detail in the references cited in the “Further reading” 
section. We briefly describe a few of them here. 


Anomalies in Other Spacetime Dimensions 
and in String Theory 


The focus above has been on anomalies in four- 
dimensional spacetime, but anomalies of various 
types occur both in lower-dimensional quantum 
field theories (such as theories in two- and three- 
dimensional spacetimes) and in quantum field the- 
ories in higher-dimensional spacetimes (such as N = 1 
supergravity in ten-dimensional spacetime). Anoma- 
lies also play an important role in the formulation 
and consistency of string theory. The bosonic string is 
consistent only in 26-dimensional spacetime, and the 
analogous supersymmetric string only in ten-dimen- 
sional spacetime, because in other dimensions both 
these theories violate Lorentz invariance after quanti- 
zation. In the Polyakov path-integral formulation of 
these string theories, these special dimensions are 
associated with the cancellation of the Weyl anomaly, 
which is the relevant form of the trace anomaly 
discussed above. Yang-Mills, gravitational, and 
mixed Yang-Mills gravitational anomalies make an 
appearance both in N=1 ten-dimensional super- 
gravity and in superstring theory, and again special 
dimensions play a role. In these theories, only when
the associated internal symmetry groups are either
SO(32) or $E_8 \times E_8$ is elimination of all anomalies
possible, by cancellation of hexagon-diagram anomalies
with anomalous tree diagrams involving
exchange of a massless antisymmetric two-form
field. This mechanism, due to Green and Schwarz,
requires the factorization of a sixth-order trace 
invariant that appears in the hexagon anomaly in 
terms of lower-order invariants, as well as two 
numerical conditions on the adjoint representation 
generator structure, restricting the allowed gauge 
groups to the two noted above. 


Covariant versus Consistent Anomalies; 
Descent Equations 


The nonabelian anomaly of eqns [4a] and [4b] is
called the “consistent anomaly,” because it obeys the
Wess-Zumino consistency conditions of eqn [12c].
This anomaly, however, is not gauge covariant, as can
be seen from the fact that it involves not only the
Yang-Mills field strengths $F^{\mu\nu}_{V,A}$, but the potentials
$V^\mu$, $A^\mu$ as well. It turns out to be possible, by adding
appropriate polynomials to the currents, to transform
the consistent anomaly to a form, called the “covariant
anomaly,” which is gauge covariant under gauge
transformations of the potentials $V^\mu$, $A^\mu$. This anomaly,
however, does not obey the Wess-Zumino
consistency conditions, and cannot be obtained from
variation of an effective action functional.

The consistent anomalies (but not the covariant
anomalies) obey a remarkable set of relations, called
the Stora-Zumino descent equations, which relate
the abelian anomaly in 2n + 2 spacetime dimensions
to the nonabelian anomaly in 2n spacetime dimensions.
This set of equations has been interpreted
physically by Callan and Harvey as reflecting the
fact that the Dirac equation has chiral zero modes in
the presence of strings in 2n + 2 dimensions and of
domain walls in 2n + 1 dimensions.


Anomalies and Fermion Doubling in Lattice 
Gauge Theories 


A longstanding problem in lattice formulations of 
gauge field theories is that when fermions are 
introduced on the lattice, the process of discretization 
introduces an undesirable doubling of the fermion 
particle modes. In particular, when an attempt is made 
to put chiral gauge theories, such as the electroweak 
theory, on the lattice, one finds that the doublers 
eliminate the chiral anomalies, by cancellation between 
modes with positive and negative axial-vector charge. 
Thus, for a long time, it appeared doubtful whether 
chiral gauge theories could be simulated on the lattice. 
However, recent work has led to formulations of lattice 
fermions that use a mathematical analog of a domain 
wall to successfully incorporate chiral fermions and the 
chiral anomaly into lattice gauge theory calculations. 


Relation of Anomalies to the Atiyah-Singer 
Index Theorem 


The singlet ($\lambda^a_A = 1$) anomaly of eqn [4b] is closely
related to the Atiyah-Singer index theorem. Specifically,
the Euclidean spacetime integral of the singlet
anomaly constructed from a gauge field can be
shown to give the index of the related Dirac
operator for a fermion moving in that background
gauge field, where the index is defined as the
difference between the numbers of right- and left-handed
zero-eigenvalue normalizable solutions of
the Dirac equation. Since the index is a topological
invariant, this again implies that the Euclidean
spacetime integral of the anomaly is a topological
invariant, as noted above in our discussion of
instanton-related applications of anomalies.


Retrospect 


The wide range of implications of anomalies has 
surprised — even astonished — the founders of the 
subject. New anomaly applications have appeared 
within the last few years, and very likely the future 
will see continued growth of the area of quantum 
field theory concerned with the physics and mathematics
of anomalies.


Introduction


The central objective in the study of quantum chaos 
is to characterize universal properties of quantum 
systems that reflect the regular or chaotic features of 
the underlying classical dynamics. Most develop- 
ments of the past 25 years have been influenced by 
the pioneering models on statistical properties of 
eigenstates (Berry 1977) and energy levels (Berry 
and Tabor 1977, Bohigas et al. 1984). Arithmetic 
quantum chaos (AQC) refers to the investigation of 
quantum systems with additional arithmetic struc- 
tures that allow a significantly more extensive 
analysis than is generally possible. On the other 
hand, the special number-theoretic features also 
render these systems nongeneric, and thus some of 
the expected universal phenomena fail to emerge. 
Important examples of such systems include the 
modular surface and linear automorphisms of tori 
(“cat maps") which will be described below. 

The geodesic motion of a point particle on a
compact Riemannian surface $M$ of constant negative
curvature is the prime example of an Anosov
flow, one of the strongest characterizations of
dynamical chaos. The corresponding quantum
eigenstates $\varphi_j$ and energy levels $\lambda_j$ are given by the
solution of the eigenvalue problem for the Laplace-Beltrami
operator $\Delta$ (or Laplacian for short)

$$(\Delta + \lambda)\varphi = 0, \qquad \|\varphi\|_{L^2(M)} = 1$$ [1]

where the eigenvalues

$$\lambda_0 = 0 < \lambda_1 \le \lambda_2 \le \cdots \to \infty$$ [2]

form a discrete spectrum with an asymptotic density
governed by Weyl’s law

$$\#\{j : \lambda_j \le \lambda\} \sim \frac{\mathrm{Area}(M)}{4\pi}\,\lambda, \qquad \lambda \to \infty$$ [3]

We rescale the sequence by setting

$$X_j = \frac{\mathrm{Area}(M)}{4\pi}\,\lambda_j$$ [4]

which yields a sequence of asymptotic density 1.
One of the central conjectures in AQC says that, if
$M$ is an arithmetic hyperbolic surface (see the next
section for examples of this very special class of
surfaces of constant negative curvature), the eigenvalues
of the Laplacian have the same local
statistical properties as independent random variables
from a Poisson process (see, e.g., the surveys by
Sarnak (1995) and Bogomolny et al. (1997)). This
means that the probability of finding k eigenvalues $X_j$
in a randomly shifted interval [X, X + L] of fixed
length L is distributed according to the Poisson law
$L^k e^{-L}/k!$. The gaps between eigenvalues have an
exponential distribution,

$$\frac{1}{N}\,\#\{j \le N : X_{j+1} - X_j \in [a, b]\} \to \int_a^b e^{-s}\, ds$$ [5]

as $N \to \infty$, and thus eigenvalues are likely to appear
in clusters. This is in contrast to the general 
expectation that the energy level statistics of generic 
chaotic systems follow the distributions of random 
matrix ensembles; Poisson statistics are usually 
associated with quantized integrable systems. 
Although we are at present far from a proof of [5], 
the deviation from random matrix theory is well 
understood (see the section “Eigenvalue statistics 
and Selberg trace formula”). 
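The Poisson prediction in [5] can be illustrated numerically. The following is a minimal sketch, not a computation with actual Laplacian eigenvalues: it assumes a surrogate sequence of N independent uniform points on [0, N] (unit mean density, like the rescaled $X_j$) and compares the fraction of nearest-neighbour gaps in [a, b] with $\int_a^b e^{-s}\,ds$. The function name `gap_fraction`, the seed, and the sample size are illustrative choices.

```python
import math
import random


def gap_fraction(points, a, b):
    """Fraction of consecutive gaps X_{j+1} - X_j that fall in [a, b]."""
    gaps = [q - p for p, q in zip(points, points[1:])]
    return sum(a <= s <= b for s in gaps) / len(gaps)


# Surrogate for a Poisson-distributed spectrum: N independent uniform
# points on [0, N], sorted, so that the mean density is 1.
rng = random.Random(0)
N = 200_000
points = sorted(rng.uniform(0, N) for _ in range(N))

a, b = 0.0, 0.5
empirical = gap_fraction(points, a, b)
predicted = math.exp(-a) - math.exp(-b)  # integral of e^{-s} over [a, b]

print(f"empirical {empirical:.4f}  predicted {predicted:.4f}")
```

The large weight near zero gap is the clustering mentioned in the text; a random-matrix spectrum would instead show level repulsion, with very few small gaps.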

Figure 1 Image of the absolute-value-squared of an eigenfunction
$\varphi_j(z)$ for a nonarithmetic surface of genus 2. The surface is
obtained by identifying opposite sides of the fundamental region.
Reproduced from Aurich and Steiner (1993) Statistical properties of
highly excited quantum eigenstates of a strongly chaotic system.
Physica D 64(1-3): 185-214, with permission from R Aurich.


Highly excited quantum eigenstates $\varphi_j$ ($j \to \infty$)
(cf. Figure 1) of chaotic systems are conjectured to
behave locally like random wave solutions of [1],
where boundary conditions are ignored. This
hypothesis was put forward by Berry in 1977 and 
tested numerically, for example, in the case of 
certain arithmetic and nonarithmetic surfaces of 
constant negative curvature (Hejhal and Rackner 
1992, Aurich and Steiner 1993). One of the 
implications is that eigenstates should have uniform 
mass on the surface M, that is, for any bounded 
continuous function $g: M \to \mathbb{R}$

$$\int_M |\varphi_j|^2\, g\, dA \to \frac{1}{\mathrm{Area}(M)} \int_M g\, dA, \qquad j \to \infty$$ [6]

where dA is the Riemannian area element on $M$.
This phenomenon, referred to as quantum unique 
ergodicity (QUE), is expected to hold for general 
surfaces of negative curvature, according to a 
conjecture by Rudnick and Sarnak (1994). In the 
case of arithmetic hyperbolic surfaces, there has 
been substantial progress on this conjecture in the 
works of Lindenstrauss, Watson, and Luo-Sarnak 
(discussed later in this article; see also the review by 
Sarnak (2003)). For general manifolds with ergodic 
geodesic flow, the convergence in [6] is so far 
established only for subsequences of eigenfunctions 
of density 1 (Schnirelman-Zelditch-Colin de Verdière
theorem, see Quantum Ergodicity and Mixing of
Eigenfunctions), and it cannot be ruled out that
exceptional subsequences of eigenfunctions have
a singular limit, for example, localized on closed
geodesics. Such “scarring” of eigenfunctions, at least
in some weak form, has been suggested by numerical
experiments in Euclidean domains, and the existence
of singular quantum limits is a matter of controversy
in the current physics and mathematics literature. A
first rigorous proof of the existence of scarred 
eigenstates has recently been established in the case 
of quantized toral automorphisms. Remarkably, 
these quantum cat maps may also exhibit QUE. A 
more detailed account of results for these maps is 
given in the section “Quantum eigenstates of cat 
maps”; see also Rudnick (2001) and De Bièvre (to 
appear). 

There have been a number of other fruitful 
interactions between quantum chaos and number 
theory, in particular the connections of spectral 
statistics of integrable quantum systems with the 
value distribution properties of quadratic forms, and 
analogies in the statistical behavior of energy levels 
of chaotic systems and the zeros of the Riemann zeta 
function. We refer the reader to Marklof (2006) and 
Berry and Keating (1999), respectively, for informa- 
tion on these topics. 


Hyperbolic Surfaces 


Let us begin with some basic notions of hyperbolic
geometry. The hyperbolic plane $\mathbb{H}$ may be abstractly
defined as the simply connected two-dimensional
Riemannian manifold with Gaussian curvature −1.
A convenient parametrization of $\mathbb{H}$ is provided by
the complex upper-half plane, $\mathbb{H} = \{x + iy : x \in \mathbb{R},\ y > 0\}$,
with Riemannian line and volume elements

$$ds^2 = \frac{dx^2 + dy^2}{y^2}, \qquad dA = \frac{dx\, dy}{y^2}$$ [7]

respectively. The group of orientation-preserving
isometries of $\mathbb{H}$ is given by fractional linear
transformations

$$z \mapsto \frac{az + b}{cz + d}, \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{R})$$ [8]

where SL(2, $\mathbb{R}$) is the group of 2 × 2 real matrices with
unit determinant. Since the matrices 1 and −1
represent the same transformation, the group of
orientation-preserving isometries can be identified
with PSL(2, $\mathbb{R}$) := SL(2, $\mathbb{R}$)/{±1}. A finite-volume
hyperbolic surface may now be represented as the
quotient $\Gamma\backslash\mathbb{H}$, where $\Gamma \subset$ PSL(2, $\mathbb{R}$) is a Fuchsian
group of the first kind. An arithmetic hyperbolic
surface (such as the modular surface) is obtained, if $\Gamma$
has, loosely speaking, some representation in n × n
matrices with integer coefficients, for some suitable n.
This is evident in the case of the modular surface,
where the fundamental group is the modular group

$$\Gamma = \mathrm{PSL}(2, \mathbb{Z}) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{R}) : a, b, c, d \in \mathbb{Z} \right\} \Big/ \{\pm 1\}$$


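A fundamental domain for the action of the
modular group PSL(2, $\mathbb{Z}$) on $\mathbb{H}$ is the set

$$F_{\mathrm{PSL}(2,\mathbb{Z})} = \left\{ z \in \mathbb{H} : |z| > 1,\ -\tfrac{1}{2} < \mathrm{Re}\, z < \tfrac{1}{2} \right\}$$ [9]

(see Figure 2). The modular group is generated by
the translation

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} : z \mapsto z + 1$$

and the inversion

$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} : z \mapsto -1/z$$

These generators identify sections of the boundary
of $F_{\mathrm{PSL}(2,\mathbb{Z})}$. By gluing the fundamental domain
along identified edges, we obtain a realization of the
modular surface, a noncompact surface with one
cusp at $z \to \infty$, and two conic singularities at $z = i$
and $z = 1/2 + i\sqrt{3}/2$.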
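Because the translation and the inversion generate the modular group, any point of the upper half-plane can be moved into the closure of the fundamental domain [9] by alternating the two maps. The sketch below illustrates this standard reduction; the function name `reduce_to_fundamental_domain`, the step limit, and the convention for boundary points are assumptions made for the example.

```python
import math


def reduce_to_fundamental_domain(z, max_steps=10_000):
    """Move z (Im z > 0) into |z| >= 1, -1/2 <= Re z < 1/2.

    Uses only the two generators z -> z + 1 and z -> -1/z.
    """
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half-plane")
    for _ in range(max_steps):
        # Integer translation bringing Re z into [-1/2, 1/2).
        z = complex(z.real - math.floor(z.real + 0.5), z.imag)
        if abs(z) >= 1.0:
            return z
        # Inversion; this strictly increases Im z when |z| < 1.
        z = -1 / z
    raise RuntimeError("reduction did not terminate")


w = reduce_to_fundamental_domain(complex(0.3, 0.01))
print(w, abs(w))
```

Each inversion replaces Im z by Im z / |z|², which is larger whenever |z| < 1, so the loop terminates; points that differ by an element of the modular group reduce to the same representative, up to rounding and the treatment of boundary points.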

An interesting example of a compact arithmetic
surface is the “regular octagon,” a hyperbolic
surface of genus 2. Its fundamental domain is
shown in Figure 3 as a subset of the Poincaré disc
$\mathbb{D} = \{z \in \mathbb{C} : |z| < 1\}$, which yields an alternative
parametrization of the hyperbolic plane $\mathbb{H}$. In these
coordinates, the Riemannian line and volume
element read

$$ds^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2}, \qquad dA = \frac{4\, dx\, dy}{(1 - x^2 - y^2)^2}$$ [10]


Figure 2 Fundamental domain of the modular group PSL(2, $\mathbb{Z}$)
in the complex upper-half plane.


Figure 3 Fundamental domain of the regular octagon in the
Poincaré disc.


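The group of orientation-preserving isometries is
now represented by PSU(1, 1) = SU(1, 1)/{±1},
where

$$\mathrm{SU}(1, 1) = \left\{ \begin{pmatrix} \alpha & \beta \\ \bar\beta & \bar\alpha \end{pmatrix} : \alpha, \beta \in \mathbb{C},\ |\alpha|^2 - |\beta|^2 = 1 \right\}$$ [11]

acting on $\mathbb{D}$ as above via fractional linear transformations.
The fundamental group of the regular
octagon surface is the subgroup of all elements in
PSU(1, 1) with coefficients of the form

$$\alpha = k + l\sqrt{2}, \qquad \beta = (m + n\sqrt{2})\sqrt{1 + \sqrt{2}}$$ [12]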


where $k, l, m, n \in \mathbb{Z}[i]$, that is, Gaussian integers of
the form $k_1 + i k_2$, $k_1, k_2 \in \mathbb{Z}$. Note that not all
choices of $k, l, m, n \in \mathbb{Z}[i]$ satisfy the condition
$|\alpha|^2 - |\beta|^2 = 1$. Since all elements $\gamma \ne 1$ of $\Gamma$ act
fixed-point free on $\mathbb{H}$, the surface $\Gamma\backslash\mathbb{H}$ is smooth
without conic singularities.
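The remark that only some choices of k, l, m, n are admissible is easy to check numerically. The sketch below evaluates the coefficients of [12] in floating point and tests the determinant condition of [11]; the helper names and the two sample parameter choices are illustrative assumptions, and an exact treatment would work in the number field rather than with floats.

```python
import math

SQRT2 = math.sqrt(2.0)


def octagon_coefficients(k, l, m, n):
    """Return (alpha, beta) of the form [12]; k, l, m, n may be complex
    numbers with integer real and imaginary parts (Gaussian integers)."""
    alpha = k + l * SQRT2
    beta = (m + n * SQRT2) * math.sqrt(1.0 + SQRT2)
    return alpha, beta


def in_su11(alpha, beta, tol=1e-12):
    """Check the condition |alpha|^2 - |beta|^2 = 1 from [11]."""
    return abs(abs(alpha) ** 2 - abs(beta) ** 2 - 1.0) < tol


# k = l = n = 1, m = 0: |alpha|^2 = 3 + 2*sqrt(2), |beta|^2 = 2 + 2*sqrt(2).
print(in_su11(*octagon_coefficients(1, 1, 0, 1)))  # admissible
# k = m = 1, l = n = 0: |alpha|^2 - |beta|^2 = -sqrt(2).
print(in_su11(*octagon_coefficients(1, 0, 1, 0)))  # not admissible
```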

In the following, we will restrict our attention to a
representative case, the modular surface with
$\Gamma$ = PSL(2, $\mathbb{Z}$).


Eigenvalue Statistics and Selberg 
Trace Formula 


The statistical properties of the rescaled eigenvalues
$X_j$ (cf. [4]) of the Laplacian can be characterized by
their distribution in small intervals

$$N(x, L) := \#\{j : x \le X_j \le x + L\}$$ [13]

where x is uniformly distributed, say, in the
interval [X, 2X], X large. Numerical experiments
by Bogomolny, Georgeot, Giannoni, and Schmit,
as well as Bolte, Steil, and Steiner (see references in
Bogomolny (1997)) suggest that the $X_j$ are asymptotically
Poisson distributed:


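Conjecture 1 For any bounded function $g: \mathbb{Z}_{\ge 0} \to \mathbb{C}$
we have

$$\frac{1}{X} \int_X^{2X} g(N(x, L))\, dx \to \sum_{k=0}^{\infty} g(k)\, \frac{L^k e^{-L}}{k!}$$ [14]

as $X \to \infty$.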


One may also consider larger intervals, where
$L \to \infty$ as $X \to \infty$. In this case, the assumption on
the independence of the $X_j$ predicts a central-limit
theorem. Weyl’s law [3] implies that the expectation
value is asymptotically, for $X \to \infty$,

$$\frac{1}{X} \int_X^{2X} N(x, L)\, dx \sim L$$ [15]

This asymptotic relation holds for any sequence of L
bounded away from zero (e.g., L constant, or
$L \to \infty$).

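Define the variance by

$$\Sigma^2(X, L) = \frac{1}{X} \int_X^{2X} (N(x, L) - L)^2\, dx$$ [16]

In view of the above conjecture, one expects
$\Sigma^2(X, L) \sim L$ in the limit $X \to \infty$, $L/\sqrt{X} \to 0$ (the
variance exhibits a less universal behavior in the
range $L \gg \sqrt{X}$; the notation $A \ll B$ means there is a
constant c > 0 such that $A \le cB$; cf. Sarnak (1995)),
and a central-limit theorem for the fluctuations
around the mean:


Conjecture 2 For any bounded function $g: \mathbb{R} \to \mathbb{C}$
we have

$$\frac{1}{X} \int_X^{2X} g\!\left( \frac{N(x, L) - L}{\sqrt{\Sigma^2(X, L)}} \right) dx \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-t^2/2}\, dt$$ [17]

as $X, L \to \infty$, $L \ll X$.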


The main tool in the attempts to prove the above
conjectures has been the Selberg trace formula. It
relates sums over eigenvalues of the Laplacian to
sums over lengths of closed geodesics on the
hyperbolic surface. The trace formula is in its
simplest form in the case of compact hyperbolic
surfaces; we have

$$\sum_{j=0}^{\infty} h(\rho_j) = \frac{\mathrm{Area}(M)}{4\pi} \int_{-\infty}^{\infty} h(\rho) \tanh(\pi\rho)\, \rho\, d\rho + \sum_{\gamma \in H_*} \sum_{n=1}^{\infty} \frac{\ell_\gamma\, g(n\ell_\gamma)}{2 \sinh(n\ell_\gamma/2)}$$ [18]

where $H_*$ is the set of all primitive oriented closed
geodesics $\gamma$, and $\ell_\gamma$ their lengths. The quantity $\rho_j$ is
related to the eigenvalue $\lambda_j$ by the equation $\lambda_j = \rho_j^2 + 1/4$.
The trace formula [18] holds for a large class of
even test functions h. For example, it is sufficient to
assume that h is infinitely differentiable, and that the
Fourier transform of h,

$$g(t) = \frac{1}{2\pi} \int_{\mathbb{R}} h(\rho)\, e^{-i\rho t}\, d\rho$$ [19]

has compact support. The trace formula for noncompact
surfaces has additional terms from the
parabolic elements in the corresponding group, and
includes also sums over the resonances of the
continuous part of the spectrum. The noncompact
modular surface behaves in many ways like a
compact surface. In particular, Selberg showed that
the number of eigenvalues embedded in the continuous
spectrum satisfies the same Weyl law as in
the compact case (Sarnak 2003).


Setting

$$h(\rho) = \chi_{[X, X+L]}\!\left( \frac{\mathrm{Area}(M)}{4\pi} \left( \rho^2 + \frac{1}{4} \right) \right)$$ [20]

where $\chi_{[X, X+L]}$ is the characteristic function of the
interval [X, X + L], we may thus view N(X, L) as
the left-hand side of the trace formula. The above
test function h is, however, not admissible, and
requires appropriate smoothing. Luo and Sarnak (cf.
Sarnak (2003)) developed an argument of this type
to obtain a lower bound on the average number
variance,

$$\frac{1}{L} \int_0^L \Sigma^2(X, L')\, dL' \gg \frac{\sqrt{X}}{(\log X)^2}$$ [21]

in the regime $\sqrt{X}/\log X \ll L \ll \sqrt{X}$, which is
consistent with the Poisson conjecture $\Sigma^2(X, L) \sim L$.
Bogomolny, Leyvraz, and Schmit suggested a remarkable
limiting formula for the two-point correlation
function for the modular surface (cf. Bogomolny
et al. (1997) and Bogomolny (2006)), based on an
analysis of the correlations between multiplicities of
lengths of closed geodesics. A rigorous analysis of the
fluctuations of multiplicities is given by Peter (cf.
Bogomolny (2006)). Rudnick (2005) has recently
established a smoothed version of Conjecture 2 in the
regime

$$\frac{\sqrt{X}}{L} \to \infty, \qquad \frac{\sqrt{X}}{L \log X} \to 0$$ [22]

where the characteristic function in [20] is replaced
by a certain class of smooth test functions.

All of the above approaches use the Selberg trace 
formula, exploiting the particular properties of the
distribution of lengths of closed geodesics in
arithmetic hyperbolic surfaces. These will be dis-
cussed in more detail in the next section, following 
the work of Bogomolny, Georgeot, Giannoni and 
Schmit, Bolte, and Luo and Sarnak (see Bogomolny 
et al. (1997) and Sarnak (1995) for references). 


Distribution of Lengths of Closed 
Geodesics 


The classical prime geodesic theorem asserts that the
number $N(\ell)$ of primitive closed geodesics of length
less than $\ell$ is asymptotically

$$N(\ell) \sim \frac{e^{\ell}}{\ell}$$ [23]


One of the significant geometrical characteristics of
arithmetic hyperbolic surfaces is that the number of
closed geodesics with the same length $\ell$ grows
exponentially with $\ell$. This phenomenon is most
easily explained in the case of the modular surface,
where the set of lengths $\ell$ appearing in the length
spectrum is characterized by the condition

$$2 \cosh(\ell/2) = |\mathrm{tr}\, \gamma|$$ [24]

where $\gamma$ runs over all elements in SL(2, $\mathbb{Z}$) with
$|\mathrm{tr}\, \gamma| > 2$. It is not hard to see that any integer n > 2
appears in the set $\{|\mathrm{tr}\, \gamma| : \gamma \in \mathrm{SL}(2, \mathbb{Z})\}$, and hence
the set of distinct lengths of closed geodesics is

$$\mathcal{L} = \{2\, \mathrm{arcosh}(n/2) : n = 3, 4, 5, \ldots\}$$ [25]

Therefore, the number of distinct lengths less than $\ell$
is asymptotically (for large $\ell$)

$$N'(\ell) = \#(\mathcal{L} \cap [0, \ell]) \sim e^{\ell/2}$$ [26]

Equations [26] and [23] say that on average the
number of geodesics with the same length is at least
$\sim e^{\ell/2}/\ell$.
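The count [26] follows directly from [25], since $2\,\mathrm{arcosh}(n/2) \le \ell$ is equivalent to $n \le 2\cosh(\ell/2)$. The short sketch below makes this explicit and also exhibits, for each integer n, one integer matrix of determinant 1 with trace n; the function names are illustrative and not taken from the text.

```python
import math


def matrix_with_trace(n):
    """An element of SL(2, Z) with trace n: det = n*0 - (-1)*1 = 1."""
    return ((n, -1), (1, 0))


def length_from_trace(n):
    """Length of the closed geodesics attached to |tr| = n > 2, from [24]."""
    return 2.0 * math.acosh(n / 2.0)


def count_distinct_lengths(ell):
    """N'(ell): number of n >= 3 with 2 arcosh(n/2) <= ell."""
    n_max = math.floor(2.0 * math.cosh(ell / 2.0))
    return max(0, n_max - 2)


for ell in (5.0, 10.0, 20.0):
    ratio = count_distinct_lengths(ell) / math.exp(ell / 2.0)
    print(ell, count_distinct_lengths(ell), round(ratio, 5))
```

The printed ratio tends to 1 as $\ell$ grows, in line with [26]; dividing the prime geodesic count [23] by this number gives the average multiplicity $e^{\ell/2}/\ell$ quoted above.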

The prime geodesic theorem [23] holds equally for
all hyperbolic surfaces with finite area, while [26] is
specific to the modular surface. For general arithmetic
surfaces, we have the upper bound

$$N'(\ell) \le c\, e^{\ell/2}$$ [27]

for some constant c > 0 that may depend on the
surface. Although one expects $N'(\ell)$ to be asymptotic
to $(1/2) N(\ell)$ for generic surfaces (since most
geodesics have a time-reversal partner which thus
has the same length, and otherwise all lengths are
distinct), there are examples of nonarithmetic Hecke
triangles where numerical and heuristic arguments
suggest $N'(\ell) \sim c_1 e^{c_2 \ell}/\ell$ for suitable constants $c_1 > 0$
and $0 < c_2 < 1/2$ (cf. Bogomolny (2006)). Hence
exponential degeneracy in the length spectrum seems
to occur in a weaker form also for nonarithmetic
surfaces.

A further useful property of the length spectrum
of arithmetic surfaces is the bounded clustering
property: there is a constant C (again surface
dependent) such that

$$\#(\mathcal{L} \cap [\ell, \ell + 1]) \le C$$ [28]

for all $\ell$. This fact is evident in the case of the
modular surface; the general case is proved by Luo
and Sarnak (cf. Sarnak (1995)).


Quantum Unique Ergodicity 


The unit tangent bundle of a hyperbolic surface $\Gamma\backslash\mathbb{H}$
describes the physical phase space on which the
classical dynamics takes place. A convenient parametrization
of the unit tangent bundle is given by
the quotient $\Gamma\backslash$PSL(2, $\mathbb{R}$); this may be seen by means
of the Iwasawa decomposition for an element
$g \in$ PSL(2, $\mathbb{R}$),

$$g = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y^{1/2} & 0 \\ 0 & y^{-1/2} \end{pmatrix} \begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix}$$ [29]

where $x + iy \in \mathbb{H}$ represents the position of the
particle in $\Gamma\backslash\mathbb{H}$ in half-plane coordinates, and
$\theta \in [0, 2\pi)$ the direction of its velocity. Multiplying the
matrix [29] from the left by $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and writing the
result again in the Iwasawa form [29], one obtains
the action

$$(z, \theta) \mapsto \left( \frac{az + b}{cz + d},\ \theta - 2 \arg(cz + d) \right)$$ [30]

which represents precisely the geometric action of
isometries on the unit tangent bundle.
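The passage from [29] to [30] can be checked numerically. The sketch below builds a matrix from Iwasawa coordinates (x, y, θ), multiplies it from the left by a sample element of SL(2, Z), reads off the new coordinates, and compares them with the right-hand side of [30]. All function names and the sample values are assumptions for illustration; angles are compared modulo 2π because g and −g represent the same element of PSL(2, R).

```python
import cmath
import math


def from_iwasawa(x, y, theta):
    """Matrix n(x) a(y) k(theta) of [29], as a pair of rows."""
    r = math.sqrt(y)
    co, si = math.cos(theta / 2), math.sin(theta / 2)
    return ((r * co - (x / r) * si, r * si + (x / r) * co),
            (-si / r, co / r))


def iwasawa(g):
    """Recover (z, theta) with z = x + iy from a real 2x2 matrix of det 1."""
    (a, b), (c, d) = g
    z = (a * 1j + b) / (c * 1j + d)          # image of i under g
    theta = (2 * math.atan2(-c, d)) % (2 * math.pi)
    return z, theta


def matmul(p, q):
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))


def same_angle(t1, t2, tol=1e-9):
    return abs(cmath.exp(1j * (t1 - t2)) - 1) < tol


gamma = ((2, 1), (1, 1))                      # sample element, det = 1
x, y, theta = 0.3, 1.7, 1.1
z = complex(x, y)

z_new, theta_new = iwasawa(matmul(gamma, from_iwasawa(x, y, theta)))
z_pred = (2 * z + 1) / (z + 1)                # (az + b)/(cz + d)
theta_pred = theta - 2 * cmath.phase(z + 1)   # theta - 2 arg(cz + d)

print(abs(z_new - z_pred), same_angle(theta_new, theta_pred))
```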

The geodesic flow $\Phi^t$ on $\Gamma\backslash$PSL(2, $\mathbb{R}$) is represented
by the right translation

$$\Phi^t : \Gamma g \mapsto \Gamma g \begin{pmatrix} e^{t/2} & 0 \\ 0 & e^{-t/2} \end{pmatrix}$$ [31]

The Haar measure $\mu$ on PSL(2, $\mathbb{R}$) is thus trivially
invariant under the geodesic flow. It is well known
that $\mu$ is not the only invariant measure, that is, $\Phi^t$ is
not uniquely ergodic, and that there is in fact an
abundance of invariant measures. The simplest
examples are those with uniform mass on one, or a
countable collection of, closed geodesics.

To test the distribution of an eigenfunction
$\varphi_j$ in phase space, one associates with a function
$a \in C^\infty(\Gamma\backslash\mathrm{PSL}(2, \mathbb{R}))$ the quantum observable
Op(a), a zeroth order pseudodifferential operator
with principal symbol a. Using semiclassical techniques
based on Friedrichs symmetrization, one
can show that the matrix element

$$\nu_j(a) = \langle \mathrm{Op}(a)\varphi_j, \varphi_j \rangle$$ [32]

is asymptotic (as $j \to \infty$) to a positive functional
that defines a probability measure on
$\Gamma\backslash$PSL(2, $\mathbb{R}$). Therefore, if $M$ is compact, any
weak limit of $\nu_j$ represents a probability measure
on $\Gamma\backslash$PSL(2, $\mathbb{R}$). Egorov's theorem (see Quantum
Ergodicity and Mixing of Eigenfunctions) in turn
implies that any such limit must be invariant
under the geodesic flow, and the main challenge
in proving QUE is to rule out all invariant
measures apart from Haar.


Conjecture 3 (Rudnick and Sarnak (1994); see
Sarnak (1995, 2003)). For every compact hyperbolic
surface $\Gamma\backslash\mathbb{H}$, the sequence $\nu_j$ converges weakly to $\mu$.


Lindenstrauss has proved this conjecture for
compact arithmetic hyperbolic surfaces of congruence
type (such as the second example in the section
“Hyperbolic surfaces”) for special bases of eigenfunctions,
using ergodic-theoretic methods. These
will be discussed in more detail in the next section.
His results extend to the noncompact case, that is, to
the modular surface where $\Gamma$ = PSL(2, $\mathbb{Z}$). Here he
shows that any weak limit of subsequences of $\nu_j$ is
of the form $c\mu$, where c is a constant with values in
[0, 1]. One believes that c = 1, but with present
techniques it cannot be ruled out that a proportion
of the mass of the eigenfunction escapes into the
noncompact cusp of the surface. For the modular
surface, c = 1 can be proved under the assumption of
the generalized Riemann hypothesis (see the section
“Eigenfunctions and L-functions” and Sarnak
(2003)). QUE also holds for the continuous part of
the spectrum, which is furnished by the Eisenstein
series E(z, s), where s = 1/2 + ir is the spectral
parameter. Note that the measures associated with
the matrix elements

$$\nu_r(a) = \langle \mathrm{Op}(a) E(\cdot, 1/2 + ir),\ E(\cdot, 1/2 + ir) \rangle$$ [33]


are not probability measures but only Radon
measures, since E(z, s) is not square-integrable. Luo
and Sarnak, and Jakobson have shown that

$$\lim_{r \to \infty} \frac{\nu_r(a)}{\nu_r(b)} = \frac{\mu(a)}{\mu(b)}$$ [34]

for suitable test functions $a, b \in C_0^\infty(\Gamma\backslash\mathrm{PSL}(2, \mathbb{R}))$
(cf. Sarnak (2003)).


Hecke Operators, Entropy
and Measure Rigidity


For compact surfaces, the sequence of probability
measures approaching the matrix elements $\nu_j$ is
relatively compact. That is, every infinite sequence
contains a convergent subsequence. Lindenstrauss'
central idea in the proof of QUE is to exploit the
presence of Hecke operators to understand the
invariance properties of possible quantum limits.
We will sketch his argument in the case of the
modular surface (ignoring issues related to the noncompactness
of the surface), where it is most
transparent.

For every positive integer n, the Hecke operator
$T_n$ acting on continuous functions on $\Gamma\backslash\mathbb{H}$ with
$\Gamma$ = SL(2, $\mathbb{Z}$) is defined by

$$T_n f(z) = \frac{1}{\sqrt{n}} \sum_{\substack{a, d = 1 \\ ad = n}}^{n} \sum_{b=0}^{d-1} f\!\left( \frac{az + b}{d} \right)$$ [35]
ad=n 


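The set $M_n$ of matrices with integer coefficients and
determinant n can be expressed as the disjoint union

$$M_n = \bigcup_{\substack{a, d = 1 \\ ad = n}}^{n} \bigcup_{b=0}^{d-1} \Gamma \begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$$ [36]

and hence the sum in [35] can be viewed as a sum
over the cosets in this decomposition. We note the
product formula

$$T_m T_n = \sum_{d \mid \gcd(m, n)} T_{mn/d^2}$$ [37]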
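The product formula [37] can be checked at the level of the coset representatives in [36]. The sketch below rests on the following assumptions: a function on $\Gamma\backslash\mathbb{H}$ only sees an upper-triangular integer matrix through its fractional linear map modulo left multiplication by Γ, so each matrix is reduced to a canonical primitive form (a, b mod d, d); and, because of the factor $1/\sqrt{n}$ in [35], the term $T_{mn/d^2}$ contributes its representatives with weight d when both sides are written as unnormalized sums. The names `hecke_reps`, `canonical`, `product_side`, and `sum_side` are illustrative.

```python
from collections import Counter
from math import gcd


def hecke_reps(n):
    """Representatives (a, b, d) of [36]: ad = n, 0 <= b < d."""
    return [(a, b, n // a)
            for a in range(1, n + 1) if n % a == 0
            for b in range(n // a)]


def canonical(a, b, d):
    """Primitive form of the map z -> (az + b)/d, modulo z -> z + 1."""
    g = gcd(gcd(a, b), d)
    a, b, d = a // g, b // g, d // g
    return (a, b % d, d)


def product_side(m, n):
    """Multiset of maps appearing in T_m T_n (unnormalized)."""
    counts = Counter()
    for a1, b1, d1 in hecke_reps(m):
        for a2, b2, d2 in hecke_reps(n):
            # (a2 b2; 0 d2)(a1 b1; 0 d1) = (a2 a1, a2 b1 + b2 d1; 0, d2 d1)
            counts[canonical(a2 * a1, a2 * b1 + b2 * d1, d2 * d1)] += 1
    return counts


def sum_side(m, n):
    """Multiset for the right-hand side of [37], term d weighted by d."""
    counts = Counter()
    for d in range(1, gcd(m, n) + 1):
        if gcd(m, n) % d == 0:
            for rep in hecke_reps(m * n // (d * d)):
                counts[canonical(*rep)] += d
    return counts


for m, n in [(2, 2), (2, 3), (4, 6), (9, 3)]:
    print(m, n, product_side(m, n) == sum_side(m, n))
```

For coprime m and n only d = 1 contributes and the check reduces to multiplicativity, $T_m T_n = T_{mn}$.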


The Hecke operators are normal, form a commuting
family, and in addition they commute with
the Laplacian $\Delta$. In the following, we consider an
orthonormal basis of eigenfunctions $\varphi_j$ of $\Delta$ that
are simultaneously eigenfunctions of all Hecke
operators. We will refer to such eigenfunctions as
Hecke eigenfunctions. The above assumption is
automatically satisfied, if the spectrum of $\Delta$ is
simple (i.e., no eigenvalues coincide), a property
conjectured by Cartier and supported by numerical
computations. Lindenstrauss' work is based on the
following two observations. Firstly, all quantum 
limits of Hecke eigenfunctions are geodesic-flow 
invariant measures of positive entropy, and sec- 
ondly, the only such measure of positive entropy 
that is recurrent under Hecke correspondences is 
the Lebesgue measure. 

The first property is proved by Bourgain and 
Lindenstrauss (2003) and refines arguments of 
Rudnick and Sarnak (1994) and Wolpert (2001) on 
the distribution of Hecke points (see Sarnak (2003) for 
references to these papers). For a given point $z \in \mathbb{H}$
the set of Hecke points is defined as

$$T_n(z) = M_n z \qquad [38]$$

For most primes $p$, the set $T_{p^k}(z)$ comprises $(p+1)p^{k-1}$
distinct points on $\Gamma\backslash\mathbb{H}$. For each $z$, the Hecke
operator $T_n$ may now be interpreted as the
adjacency matrix for a finite graph embedded in
$\Gamma\backslash\mathbb{H}$, whose vertices are the Hecke points $T_n(z)$.
Hecke eigenfunctions $\varphi_j$ with

$$T_n \varphi_j = \lambda_j(n)\,\varphi_j \qquad [39]$$

give rise to eigenfunctions of the adjacency matrix.
Exploiting this fact, Bourgain and Lindenstrauss
show that for a large set of integers $n$

$$|\varphi_j(z)|^2 \ll \sum_{w \in T_n(z)} |\varphi_j(w)|^2 \qquad [40]$$

that is, pointwise values of $|\varphi_j|^2$ cannot be substantially
larger than its sum over Hecke points. This,
and the observation that Hecke points for a large set
of integers $n$ are sufficiently uniformly distributed
on $\Gamma\backslash\mathbb{H}$ as $n \to \infty$, yields the estimate of positive
entropy with a quantitative lower bound.
Lindenstrauss’ proof of the second property, 
which shows that Lebesgue measure is the only 
quantum limit of Hecke eigenfunctions, is a result of 
a currently very active branch of ergodic theory: 
measure rigidity. Invariance under the geodesic flow 
alone is not sufficient to rule out other possible limit 
measures. In fact, there are uncountably many 
measures with this property. As limits of Hecke 
eigenfunctions, all quantum limits possess an addi- 
tional property, namely recurrence under Hecke 
correspondences. Since the explanation of these is 
rather involved, let us recall an analogous result in a 
simpler setup. The map $\times 2 : x \mapsto 2x \bmod 1$ defines a
hyperbolic dynamical system on the unit circle with
a wealth of invariant measures, similar to the case of
the geodesic flow on a surface of negative curvature.
Furstenberg conjectured that, up to trivial invariant
measures that are localized on finitely many rational
points, Lebesgue measure is the only $\times 2$-invariant
measure that is also invariant under the action of
$\times 3 : x \mapsto 3x \bmod 1$. This fundamental problem is
still unsolved and is one of the central conjectures in
measure rigidity. Rudolph, however, showed that
Furstenberg's conjecture is true if one restricts the
statement to $\times 2$-invariant measures of positive
entropy (cf. Lindenstrauss (to appear)). In Lindenstrauss'
work, $\times 2$ plays the role of the geodesic
flow, and $\times 3$ the role of the Hecke correspondences.
Although here it might also be interesting to ask 
whether an analog of Furstenberg's conjecture 


holds, it is inessential for the proof of QUE due to 
the positive entropy of quantum limits discussed in 
the previous paragraph. 
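To make the "trivial" invariant measures in Furstenberg's setting concrete, the following sketch computes the set of points reachable from a rational starting point under the two maps $x \mapsto 2x \bmod 1$ and $x \mapsto 3x \bmod 1$. The function names are hypothetical and introduced only for this illustration. For a denominator coprime to 6, both maps permute the resulting finite set, so the uniform measure on it is invariant under both maps and is supported on finitely many rational points.

```python
from fractions import Fraction


def times(k: int, x: Fraction) -> Fraction:
    """The circle map x -> k*x mod 1, evaluated exactly on rationals."""
    return (k * x) % 1


def joint_orbit(x: Fraction, multipliers=(2, 3)) -> set:
    """All points reachable from x by repeatedly applying the maps x -> k*x mod 1."""
    seen = {x}
    frontier = [x]
    while frontier:
        y = frontier.pop()
        for k in multipliers:
            z = times(k, y)
            if z not in seen:
                seen.add(z)
                frontier.append(z)
    return seen
```

Starting from `Fraction(1, 5)`, for instance, the joint orbit stays within the fractions with denominator 5, in contrast to the Lebesgue measure, which charges no finite set.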


Eigenfunctions and L-Functions 


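An even eigenfunction $\varphi_j(z)$ for $\Gamma = \mathrm{SL}(2,\mathbb{Z})$ has the
Fourier expansion

$$\varphi_j(z) = \sum_{n=1}^{\infty} a_j(n)\, y^{1/2} K_{i\rho_j}(2\pi n y) \cos(2\pi n x) \qquad [41]$$

We associate with $\varphi_j(z)$ the Dirichlet series

$$L(s, \varphi_j) = \sum_{n=1}^{\infty} a_j(n)\, n^{-s} \qquad [42]$$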


which converges for Re s large enough. These series 
have an analytic continuation to the entire complex 
plane C and satisfy a functional equation, 


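$$\Lambda(s, \varphi_j) = \Lambda(1-s, \varphi_j) \qquad [43]$$

where

$$\Lambda(s, \varphi_j) = \pi^{-s}\, \Gamma\!\left(\frac{s + i\rho_j}{2}\right) \Gamma\!\left(\frac{s - i\rho_j}{2}\right) L(s, \varphi_j) \qquad [44]$$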


If $\varphi_j(z)$ is in addition an eigenfunction of all Hecke
operators, then the Fourier coefficients in fact
coincide (up to a normalization constant) with the
eigenvalues of the Hecke operators

$$a_j(m) = \lambda_j(m)\, a_j(1) \qquad [45]$$

If we normalize $a_j(1) = 1$, the Hecke relations [37]
result in an Euler product formula for the
L-function,

$$L(s, \varphi_j) = \prod_{p\ \mathrm{prime}} \left(1 - a_j(p)\, p^{-s} + p^{-2s}\right)^{-1} \qquad [46]$$
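The step from the Hecke relations [37] to an Euler product can be sketched as follows, assuming the normalization $a_j(1) = 1$ so that $a_j(n) = \lambda_j(n)$. For coprime $m$, $n$ the relation [37] gives $\lambda_j(m)\lambda_j(n) = \lambda_j(mn)$, and for powers of a fixed prime $p$ it gives a two-term recursion whose generating series is a rational function:

```latex
\begin{align*}
\lambda_j(p)\,\lambda_j(p^k) &= \lambda_j(p^{k+1}) + \lambda_j(p^{k-1}), \qquad k \ge 1, \\
\sum_{k=0}^{\infty} \lambda_j(p^k)\, x^k &= \frac{1}{1 - \lambda_j(p)\, x + x^2}, \\
L(s,\varphi_j) = \sum_{n=1}^{\infty} \lambda_j(n)\, n^{-s}
  &= \prod_{p} \sum_{k=0}^{\infty} \lambda_j(p^k)\, p^{-ks}
   = \prod_{p} \bigl(1 - \lambda_j(p)\, p^{-s} + p^{-2s}\bigr)^{-1}.
\end{align*}
```

The second line follows by multiplying the series by $1 - \lambda_j(p)x + x^2$ and using the recursion to see that all coefficients beyond the constant term cancel; the third line sets $x = p^{-s}$ and is valid where the series converges absolutely.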


These L-functions behave in many other ways like
the Riemann zeta or classical Dirichlet L-functions.
In particular, they are expected to satisfy a Riemann
hypothesis, that is, all nontrivial zeros are constrained
to the critical line $\mathrm{Re}\, s = 1/2$.

Questions on the distribution of Hecke eigenfunc- 
tions, such as QUE or value distribution properties, 
can now be translated to analytic properties of 
L-functions. We will discuss two examples. 

The asymptotics in [6] can be established
by proving [6] for the choices $a = \varphi_k$, $k = 1, 2, \ldots$,
that is,

$$\int_{\mathcal{M}} \varphi_j^2\, \varphi_k \, dA \to 0 \qquad [47]$$
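Watson discovered the remarkable relation (Sarnak
2003)

$$\left| \int_{\mathcal{M}} \varphi_{j_1} \varphi_{j_2} \varphi_{j_3} \, dA \right|^2 = \frac{\pi^4\, \Lambda(\tfrac12, \varphi_{j_1} \times \varphi_{j_2} \times \varphi_{j_3})}{\Lambda(1, \mathrm{sym}^2 \varphi_{j_1})\, \Lambda(1, \mathrm{sym}^2 \varphi_{j_2})\, \Lambda(1, \mathrm{sym}^2 \varphi_{j_3})} \qquad [48]$$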


The L-functions $\Lambda(s, \cdot)$ in Watson's formula are
more advanced cousins of those introduced earlier
(see Sarnak (2003) for details). The Riemann
hypothesis for such L-functions then implies, via
[48], a precise rate of convergence to QUE for the
modular surface,

$$\int_{\mathcal{M}} a\, \varphi_j^2 \, dA = \int_{\mathcal{M}} a \, dA + O\!\left(\lambda_j^{-1/4+\epsilon}\right) \qquad [49]$$

for any $\epsilon > 0$, where the implied constant depends
on $\epsilon$ and $a$.

A second example of the connection between
statistical properties of the matrix elements
$\nu_j(a) = \langle \mathrm{Op}(a)\varphi_j, \varphi_j \rangle$ (for fixed $a$ and random $j$) and
values of L-functions has appeared in the work of Luo
and Sarnak (cf. Sarnak (2003)). Define the variance

$$V_\lambda(a) = \frac{1}{N(\lambda)} \sum_{\lambda_j \le \lambda} |\nu_j(a) - \mu(a)|^2 \qquad [50]$$

with $N(\lambda) = \#\{j : \lambda_j \le \lambda\}$; cf. [3]. Following a conjecture
by Feingold-Peres and Eckhardt et al. (see Sarnak
(2003) for references) for "generic" quantum chaotic
systems, one expects a central-limit theorem for the
statistical fluctuations of the $\nu_j(a)$, where the normalized
variance $N(\lambda)^{1/2} V_\lambda(a)$ is asymptotic to the
classical autocorrelation function $C(a)$, see eqn [54].


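Conjecture 4 For any bounded function $g : \mathbb{R} \to \mathbb{C}$
we have

$$\frac{1}{N(\lambda)} \sum_{\lambda_j \le \lambda} g\!\left( \frac{\nu_j(a) - \mu(a)}{\sqrt{V_\lambda(a)}} \right) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-t^2/2} \, dt \qquad [51]$$

as $\lambda \to \infty$.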


Luo and Sarnak prove that in the case of the
modular surface the variance has the asymptotics

$$\lim_{\lambda \to \infty} N(\lambda)^{1/2}\, V_\lambda(a) = \langle Ba, a \rangle \qquad [52]$$

where $B$ is a non-negative self-adjoint operator
which commutes with the Laplacian $\Delta$ and all
Hecke operators $T_n$. In particular, we have

$$B\varphi_j = \tfrac12\, L(\tfrac12, \varphi_j)\, C(\varphi_j)\, \varphi_j \qquad [53]$$

where

$$C(a) = \int_{\mathbb{R}} \int_{\Gamma\backslash \mathrm{PSL}(2,\mathbb{R})} a(\Phi^t(g))\, a(g)\, d\mu(g)\, dt \qquad [54]$$

is the classical autocorrelation function for the
geodesic flow with respect to the observable $a$
(Sarnak 2003). Up to the arithmetic factor
$(1/2)L(1/2, \varphi_j)$, eqn [53] is consistent with the
Feingold-Peres prediction for the variance of generic
chaotic systems. Furthermore, recent estimates of
moments by Rudnick and Soundararajan (2005)
indicate that Conjecture 4 is not valid in the case of
the modular surface.


Quantum Eigenstates of Cat Maps 


Cat maps are probably the simplest area-preserving
maps on a compact surface that are highly chaotic.
They are defined as linear automorphisms on the
torus $\mathbb{T}^2 = \mathbb{R}^2/\mathbb{Z}^2$,

$$\Phi_A : \mathbb{T}^2 \to \mathbb{T}^2 \qquad [55]$$

where a point $\xi \in \mathbb{R}^2 \pmod{\mathbb{Z}^2}$ is mapped to
$A\xi \pmod{\mathbb{Z}^2}$; $A$ is a fixed matrix in $\mathrm{GL}(2,\mathbb{Z})$ with
eigenvalues off the unit circle (this guarantees
hyperbolicity). We view the torus $\mathbb{T}^2$ as a symplectic
manifold, the phase space of the dynamical system.
Since $\mathbb{T}^2$ is compact, the Hilbert space of quantum
states is an $N$-dimensional vector space $\mathcal{H}_N$, $N$
integer. The semiclassical limit, or limit of small
wavelengths, corresponds here to $N \to \infty$.
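The classical map is simple enough to experiment with directly. The sketch below uses the matrix with rows (2, 1) and (3, 2), which is also the example quantized later in this article; the helper names are hypothetical. It checks hyperbolicity (real eigenvalues off the unit circle with product 1) and iterates the map exactly on rational points, which are always periodic because an integer matrix of determinant 1 permutes the points with a fixed denominator.

```python
from fractions import Fraction
from math import sqrt

A = ((2, 1), (3, 2))  # example cat map matrix (determinant 1, trace 4)


def det(M):
    """Determinant of a 2x2 integer matrix given as nested tuples."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def eigenvalues(M):
    """Real eigenvalues of a 2x2 matrix, assuming a non-negative discriminant."""
    tr = M[0][0] + M[1][1]
    root = sqrt(tr * tr - 4 * det(M))
    return ((tr - root) / 2, (tr + root) / 2)


def cat_map(M, xi):
    """Apply xi -> M xi mod 1 to a point xi = (x, y) of the torus."""
    x, y = xi
    return ((M[0][0] * x + M[0][1] * y) % 1, (M[1][0] * x + M[1][1] * y) % 1)


def period(M, xi, max_iter=10_000):
    """Smallest n >= 1 with M^n xi = xi mod 1, for a periodic point xi."""
    point, n = cat_map(M, xi), 1
    while point != xi:
        if n >= max_iter:
            raise ValueError("no return within max_iter iterations")
        point, n = cat_map(M, point), n + 1
    return n
```

Using `Fraction` coordinates keeps the arithmetic exact; with floating-point coordinates the exponential stretching would destroy periodicity after a few dozen iterations.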

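It is convenient to identify $\mathcal{H}_N$ with $L^2(\mathbb{Z}/N\mathbb{Z})$,
with the inner product

$$\langle \psi_1, \psi_2 \rangle = \frac{1}{N} \sum_{Q \bmod N} \psi_1(Q)\, \overline{\psi_2(Q)} \qquad [56]$$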


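For any smooth function $f \in C^\infty(\mathbb{T}^2)$, define a
quantum observable

$$\mathrm{Op}_N(f) = \sum_{n \in \mathbb{Z}^2} \hat{f}(n)\, T_N(n)$$

where $\hat{f}(n)$ are the Fourier coefficients of $f$, and
$T_N(n)$ are translation operators

$$T_N(n) = e^{\pi i n_1 n_2 / N}\, t_2^{n_2}\, t_1^{n_1} \qquad [57]$$

$$[t_1 \psi](Q) = \psi(Q+1), \qquad [t_2 \psi](Q) = e^{2\pi i Q/N}\, \psi(Q) \qquad [58]$$

The operators $\mathrm{Op}_N(f)$ are the analogs of the
pseudodifferential operators discussed in the section
"Quantum unique ergodicity."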
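A quantization of $\Phi_A$ is a unitary operator $U_N(A)$
on $L^2(\mathbb{Z}/N\mathbb{Z})$ satisfying the equation

$$U_N(A)^{-1}\, \mathrm{Op}_N(f)\, U_N(A) = \mathrm{Op}_N(f \circ \Phi_A) \qquad [59]$$

for all $f \in C^\infty(\mathbb{T}^2)$. There are explicit formulas for
$U_N(A)$ when $A$ is in the group

$$\Gamma = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2,\mathbb{Z}) : ab \equiv cd \equiv 0 \bmod 2 \right\} \qquad [60]$$

These may be viewed as analogs of the Shale-Weil
or metaplectic representation for SL(2). For example,
the quantization of

$$A = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix} \qquad [61]$$

yields

$$U_N(A)\psi(Q) = N^{-1/2} \sum_{Q' \bmod N} \exp\!\left[ \frac{2\pi i}{N} \left( Q^2 - QQ' + Q'^2 \right) \right] \psi(Q') \qquad [62]$$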
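A formula of the type [62] can be checked numerically for small $N$. The sketch below assumes the kernel $N^{-1/2}\exp[(2\pi i/N)(Q^2 - QQ' + Q'^2)]$, builds the corresponding $N \times N$ matrix in pure Python, and measures how far it is from being unitary. The function names are hypothetical and chosen for this illustration only.

```python
import cmath
from math import sqrt


def quantum_cat_matrix(N: int):
    """Matrix of the operator with kernel N^(-1/2) exp[(2 pi i / N)(Q^2 - Q Q' + Q'^2)]."""
    pref = 1.0 / sqrt(N)
    return [
        [pref * cmath.exp(2j * cmath.pi * (Q * Q - Q * Qp + Qp * Qp) / N) for Qp in range(N)]
        for Q in range(N)
    ]


def unitarity_defect(U) -> float:
    """Largest entry (in absolute value) of U U^dagger - identity."""
    N = len(U)
    worst = 0.0
    for i in range(N):
        for j in range(N):
            entry = sum(U[i][k] * U[j][k].conjugate() for k in range(N))
            worst = max(worst, abs(entry - (1.0 if i == j else 0.0)))
    return worst
```

The defect is at the level of rounding error because the matrix factors as a diagonal phase, a discrete Fourier transform, and another diagonal phase, each of which is unitary.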


In analogy with [1], we are interested in the
statistical features of the eigenvalues and eigenfunctions
of $U_N(A)$, that is, the solutions to

$$U_N(A)\varphi = \lambda\varphi, \qquad \|\varphi\|_{L^2(\mathbb{Z}/N\mathbb{Z})} = 1 \qquad [63]$$

Unlike typical quantum-chaotic maps, the statistics
of the $N$ eigenvalues

$$\lambda_{N1}, \lambda_{N2}, \ldots, \lambda_{NN} \in S^1 \qquad [64]$$

do not follow the distributions of unitary random
matrices in the limit $N \to \infty$, but are rather singular
(Keating 1991). In analogy with the Selberg trace
formula for hyperbolic surfaces [18], there is an
exact trace formula relating sums over eigenvalues
of $U_N(A)$ with sums over fixed points of the classical
map (Keating 1991).

As in the case of arithmetic surfaces, the eigenfunctions
of cat maps appear to behave more generically.
The analog of the Schnirelman-Zelditch-Colin de
Verdière theorem states that, for any orthonormal
basis of eigenfunctions $\{\varphi_{Nj}\}_{j=1}^{N}$ we have, for all
$f \in C^\infty(\mathbb{T}^2)$,

$$\langle \mathrm{Op}_N(f)\varphi_{Nj}, \varphi_{Nj} \rangle \to \int_{\mathbb{T}^2} f(\xi)\, d\xi \qquad [65]$$

as $N \to \infty$, for all $j$ in an index set $J_N$ of full density,
that is, $\# J_N \sim N$. Kurlberg and Rudnick (see
Rudnick (2001)) have characterized special bases of
eigenfunctions $\{\varphi_{Nj}\}_{j=1}^{N}$ (termed Hecke eigenbases,
in analogy with arithmetic surfaces) for which QUE
holds, generalizing earlier work of Degli Esposti,
Graffi, and Isola (1995). That is, [65] holds for all
$j = 1, \ldots, N$. Rudnick and Kurlberg, and more
recently Gurevich and Hadani, have established
results on the rate of convergence analogous to
[49]. These results are unconditional. Gurevich and
Hadani use methods from algebraic geometry based
on those developed by Deligne in his proof of the
Weil conjectures (an analog of the Riemann hypothesis
for finite fields).

In the case of quantum cat maps, there are values
of $N$ for which the number of coinciding eigenvalues
can be large, a major difference from what is expected
for the modular surface. Linear combinations of
eigenstates with the same eigenvalue are again
eigenstates, and may lead to different quantum
limits. Indeed, Faure, Nonnenmacher, and De Bièvre
(see De Bièvre (to appear)) have shown that there
are subsequences of values of $N$, so that, for all
$f \in C^\infty(\mathbb{T}^2)$,

$$\langle \mathrm{Op}_N(f)\varphi, \varphi \rangle \to \frac12 \int_{\mathbb{T}^2} f(\xi)\, d\xi + \frac12 f(0) \qquad [66]$$

that is, half of the mass of the quantum limit
localizes on the hyperbolic fixed point of the map.
This is the first, and to date the only, rigorous result
concerning the existence of scarred eigenfunctions in
systems with chaotic classical limit.
Introduction


A major motivation for studying the asymptotic 
structure of spacetimes has been the need for a 
rigorous description of what should be understood by 
an “isolated system” in Einstein’s theory of gravity. 
As an example, consider a gravitating system some- 
where in our universe (e.g., a galaxy, a cluster of 
galaxies, a binary system, or a star) evolving accord- 
ing to its own gravitational interaction, and possibly 
reacting to gravitational radiation impinging on it 
from the outside. Thereby it will emit gravitational 
radiation. We are interested in describing these waves 
because they provide us with important information 
about the physics governing the system. 

To adequately describe this situation, we need to 
idealize the real situation in an appropriate way, since 
it is hopeless to try to analyze the behavior of the 
system in its interaction with the rest of the universe. 
We are mainly interested in the behavior of the 
system, and not so much in other processes taking 
place at large distances from the system. Since we 
would like to ignore those regions, we need a way to 
isolate the system from their influence. 

The notion of an isolated system allows us to 
select individual subsystems of the universe and 
describe their properties regardless of the rest of the 
universe so that we can assign to each subsystem 
such physical attributes as its energy-momentum, 
angular momentum, or its emitted radiation field. 
Without this notion, we would always have to take
into account the interaction of the system with its 
environment in full detail. 

In general relativity (GR) it turns out to be a rather 
difficult task to describe an isolated system and the 
reason is — as always in Einstein’s theory — the fact 
that the metric acts both as the physical field and as 


the background. In other theories, like electrody- 
namics, the physical field, such as the Maxwell field, 
is very different from the background field, the flat 
metric of Minkowski space. The fact that the metric 
in GR plays a dual role makes it difficult to extract 
physical meaning from the metric because there is no 
nondynamical reference point. 

Imagine a system alone in the universe. As we 
recede from the system we would expect its influence 
to decrease. So we expect that the spacetime which 
models this situation mathematically will resemble 
the flat Minkowski spacetime and it will approximate 
it even better the farther away we go. This implies 
that one needs to impose fall-off conditions for the 
curvature and that the manifold will be asymptoti- 
cally flat in an appropriate sense. However, there is 
the problem that fall-off conditions necessarily imply 
the use of coordinates and it is awkward to decide 
which coordinates should be "good ones." Thus, it is
not clear whether the notion of an asymptotically flat 
spacetime is an invariant concept. 

What is needed, therefore, is an invariant definition
of asymptotically flat spacetimes. The key
observation in this context is that "infinity" is far
away with respect to the spacetime metric. This
means that geodesics heading away from the system
should be able to "run forever," that is, be defined
for arbitrary values of their affine parameter $s$.
"Infinity" will be reached for $s \to \infty$. However,
suppose we do not use the spacetime metric $g$ but a
metric $\hat{g}$ which is scaled down with respect to $g$, that
is, in such a way that $\hat{g} = \Omega^2 g$ for some function $\Omega$.
Then it might be possible to arrange $\Omega$ in such a way
that geodesics for the metric $\hat{g}$ cover the same events
(strictly speaking, this holds only for null geodesics,
but this is irrelevant for the present plausibility
argument) as those for the metric $g$ yet that their
affine parameter $\hat{s}$ (which is also scaled down with
respect to $s$) approaches a finite value $\hat{s}_0$ for $s \to \infty$.
Then we could attach a boundary to the spacetime
manifold consisting of all the limit points corresponding
to the events with $\hat{s} = \hat{s}_0$ on the $\hat{g}$-geodesics.
This boundary would have to be interpreted as
"infinity" for the spacetime because it takes infinitely
long for the $g$-geodesics to get there.

We arrived at this idea of attaching a boundary by 
considering the metric structure only "up to arbitrary
scaling," that is, by looking at metrics which
differ only by a factor. This is the conformal
structure of the spacetime manifold in question. By 
considering the spacetime only from the point of 
view of its conformal structure we obtain a picture 
of the spacetime which is essentially finite but which 
leaves its causal properties unchanged, and hence in 
particular the properties of wave propagation. This 
is exactly what is needed for a rigorous treatment of 
radiation emitted by the system. 


Infinity for Minkowski Spacetime 


The above discussion suggests that we should consider 
the spacetime metric only up to scale, that is, 
to focus on the conformal structure of the spacetime 
in question. Since we are interested in systems which 
approach Minkowski spacetime at large distances 
from the source, it is illuminating to study Minkowski 
spacetime as a preliminary example. So consider the
manifold $\mathbb{M} = \mathbb{R}^4$ equipped with the flat metric

$$g = dt^2 - dr^2 - r^2\, d\sigma^2 \qquad [1]$$

where $r$ is the standard radial coordinate defined by
$r^2 = x^2 + y^2 + z^2$ and

$$d\sigma^2 = d\theta^2 + \sin^2\theta \, d\phi^2$$

is the standard metric on the unit sphere $S^2$. We now
introduce retarded and advanced time coordinates,
which are adapted to the null cone and hence to the
conformal structure of $g$ by the definition

$$u = t - r, \qquad v = t + r$$

and obtain the metric in the form

$$g = du\, dv - \tfrac14 (v-u)^2 \, d\sigma^2$$

The coordinates $u$ and $v$ both take arbitrary real values
but they are restricted by the relation $v - u = 2r \ge 0$.
In order to see what happens "at infinity," we introduce
the coordinates $U$ and $V$ by the relations

$$u = \tan U, \qquad v = \tan V$$


Then $U$ and $V$ both take values in the open interval
$(-\pi/2, \pi/2)$ with $V \ge U$ and the metric is transformed
to

$$g = \frac{1}{4\cos^2 U \cos^2 V} \left[ 4\, dU\, dV - \sin^2(V-U) \, d\sigma^2 \right] \qquad [2]$$
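The transformation to [2] can be verified term by term from $u = \tan U$, $v = \tan V$; the short computation below is a worked check rather than part of the original argument:

```latex
\begin{align*}
du &= \frac{dU}{\cos^2 U}, \qquad dv = \frac{dV}{\cos^2 V}, \\
du\, dv &= \frac{4\, dU\, dV}{4\cos^2 U \cos^2 V}, \\
\tfrac14 (v-u)^2 &= \tfrac14 (\tan V - \tan U)^2
  = \frac{\sin^2(V-U)}{4\cos^2 U \cos^2 V}.
\end{align*}
```

The last step uses $\tan V - \tan U = \sin(V-U)/(\cos U \cos V)$, so both terms of the metric acquire the same overall factor $1/(4\cos^2 U \cos^2 V)$.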


Clearly, the metric is undefined at events with
$\cos U = 0$ or $\cos V = 0$. These would correspond to
events with $u = \pm\infty$ or $v = \pm\infty$ which do not lie in
$\mathbb{M}$. However, by defining the function

$$\Omega = 2 \cos U \cos V$$

we find that the metric $\hat{g} = \Omega^2 g$ with

$$\hat{g} = 4\, dU\, dV - \sin^2(V-U) \, d\sigma^2 \qquad [3]$$

is conformally equivalent to $g$ and is regular for all
values of $U$ and $V$ (keeping $V \ge U$). In fact, by
defining the coordinates

$$T = U + V, \qquad R = V - U$$

this metric takes the form

$$\hat{g} = dT^2 - dR^2 - \sin^2 R \, d\sigma^2 \qquad [4]$$


the metric of the static Einstein universe $E$. Thus, we
may regard the Minkowski spacetime as the part of
the Einstein cylinder defined by restricting the
coordinates $T$ and $R$ to the region $|T| + R < \pi$ as
illustrated in Figure 1. Although $\mathbb{M}$ can be considered
as being diffeomorphic to the shaded part in Figure 1,
these two manifolds are not isometric. This is obvious
from considering the properties of the events lying on

Figure 1 The embedding of Minkowski spacetime into the 
Einstein cylinder. 


the boundary $\partial\mathbb{M}$ of $\mathbb{M}$ in $E$. Fix a point $P$ inside $\mathbb{M}$
and follow a null geodesic with respect to the metric $\hat{g}$
from $P$ toward the future. It will intersect $\partial\mathbb{M}$ after a
finite amount of its affine parameter has elapsed.
When we follow a null geodesic with respect to $g$
from $P$ in the same direction, we find that it does not
reach $\partial\mathbb{M}$ for any value of its affine parameter. Thus,
the boundary is at infinity for the metric $g$ but at a
finite location with respect to the metric $\hat{g}$. When we
consider all possible kinds of geodesics for the metric $g$
we find that $\partial\mathbb{M}$ consists of five qualitatively different
pieces. The future-pointing timelike geodesics all
approach the point $i^+$ given by $(T, R) = (\pi, 0)$, while
the past-pointing geodesics approach $i^-$ with coordinates
$(-\pi, 0)$. All spacelike geodesics come arbitrarily
close to a point $i^0$ with coordinates $(0, \pi)$ (located on
the front of the cylinder in Figure 1). Null geodesics,
however, are different. For any point $(T, \pi - |T|)$ with
$T \ne 0, \pm\pi$ on $\partial\mathbb{M}$ there are $g$-null-geodesics which
come arbitrarily close.
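The compactification can be tried out numerically. The following sketch maps Minkowski coordinates $(t, r)$ to Einstein-cylinder coordinates using $u = t - r$, $v = t + r$, $U = \arctan u$, $V = \arctan v$, and $T = U + V$, $R = V - U$ (the choice of $T$ and $R$ is the one that turns $4\,dU\,dV$ into $dT^2 - dR^2$). The function names are hypothetical. It also evaluates the conformal factor $\Omega = 2\cos U\cos V$, for which $\Omega\, r = \sin R$, so that the sphere part $r^2 d\sigma^2$ of the flat metric is rescaled to $\sin^2 R\, d\sigma^2$.

```python
from math import atan, cos, sin, pi, isclose


def compactify(t: float, r: float):
    """Map Minkowski (t, r) with r >= 0 to Einstein-cylinder coordinates (T, R)."""
    U, V = atan(t - r), atan(t + r)
    return U + V, V - U


def conformal_factor(t: float, r: float) -> float:
    """Omega = 2 cos U cos V evaluated at the event (t, r)."""
    U, V = atan(t - r), atan(t + r)
    return 2.0 * cos(U) * cos(V)


def inside_diamond(t: float, r: float) -> bool:
    """True if the image of (t, r) satisfies |T| + R < pi."""
    T, R = compactify(t, r)
    return abs(T) + R < pi
```

Following an outgoing radial null ray, for example $(t, r) = (r_0 + k, k)$ for growing $k$, shows $T + R = 2V$ creeping toward $\pi$ without ever reaching it, which is the numerical face of the statement that the boundary lies at infinity for $g$ but at a finite location for $\hat{g}$.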

In this sense, we may regard $\partial\mathbb{M}$ as consisting of
limit points obtained by tracing geodesics for infinite
values of their affine parameters. According to
the causal character of the geodesics the set of their
respective limit points is called future/past timelike
infinity $i^\pm$, spacelike infinity $i^0$, or future/past null
infinity, denoted by $\mathscr{I}^\pm$. These two parts of null
infinity are three-dimensional regular submanifolds
of the embedding manifold $E$, while the points $i^\pm$, $i^0$
are regular points in $E$ in the sense that the metric $\hat{g}$
is regular there. This is not automatic, considering
the fact that infinitely many geodesics converge to a
single point. However, the flatness of Minkowski
spacetime guarantees that the geodesics approach at
just the appropriate rate for the limit points to be
regular.

This example shows that the structure of the
boundary is determined entirely by the metric $g$ of
Minkowski spacetime. If we had chosen a different
function $\Omega' = \omega\Omega$ with $\omega > 0$ then we would not
have obtained the Einstein cylinder but some
different Lorentzian manifold $(M', g')$. Yet, the
boundary of $\mathbb{M}$ in $M'$ would have had the same
properties.


Asymptotically Flat Spacetimes 


The physical idea of an isolated system is captured
mathematically by an asymptotically flat spacetime.
Since such a spacetime $M$ is expected to
approach Minkowski spacetime asymptotically,
the asymptotic structure of $M$ is also expected to
be similar to that of $\mathbb{M}$. This expectation is
expressed in


Asymptotic Structure and Conformal Infinity 223 


Definition 1 A spacetime $(M, g_{ab})$ is called "asymptotically
simple" if there exists a manifold-with-boundary
$\hat{M}$ with metric $\hat{g}_{ab}$ and scalar field $\Omega$ on
$\hat{M}$ and boundary $\mathscr{I} = \partial\hat{M}$ such that the following
conditions hold:


1. $M$ is the interior of $\hat{M}$: $M = \mathrm{int}\, \hat{M}$;

2. $\hat{g}_{ab} = \Omega^2 g_{ab}$ on $M$;

3. $\Omega$ and $\hat{g}_{ab}$ are smooth on all of $\hat{M}$;

4. $\Omega > 0$ on $M$; $\Omega = 0$, $\nabla_a \Omega \ne 0$ on $\mathscr{I}$; and

5. each null geodesic acquires both future and past
endpoints on $\mathscr{I}$.


This definition formalizes the construction which 
was explicitly performed above, by which one 
attaches a regular (nonempty) boundary to a space- 
time after suitably rescaling its metric. Asymptoti- 
cally simple spacetimes are exactly those for which 
this process of conformal compactification is possi- 
ble. The purpose of condition 5 is to exclude 
pathological cases. There are spacetimes which do 
not satisfy this condition (e.g., the Schwarzschild 
spacetime, where some of the null geodesics enter 
the event horizon and cannot escape to infinity). 
Yet, one would like to include them as being 
asymptotically simple in a sense, because they 
clearly describe isolated systems. For these cases, 
there exists the notion of weakly asymptotically 
simple spacetimes. 

In order to arrive at asymptotically flat space- 
times, one needs to make certain assumptions about 
the behavior of the curvature near the boundary, 
thus: 


Definition 2 An asymptotically simple spacetime is
called "asymptotically flat" if its Ricci tensor Ric[g]
vanishes in a neighborhood of $\mathscr{I}$.


Note that this definition imposes a rather strong 
restriction on the Ricci curvature; less restrictive 
assumptions are possible. This condition applies
only near $\mathscr{I}$. Thus, it is possible to consider
spacetimes which contain matter fields as long as 
these fields do not extend to infinity. 

Other asymptotically simple spacetimes which are
not asymptotically flat are the de Sitter and anti-de
Sitter spacetimes, which are solutions of the Einstein
equations with nonvanishing cosmological constant $\lambda$.
It is a simple consequence of the definition that
the boundary $\mathscr{I}$ is a regular three-dimensional
hypersurface of the embedding spacetime $\hat{M}$ which
is timelike, spacelike, or null depending on the sign
of $\lambda$. In particular, for the Minkowski spacetime
($\lambda = 0$) the boundary is necessarily a null hypersurface,
as noted above.

The requirement that the vacuum Einstein
equations hold near $\mathscr{I}$ has several important
consequences. First, $\mathscr{I}$ is a null hypersurface with
the special property of being shear-free. This means
that any cross section of a bundle of its null
generators does not suffer any distortions when
moved along the generators. Only expansion or
contraction can occur. The global structure of $\mathscr{I}$
is the same as the one from the example above.
Null infinity consists of two connected components,
$\mathscr{I}^\pm$, each of which is diffeomorphic to $S^2 \times \mathbb{R}$. Thus,
topologically, $\mathscr{I}^\pm$ are cylinders. The cone-like
appearance as seen in Figure 1 is artificial. It
depends on the particular conformal factor $\Omega$ chosen
for the conformal compactification. Furthermore, it
is only in very exceptional cases that the metric $\hat{g}$ is
regular at $i^0$ or $i^\pm$.

The most important consequence, however, concerns
the conformal Weyl tensor $C^a{}_{bcd}$. This is the
part of the full Riemann curvature tensor $R^a{}_{bcd}$ which
is trace-free. It is invariant under conformal rescalings
of the metric. Thus, on $M$, $\hat{C}^a{}_{bcd} = C^a{}_{bcd}$. When
the vanishing of the Ricci tensor near $\mathscr{I}$ is assumed
then it turns out that the Weyl tensor necessarily
vanishes on $\mathscr{I}$. This is the ultimate justification for
calling such manifolds asymptotically flat because the
entire curvature vanishes on $\mathscr{I}$.


Some Consequences 


There are several consequences of the existence of
the conformal boundary $\mathscr{I}$. They all can be traced
back to the fact that this boundary can be used to
separate the geometric fields into a universal background
field and dynamical fields which propagate
on it. The background is given by the boundary
points attached to an asymptotically flat spacetime
which always form a three-dimensional null hypersurface
$\mathscr{I}$ with two connected components (in the
sequel, we restrict our attention to $\mathscr{I}^+$ only; $\mathscr{I}^-$ is
treated similarly), each with the topology of a
cylinder. And in each case, $\mathscr{I}$ is shear-free.


The BMS Group 


Since the structure of null-infinity is universal over 
all asymptotically flat spacetimes, it is obvious that 
its symmetry group should also possess a universal 
meaning. This group, the so-called Bondi-Metzner- 
Sachs (BMS) group is in many respects similar to the 
Poincaré group, the symmetry group of $\mathbb{M}$. It is the
semidirect product of the Lorentz group with an 
abelian group which, however, is not the four- 
dimensional translation group but an infinite-dimen- 
sional group of supertranslations. This group is a 
normal subgroup, so the factor group is isomorphic 
to the Lorentz group. 


In physical terms, the supertranslations arise 
because there are infinitely many directions from 
which observers at infinity (whose world lines coincide
with the null generators of $\mathscr{I}$ in a certain limit) can
observe the system and because each observer is free to 
choose its own origin of proper time z. The observers 
surrounding the system are not synchronized, because 
under the assumptions made there is no natural way to 
fix a unique common origin. Hence, a supertranslation 
is a shift of the parameter along each null generator of
$\mathscr{I}^+$ corresponding to a change of origin for each
individual observer. It can be given as a map $S^2 \to \mathbb{R}$.
A choice of origin on each null generator of $\mathscr{I}^+$ is
referred to as a "cut" of $\mathscr{I}^+$. It is a two-dimensional
surface of spherical topology which intersects each null
generator exactly once. It is an open question whether
one can always synchronize the observers by imposing
canonical conditions at $i^0$ or $i^\pm$, thereby reducing the
BMS group to the smaller Poincaré group.

The supertranslations contain a unique four-dimensional
normal subgroup. In $\mathbb{M}$ these special
supertranslations are the ones which are induced by
the translations of Minkowski spacetime in the
following way. Take the future light cone of some
event $P$ and follow it out to $\mathscr{I}^+$, where its intersection
defines an origin for each observer located there.
Now consider the light cone of another event $Q$
obtained from $P$ by a translation in a spatial
direction. Then the light emitted from $Q$ will arrive
at $\mathscr{I}^+$ earlier than that from $P$ for observers in the
direction of the translation, while it will be delayed
for observers in the opposite direction. This change
in arrival time defines a specific supertranslation.
Similarly, for a translation in a temporal direction,
the light from $Q$ will arrive later than that from $P$
for all observers. Thus, every translation in $\mathbb{M}$
defines a particular supertranslation on $\mathscr{I}^+$. These
can be characterized in a different way, which is
intrinsic to $\mathscr{I}^+$ and which can be used in the general
case even though there will be no Killing vectors
present in a general asymptotically flat spacetime. In
an appropriate coordinate system, the asymptotic
translations are given as linear combinations of the
first four spherical harmonics $Y_{00}$, $Y_{10}$, $Y_{1\pm 1}$. The
space of asymptotic translations $\mathcal{T}$ is in a natural
way isometric to $\mathbb{M}$.
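The light-cone argument above can be illustrated numerically in flat space. In the sketch below (hypothetical function names, units with $c = 1$), light emitted from an event displaced by $(a^0, \vec{a})$ from the origin reaches a distant observer at position $r\,\vec{n}$ at retarded time $u = a^0 + |r\vec{n} - \vec{a}| - r$, while light from the origin arrives at $u = 0$. For large $r$ this shift tends to $a^0 - \vec{a}\cdot\vec{n}$, a function on the sphere of directions containing only the $\ell = 0$ and $\ell = 1$ spherical harmonics.

```python
from math import sqrt, sin, cos


def direction(theta: float, phi: float):
    """Unit vector n on the sphere for polar angle theta and azimuth phi."""
    return (sin(theta) * cos(phi), sin(theta) * sin(phi), cos(theta))


def retarded_time_shift(a0: float, a, n, r: float) -> float:
    """Arrival-time shift u = a0 + |r n - a| - r seen by an observer at r n."""
    dist = sqrt(sum((r * ni - ai) ** 2 for ni, ai in zip(n, a)))
    return a0 + dist - r


def asymptotic_shift(a0: float, a, n) -> float:
    """Limit of the shift for r -> infinity: a0 - a . n."""
    return a0 - sum(ai * ni for ai, ni in zip(a, n))
```

For a purely spatial translation along the z-axis the shift is negative for observers in the direction of the translation (earlier arrival) and positive in the opposite direction, while a purely temporal translation delays the arrival equally for all observers, in line with the description in the text.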


The Peeling Property 


Now consider the Weyl tensor $C^a{}_{bcd}$ on $\hat{M}$. Since it
vanishes on $\mathscr{I}$ where $\Omega = 0$ we may form the
quotient

$$K^a{}_{bcd} = \Omega^{-1} C^a{}_{bcd}$$

which can be shown to be smooth on $\mathscr{I}$. The
physical interpretation of this tensor field is based
on the following properties. In source-free regions
the field satisfies the spin-2 zero-rest-mass equation

$$\nabla_a K^a{}_{bcd} = 0$$


which is very similar to the Maxwell equations for
the electromagnetic (spin-1) Faraday tensor. Thus,
$K^a{}_{bcd}$ is interpreted as the gravitational field, which
describes the gravitational waves contained inside
the system. The zero-rest-mass equation for $K^a{}_{bcd}$
and the fact that the field is smooth on $\mathscr{I}$ imply that
the Weyl tensor satisfies the "peeling" property. This
is a characteristic conspiracy between the fall-off
behavior of certain components of the Weyl tensor
along outgoing $g$-null-geodesics approaching $\mathscr{I}^+$ in
$M$ with respect to an affine parameter $s$ for $s \to \infty$
and their algebraic type. Symbolically, the Weyl
tensor has the following behavior as $s \to \infty$ along
the null geodesic:

$$C = \frac{[4]}{s} + \frac{[31]}{s^2} + \frac{[211]}{s^3} + \frac{[1111]}{s^4} + O(s^{-5}) \qquad [5]$$


where the numerator of each component indicates
its Petrov type. The repeated principal null direction
(PND) in the first three components and one of the
PNDs in the fourth component are aligned with the
tangent vector of the geodesic. This implies that
the farthest reaching component of the Weyl tensor,
which is $O(1/s)$, has the Petrov type of a radiation
field. It is customary to combine the components
which are $O(1/s^i)$ into one complex function and
denote it by $\psi_{5-i}$. When expressed in terms of the
field $K^a{}_{bcd}$ on $\hat{M}$, this fall-off behavior implies that
of all components of $K^a{}_{bcd}$ only $\psi_4$ does not
necessarily vanish on $\mathscr{I}^+$.

In special cases like the Minkowski, Schwarzschild,
Kerr, and more generally in all asymptotically
flat stationary spacetimes, even $\psi_4$ vanishes on $\mathscr{I}^+$.
For these reasons, $\psi_4$ is called the radiation field of
the system, that is, that part of the gravitational field
which can be registered by the observers at infinity.
It describes the outgoing radiation which is being
emitted by the system during its evolution.


The Bondi-Sachs Mass-Loss Formula 


Gravitational waves carry away energy from the
system. This is a consequence of the Bondi-Sachs
mass-loss formula. The Bondi-Sachs energy-momentum
is related to a weighted integral over a
cut $C$,

$$P_C[W] = -\frac{1}{4\pi G} \int_C W \left[ \psi_2 + \sigma \dot{\bar{\sigma}} \right] d^2S \qquad [6]$$
The quantity in brackets, the mass aspect, is a
combination of the scalar $\psi_2$ which in a sense
measures the strength of the Coulomb-like part of
the gravitational field on $\mathscr{I}^+$ and the complex
quantity $\sigma$. In a so-called Bondi coordinate system,
this quantity is related to the radiation field $\psi_4$ by
the relation

$$\psi_4 = -\ddot{\bar{\sigma}}$$

the dot indicating differentiation with respect to the
affine parameter along the null generators. Thus, $\sigma$
is essentially the second time integral of the
radiation field. The mass aspect is integrated against
a function $W$ which is an asymptotic translation,
that is, a linear combination of the first four
spherical harmonics. Thus, one can view the
expression [6] as defining a linear map $\mathcal{T} \to \mathbb{R}$.
Since $\mathcal{T}$ and $\mathbb{M}$ are isometric this defines a covector
$P_a$ on $\mathbb{M}$, which can always be shown to be timelike,
$P_a P^a > 0$. This positivity property together with the
fact that in the special cases of Schwarzschild and
Kerr spacetimes the integral yields the mass parameters
when evaluated for a time translation
($W = 1$) motivates the interpretation of $P_C$ as the
energy-momentum 4-vector of the spacetime at the
instant defined by the cut $C$. In particular, for $W = 1$
the integral gives the time component of $P_C$, the
Bondi-Sachs energy $E$.

The interpretation of [6] as energy-momentum is 
strengthened by the fact that $P_C$ arises as dual to the
translations which is familiar from Lagrangian field 
theories where energy and momentum appear as 
generators for time and space translations. In fact, 
one can set up a Hamiltonian framework where the 
role of the Bondi-Sachs energy-momentum as 
generator of asymptotic translations is made 
explicit. 

This point of view suggests that one should also
be able to define a notion of angular momentum for
asymptotically flat spacetimes because angular
momentum arises as the generator of rotations,
which can also be defined asymptotically. However,
while there is a unique notion of translation on ℐ⁺,
this is not the case for rotations (and boosts). The
reason is hidden in the structure of the BMS group,
where the Lorentz group appears naturally as a
factor group but not as a unique subgroup. In
physical terms, the angular momentum depends on
an origin, but there is no natural way to choose an
origin on ℐ⁺. This ambiguity in the choice of origin
leads to several nonequivalent expressions for
angular momentum in the literature.

Consider now two cuts C and C', with C' later than
C. Then we may compute the difference ΔE = E' − E
of the Bondi-Sachs energies with respect to the two
cuts. It turns out that this difference can be
expressed as an integral over the (three-dimensional)
piece Σ of ℐ⁺ which is bounded by the two cuts
(i.e., ∂Σ = C' − C):


    E' - E = -\frac{1}{4\pi G} \int_\Sigma \dot\sigma \dot{\bar\sigma} \, d^3V    [7]


This result means that the Bondi-Sachs energy of the
system decreases, since E' < E, and the rate of
decrease is given by the (positive-definite) amount
of gravitational radiation which leaves the system
during the period defined by the two cuts.

It is necessary to point out that in this article the
structure of null infinity has been postulated based
on physical reasoning. The Einstein equations have
been used only in a very weak sense, namely only in
a neighborhood of ℐ. It is an entirely different
question whether the field equations are compatible 
with this postulated structure. To answer it, one 
needs to show that there are global solutions of the 
Einstein equations which exhibit the postulated 
behavior in the asymptotic region. This question 
has been settled recently in the affirmative: there are 
many global spacetimes which are asymptotically 
flat in the sense described here. 

This article has discussed the notion of null
infinity, that is, of spacetimes which are asymptotically
flat in lightlike directions. Spacetimes which
are asymptotically flat in spacelike directions have 
not been covered. The latter is a notion which has 
been developed largely independently of null infinity 
since it is essentially a property of an initial data set 
and not of the entire four-dimensional spacetime. 
Ultimately, these two notions should coincide, in the 
sense that if one has an initial data set which is 
asymptotically flat in spatial directions in an appro- 
priate sense then its Cauchy development will be an 
asymptotically flat spacetime. However, as of yet, it
is not clear what the appropriate conditions should
be because the structure of the gravitational field in
the neighborhood of spacelike infinity i⁰ is not
sufficiently well understood so far.


Introduction


Averaging methods are the methods of perturbation
theory that are based on the averaging principle and
the idea of dividing the dynamics into slow drift and
fast oscillations. The most common field of application
of averaging methods is the analysis of the
behavior of dynamical systems that differ from
integrable systems by small perturbations.


Averaging Principle 


Equations of motion of a system that differ from an 
integrable system by small perturbations often can 
be written in the form 


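    \dot I = \varepsilon g(I, \varphi, \varepsilon), \qquad \dot\varphi = \omega(I) + \varepsilon f(I, \varphi, \varepsilon)
    I = (I_1, \ldots, I_n) \in R^n    [1]
    \varphi = (\varphi_1, \ldots, \varphi_m) \in T^m \bmod 2\pi, \qquad 0 < \varepsilon \ll 1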


The small parameter ε characterizes the amplitude
of the perturbation. For ε = 0 one gets the
unperturbed system. The equation I = const singles
out an invariant m-dimensional torus of the
unperturbed system. The motion on this torus is
quasiperiodic with frequency vector ω(I); components
of the vector I are called "slow variables,"
whereas components of the vector φ are called "fast
variables" or "phases." The right-hand sides of
system [1] are 2π-periodic with respect to all φ_j. It
is assumed that they are sufficiently smooth functions
of all arguments. It is also assumed that the components
of the frequency vector are not linearly
dependent over the ring of integers
identically with respect to I. System [1] is called
a "system with rotating phases."

In applications, one is often interested mainly in 
the behavior of slow variables. The “averaging 
principle" (or method) consists in replacing the 
system of perturbed equations [1] by the “averaged 
system" 


    \dot J = \varepsilon G(J), \qquad G(J) = (2\pi)^{-m} \oint_{T^m} g(J, \varphi, 0) \, d\varphi    [2]


for the purpose of providing an approximate
description of the evolution of the slow variables
over time intervals of order 1/ε or longer. Here,
dφ = dφ_1 ⋯ dφ_m. System [2] contains only slow
variables and, therefore, is much simpler to
investigate than system [1]. When passing from
system [1] to system [2], one ignores the terms
g(I, φ, 0) − G(I) on the right-hand side of [1]. The
averaging principle is based on the idea that these
terms oscillate and lead only to small oscillations
which are superimposed on the drift described by
the averaged system. To justify the averaging
principle, one should establish a relation between
the behavior of the solutions of systems [1] and [2].
This problem is still far from being completely
solved.
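
The replacement of [1] by [2] can be tried out numerically. The following is a minimal sketch, not taken from the article: it assumes a toy one-frequency system dI/dt = ε(−I + cos²φ), dφ/dt = 1, whose phase average is G(J) = −J + 1/2, and the helper names `rk4_step` and `max_deviation` are hypothetical. It integrates both systems on 0 ≤ t ≤ 1/ε and records the largest deviation |I(t) − J(t)|.

```python
import math


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for a scalar ODE y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def max_deviation(eps, i0=1.0, h=0.01):
    """Largest |I(t) - J(t)| on 0 <= t <= 1/eps for the toy system.

    Exact system (an illustrative assumption):
        dI/dt = eps * (-I + cos(phi)**2),  dphi/dt = 1,  phi(0) = 0.
    Averaged system: dJ/dt = eps * (-J + 1/2), since the mean of cos^2 is 1/2.
    """
    exact = lambda t, i: eps * (-i + math.cos(t) ** 2)  # phi(t) = t
    averaged = lambda t, j: eps * (-j + 0.5)
    steps = int(round(1.0 / (eps * h)))
    i, j, worst = i0, i0, 0.0
    for n in range(steps):
        t = n * h
        i = rk4_step(exact, t, i, h)
        j = rk4_step(averaged, t, j, h)
        worst = max(worst, abs(i - j))
    return worst


if __name__ == "__main__":
    for eps in (0.1, 0.05, 0.025):
        print(eps, max_deviation(eps))
```

For this particular system the deviation shrinks roughly in proportion to ε, in line with the picture of small oscillations superimposed on a slow drift; nothing here should be read as a general justification.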

Another version of the averaging principle is
used in the case when frequencies are approximately
in resonance. This means that one or
several relations of the form (k, ω) = 0 are
approximately valid with irreducible integer coefficient
vectors k ≠ 0; here, (k, ω) is the standard scalar
product in R^m. Let Γ be a sublattice of the integer
lattice Z^m generated by these vectors. Let
r = rank Γ and let k^(1), k^(2), ..., k^(m) be a basis in Z^m,
the first r vectors of which belong to Γ. Instead of
φ, one can introduce new variables:


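    \vartheta = (\vartheta_1, \ldots, \vartheta_r) \in T^r \bmod 2\pi
    \chi = (\chi_1, \ldots, \chi_{m-r}) \in T^{m-r} \bmod 2\pi
    \vartheta_i = (k^{(i)}, \varphi), \qquad \chi_j = (k^{(r+j)}, \varphi)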


Let R be an r × m matrix whose rows are the vectors
k^(i), 1 ≤ i ≤ r. For an approximate description of the
behavior of the variables I, ϑ, the averaging principle
prescribes replacing system [1] by the system


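    \dot J = \varepsilon G_\Gamma(J, \gamma), \qquad \dot\gamma = R\omega(J) + \varepsilon R F_\Gamma(J, \gamma)
    G_\Gamma(J, \gamma) = (2\pi)^{-(m-r)} \oint_{T^{m-r}} g(J, \varphi, 0) \, d\chi    [3]
    F_\Gamma(J, \gamma) = (2\pi)^{-(m-r)} \oint_{T^{m-r}} f(J, \varphi, 0) \, d\chi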


(one should express g, f through ϑ, χ and then
integrate over χ; dχ = dχ_1 ⋯ dχ_{m−r}). System [3] is
called the "partially averaged system" for resonances in
Γ. The functions G_Γ, F_Γ can be obtained from the Fourier
series expansions of the functions g, f for ε = 0
by throwing away the harmonics exp(i(k, φ)), k ∉ Γ
(nonresonant harmonics). Passing from system [1]
to system [3] is based on the idea that the ignored
nonresonant harmonics oscillate fast and do not
essentially affect the evolution of the slow variables.
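
The rule "throw away the harmonics with k ∉ Γ" can be sketched for the simplest case of a rank-one lattice Γ generated by a single irreducible vector. The code below is an illustration only: the dictionary representation of a Fourier series and the function names `in_rank_one_lattice` and `partially_average` are assumptions made for the example, not notation from the article.

```python
def in_rank_one_lattice(k, generator):
    """Return True if the integer vector k is an integer multiple of generator."""
    for k_comp, g_comp in zip(k, generator):
        if g_comp != 0:
            # The multiple, if it exists, is fixed by the first nonzero component.
            if k_comp % g_comp != 0:
                return False
            j = k_comp // g_comp
            return all(ki == j * gi for ki, gi in zip(k, generator))
    raise ValueError("generator must be a nonzero vector")


def partially_average(fourier, generator):
    """Keep only the harmonics exp(i(k, phi)) whose k lies in Z * generator.

    `fourier` maps integer tuples k to Fourier coefficients of g (or f) at
    eps = 0; the result represents the resonant part retained in system [3].
    """
    return {k: c for k, c in fourier.items()
            if in_rank_one_lattice(k, generator)}


if __name__ == "__main__":
    # Hypothetical coefficients for m = 2 and the resonance omega_1 = omega_2.
    coeffs = {(0, 0): 0.5, (1, 0): 0.3, (1, -1): 0.2,
              (-1, 1): 0.2, (2, -2): 0.05, (1, 1): 0.1}
    print(partially_average(coeffs, (1, -1)))
```

With the generator (1, −1), the harmonics (1, 0) and (1, 1) are discarded as nonresonant, while (0, 0), (±1, ∓1) and (2, −2) survive.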

Now let system [1] be a Hamiltonian system close 
to an integrable one. The Hamiltonian function has 
the form 


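    H = H_0(p) + \varepsilon H_1(p, \varphi, y, x, \varepsilon)    [4]


where φ, x are coordinates and p, y are the momenta
conjugate to them. The equations of motion have the same
form as [1], with I = (p, y, x):


    \dot p = -\varepsilon \frac{\partial H_1}{\partial \varphi}, \qquad \dot y = -\varepsilon \frac{\partial H_1}{\partial x},
    \dot x = \varepsilon \frac{\partial H_1}{\partial y}, \qquad \dot\varphi = \frac{\partial H_0}{\partial p} + \varepsilon \frac{\partial H_1}{\partial p}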


The averaging principle in the case when there are 
no resonant relations leads to the system 


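    \dot p = 0, \qquad \dot y = -\varepsilon \frac{\partial \mathcal{H}_1}{\partial x}, \qquad \dot x = \varepsilon \frac{\partial \mathcal{H}_1}{\partial y}    [5]
    \mathcal{H}_1 = (2\pi)^{-m} \oint_{T^m} H_1(p, \varphi, y, x, 0) \, d\varphi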


Therefore, in this case there is no drift in p, and the
behavior of y, x is described by a Hamiltonian
system which contains p as a parameter. The equations
of motion of planets around the Sun can be reduced
to the form [4]. The absence of evolution of the
momenta p is known in this problem as
the Lagrange-Laplace theorem on the absence of
evolution of the semimajor axes of planetary orbits.


Elimination of Fast Variables, Decoupling 
of Slow and Fast Motions 


The basic role in the averaging method is played by
the idea that the exact system can, in the principal
approximation, be transformed into the averaged
system by means of a transformation of variables close
to the identity. The extension of this idea is the
idea that a similar transformation of variables allows
one to eliminate, up to an arbitrary degree of
accuracy, the fast phases from the right-hand sides
of the equations of perturbed motion and in this
way decouple the slow motion from the fast one.
For system [1], provided there are no resonant
relations between frequencies, the elimination of fast
variables is performed as follows. The desired
transformation of variables (I, φ) ↦ (J, ψ) is sought
as a formal series


    I = J + \varepsilon u_1(J, \psi) + \varepsilon^2 u_2(J, \psi) + \cdots
    \varphi = \psi + \varepsilon v_1(J, \psi) + \varepsilon^2 v_2(J, \psi) + \cdots    [6]


where the functions u_j, v_j are 2π-periodic in ψ. The
transformation [6] should be chosen in such a way
that in the new variables the right-hand sides of the
equations of motion do not contain fast variables,
that is, the equations of motion should have the
form


    \dot J = \varepsilon G_0(J) + \varepsilon^2 G_1(J) + \cdots
    \dot\psi = \omega(J) + \varepsilon F_0(J) + \varepsilon^2 F_1(J) + \cdots    [7]


Substituting [6] into [7], taking into account [1], and
equating the terms of the same order in ε, we obtain
the following set of relations:


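    G_0(J) = g(J, \psi, 0) - \frac{\partial u_1}{\partial \psi} \omega
    F_0(J) = f(J, \psi, 0) + \frac{\partial \omega}{\partial J} u_1 - \frac{\partial v_1}{\partial \psi} \omega    [8]
    G_i(J) = X_i(J, \psi) - \frac{\partial u_{i+1}}{\partial \psi} \omega
    F_i(J) = Y_i(J, \psi) + \frac{\partial \omega}{\partial J} u_{i+1} - \frac{\partial v_{i+1}}{\partial \psi} \omega, \qquad i \ge 1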


The functions X_i, Y_i are uniquely determined by the
terms u_1, v_1, ..., u_i, v_i in expansion [6]. The first
equation in [8] implies that


    G_0(J) = g_0(J) = G(J)
    u_1(J, \psi) = \sum_{k \ne 0} \frac{g_k}{i(k, \omega)} \exp(i(k, \psi)) + u_1^0(J)    [9]


where g_k, k ∈ Z^m, are the Fourier coefficients of the
function g at ε = 0, and u_1^0 is an arbitrary function of J. It
is assumed that the denominators in [9] do not
vanish, and that the series in [9] converges and
determines a smooth function. In the same way,
from the other equations in [8] one can sequentially
determine F_0, v_1, ..., G_i, u_{i+1}, F_i, v_{i+1}, i ≥ 1.

On truncating the series in [6] and [7] at the terms
of order ε^l, we obtain a truncated system of the lth
approximation. The equation for J is decoupled
from the other equations and can be solved
separately. Then the behavior of ψ is determined
by means of quadrature. The behavior of the original
variable I in this approximation is a slow drift
(described by the equation for J), on which small
oscillations (described by the transformation of variables)
are superimposed. The behavior of φ can be
represented as a rotation with slowly varying frequency,
on which oscillations are also superimposed. For l = 1,
the truncated system coincides with the averaged
system [2].
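
The effect of the first term of the transformation [6] can be checked on a toy example. The sketch below is an illustration under stated assumptions, not part of the article: it takes dI/dt = ε(−I + cos²φ), dφ/dt = 1, for which g − G = cos(2φ)/2 and formula [9] (with the arbitrary function set to zero) gives u_1 = sin(2ψ)/4, while ψ = φ = t because f = 0. The function names are hypothetical.

```python
import math


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for a scalar ODE y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def first_order_errors(eps, i0=1.0, h=0.01):
    """Compare I(t) with J(t) and with J(t) + eps * u_1(psi(t)) on [0, 1/eps].

    Returns (max |I - J|, max |I - (J + eps * u_1)|) for the toy system
    described in the text above (an assumption made for illustration).
    """
    exact = lambda t, i: eps * (-i + math.cos(t) ** 2)
    averaged = lambda t, j: eps * (-j + 0.5)
    u1 = lambda psi: math.sin(2.0 * psi) / 4.0  # from [9] with omega = 1
    steps = int(round(1.0 / (eps * h)))
    i = i0
    j = i0 - eps * u1(0.0)  # I(0) = J(0) + eps * u_1(psi(0)); here u_1(0) = 0
    err_avg = err_corr = 0.0
    for n in range(steps):
        t = n * h
        i = rk4_step(exact, t, i, h)
        j = rk4_step(averaged, t, j, h)
        t_next = t + h
        err_avg = max(err_avg, abs(i - j))
        err_corr = max(err_corr, abs(i - (j + eps * u1(t_next))))
    return err_avg, err_corr


if __name__ == "__main__":
    print(first_order_errors(0.05))
```

In this example the corrected approximation J + εu_1 tracks I much more closely than J alone, which illustrates how the transformation of variables accounts for the small oscillations on top of the drift. The example is special in that the second-order drift term happens to vanish for it.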

If the sublattice Γ ⊂ Z^m specifying possible
resonant relations is given, then in an analogous
manner one can construct a formal transformation
of variables (I, φ) ↦ (J, ψ) such that, in the new
variables, the fast phase ψ will appear on the
right-hand sides of the differential equations for the new
variables only in combinations (k, ψ), with k ∈ Γ
(see, e.g., Arnol'd et al. (1988)). Again, on truncating
the series on the right-hand sides of the
differential equations for the new variables at the
terms of order ε^l, we obtain a truncated system of
the lth approximation. At l = 1, this truncated
system coincides with the partially averaged system
[3] (for some special choice of arbitrary functions
that are contained in the formulas for transformation
of variables). If the original system is a Hamiltonian
system of the form [4], then the transformation of
variables eliminating the fast phases from the
right-hand sides of the differential equations can be
chosen to be symplectic. The corresponding
procedures are called the "Lindstedt method" and
"Newcomb method" (nonresonant case for n = m),
"Delaunay method" (resonant case for n = m), and
"von Zeipel method" (resonant case for n > m) (see
Poincaré (1957) and Arnol'd et al. (1988)).

The calculation of high-order terms in the 
procedures of elimination of fast variables is rather 
cumbersome. There are versions of these procedures 
which are convenient for symbolic processors 
(especially for Hamiltonian systems, e.g., the 
Deprit-Hori method; Giacaglia 1972). 

The averaging method consists in using the
averaged system for the description of motion in
the first approximation and the truncated systems
obtained by means of the procedures of elimination
of fast variables in the higher approximations,
together with the corresponding transformations of
variables.


Justification of the Averaging Method 


To justify the averaging method, one should
establish conditions under which the deviation of the
slow variables along the solutions of the exact
system from the solutions of the averaged system
with appropriate initial data on time intervals of
order 1/ε or longer tends to 0 as ε → 0. It is
desirable to have estimates from above for these
deviations. The estimates of deviations of the
solutions of the exact system from the solutions of
the truncated systems obtained by means of the
procedure of elimination of fast phases are important
as well. It can happen that there are "bad"
initial data for which the slow component of the
solution of the exact system deviates from the
solution of the averaged system by a value of order
1 over time of order 1/ε. In this case, one should
have estimates from above for the measure of the set
of such "bad" initial data; on the complementary set
of initial data, one should have estimates from
above for the deviation of slow variables along the
solutions of the exact system from the solution of
the averaged system. These problems are currently
far from being completely solved. Some general
results are described in the following.

Let the functions ω, f, g on the right-hand side of
system [1] be defined and bounded together with a
sufficient number of derivatives in the domain D{I} ×
T^m{φ} × [0, ε_0]. Let J(t) be the solution of the
averaged system [2] with initial condition I_0 ∈ D.
Let (I(t), φ(t)) be the solution of the exact system [1]
with initial conditions (I_0, φ_0). So, I(0) = J(0). It is
assumed that the solution J(t) is defined and stays at
a positive distance from the boundary of D on the
time interval 0 ≤ t ≤ K/ε, K = const > 0.

If system [1] is a one-frequency system (m = 1),
and the frequency ω does not vanish in D, then for
0 ≤ t ≤ K/ε the solution (I(t), φ(t)) is well defined,
and |I(t) − J(t)| < Cε, C = const > 0. For ω = 1, this
assertion was proved by P Fatou (1928) and, by a
different method, by L I Mandel'shtam and N D
Papaleksi (1934). This was historically the
first result on the justification of the averaging
method (Mitropol'skii 1971). There is a proof
based on the elimination of fast variables (see, e.g.,
Arnol'd (1983)). For a one-frequency system, higher
approximations of the procedure of elimination of
fast variables allow the description of the dynamics
with an accuracy of the order of any power in ε on
time intervals of order 1/ε (Bogolyubov and
Mitropol'skii 1961).

If system [1] is a multifrequency system (m ≥ 2), but
the vector of frequencies is constant and nonresonant,
then for any ρ > 0 and small enough ε < ε_0(ρ) it holds
that |I(t) − J(t)| < ρ for 0 ≤ t ≤ K/ε (Bogolyubov
1945, Bogolyubov and Mitropol'skii 1961). If, in
addition, the frequencies satisfy the Diophantine
condition |(k, ω)| > const |k|^{−ν} for all k ∈ Z^m \ {0}
and some ν > 0, then one can choose ρ = O(ε). In
this case, higher approximations of the procedure of
elimination of fast variables allow one to describe
the dynamics with an accuracy of the order of any
power in ε on time intervals of order 1/ε (see, e.g.,
Arnol'd et al. (1988)).
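
The role of the Diophantine condition can be illustrated by tabulating the small denominators (k, ω) that enter formula [9]. The sketch below is an illustration only, for m = 2: the choice of the norm |k| = max(|k_1|, |k_2|), the exponent ν = 1, the sample frequency vectors, and the function name `diophantine_constant` are all assumptions made for the example.

```python
import math


def diophantine_constant(omega, kmax):
    """Minimum of |k| * |(k, omega)| over integer k with 0 < |k| <= kmax.

    Here m = 2 and |k| = max(|k1|, |k2|). A value bounded away from zero as
    kmax grows is consistent with a Diophantine condition with nu = 1; a
    value of zero signals an exact resonance (k, omega) = 0.
    """
    best = math.inf
    for k1 in range(-kmax, kmax + 1):
        for k2 in range(-kmax, kmax + 1):
            if k1 == 0 and k2 == 0:
                continue
            norm = max(abs(k1), abs(k2))
            best = min(best, norm * abs(k1 * omega[0] + k2 * omega[1]))
    return best


if __name__ == "__main__":
    golden = (math.sqrt(5.0) - 1.0) / 2.0
    print(diophantine_constant((1.0, golden), 200))  # stays bounded away from zero
    print(diophantine_constant((1.0, 0.5), 200))     # resonance k = (1, -2)
```

For the golden-mean frequency ratio the minimum stays bounded away from zero over the scanned range, whereas for the rational ratio 1/2 it vanishes at k = (1, −2). A finite scan of this kind is only suggestive, since the condition concerns all k.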

If the system is a multifrequency system, and 
frequencies are not constant (but depend on the slow 
variables I), then due to the evolution of slow 
variables the frequencies themselves are evolving 
slowly. At certain time moments, they can satisfy 
certain resonant relations. One of the phenomena 
that can take place here is a capture into a 
resonance; this capture leads to a large deviation of 
the solutions of the exact and averaged systems. 
However, the general Anosov averaging theorem
(Anosov 1960) implies that if the frequencies ω are
nonresonant for almost all I, then for any ρ > 0, the
inequality |I(t) − J(t)| < ρ is satisfied for 0 ≤ t ≤ K/ε
for all initial data outside a set E(ρ, ε) whose
measure tends to 0 as ε → 0. In many cases, it
turns out that mes E(ρ, ε) = O(√ε/ρ) (in particular,
a sufficient condition for the last estimate is that
rank(∂ω/∂I) = m) (Arnol'd et al. (1988)).

The knowledge about averaging in two-frequency
systems (m = 2) on time intervals of order
1/ε is relatively more complete (see Arnol'd
(1983), Arnol'd et al. (1988), and Lochak and
Meunier (1988)). For Hamiltonian and reversible
systems, the justification of the averaging method is
a by-product of Kolmogorov-Arnold-Moser (KAM)
theory. The KAM theory provides estimates of the
difference between the solutions of the exact and
averaged systems for the majority of initial data on the
infinite time interval −∞ < t < +∞. For the remaining
data this difference can grow because of Arnol'd
diffusion, but, in general, very slowly. According to
the Nekhoroshev theorem, this difference is small on
time intervals whose length grows exponentially when
the perturbation decays linearly (for an analytic
Hamiltonian if the unperturbed Hamiltonian is a
generic function, the so-called steep function).

Another aspect of justification of the averaging 
method is establishing relations between invariant 
manifolds of the exact and averaged systems. 
Consider, in particular, the case of a one-frequency
system and a multifrequency system with constant
Diophantine frequencies. Suppose that the averaged
system has an equilibrium such that real parts of all
its eigenvalues are different from 0, or a limit cycle
such that the absolute values of all but one of its
multipliers are different from 1. Then the exact
system has an invariant torus, respectively, m- or
(m + 1)-dimensional, whose projection onto the
space of the slow variables is O(ε)-close to the
equilibrium (cycle) of the averaged system. This
torus is stable or unstable together with the 
equilibrium (cycle) of the averaged system. For 
Hamiltonian and reversible systems, the problem of 
invariant manifolds is considered in the framework 
of the KAM theory. 


Averaging in Bogolyubov's Systems 


Systems in the standard form of Bogolyubov (1945)
are of the form


    \dot x = \varepsilon X(t, x, \varepsilon), \qquad x \in R^p, \quad 0 < \varepsilon \ll 1    [10]


It is assumed that the function X, besides the usual
smoothness conditions, satisfies the condition of
uniform average: the limit (time average)


    X_0(x) = \lim_{T \to \infty} \frac{1}{T} \int_0^T X(t, x, 0) \, dt    [11]


exists uniformly in x. The averaging principle of
Bogolyubov consists of the replacement of the
original system in standard form by the averaged
system


    \dot\xi = \varepsilon X_0(\xi)    [12]
with the goal of providing an approximate description
of the behavior of x. This approach generalizes the
approach of the section "Averaging Principle" for
the case of constant frequencies (ω = const). Upon
introducing in the given system with constant
frequencies the deviation from uniform rotation
α = φ − ωt and denoting x = (I, α), we obtain a
system in the standard form [10]. Here the condition
of uniform average is fulfilled because X(t, x, 0) is a
quasiperiodic function of time t. The averaged
system [12] for nonresonant frequencies coincides
with the averaged system [2]; for resonant frequencies,
it coincides with the partially averaged system
[3] (one should only supplement systems [2] and [3] with
equations for those components of the vector φ − ωt
that do not enter into the right-hand side of the
averaged system).
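
The time average in [11] can be approximated numerically for a quasiperiodic right-hand side. The sketch below is a hedged illustration: the sample function X(t, x) = −x + sin²t + cos(√2 t), whose time average is −x + 1/2, and the helper name `time_average` are assumptions for the example, and the midpoint rule is just one convenient quadrature.

```python
import math


def time_average(X, x, T, h=0.01):
    """Midpoint-rule approximation of (1/T) * integral_0^T X(t, x) dt."""
    n = int(round(T / h))
    total = sum(X((i + 0.5) * h, x) for i in range(n))
    return total * h / T


def sample_X(t, x):
    """Quasiperiodic toy right-hand side (an assumption, not from the article)."""
    return -x + math.sin(t) ** 2 + math.cos(math.sqrt(2.0) * t)


if __name__ == "__main__":
    for T in (10.0, 100.0, 1000.0):
        print(T, time_average(sample_X, 0.3, T))
```

As T grows, the printed values approach −0.3 + 0.5 = 0.2, with an error that decays like 1/T for this example, uniformly in x.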

The averaging principle of Bogolyubov is justified
by three Bogolyubov theorems. According to the
first theorem, if ξ(t), 0 ≤ t ≤ K/ε, is a solution of
the averaged system, and x(t) is a solution of the
exact system with initial condition x(0) = ξ(0), then
for any ρ > 0 there exists ε_0(ρ) > 0 such that
|x(t) − ξ(t)| < ρ for 0 ≤ t ≤ K/ε and 0 < ε < ε_0(ρ).
The second and the third Bogolyubov theorems
describe the motion in the neighborhoods of
equilibria and the limit cycles of the averaged
system. In particular, if for an equilibrium the real
parts of all its eigenvalues are different from 0, or,
for a limit cycle, the absolute values of all but one
of its multipliers are different from 1, then the exact
system has a solution which eternally stays near
this equilibrium (cycle). The stability properties of
this solution are the same as the stability properties
of the corresponding equilibrium (cycle) of the
averaged system.

For systems of the form [10] a procedure exists
that, similarly to the procedure in the section
"Elimination of Fast Variables, Decoupling of Slow
and Fast Motions," allows us to eliminate time t
from the right-hand side of the system with an
accuracy of the order of any power in ε by means of
a transformation of variables. (To perform this
procedure, one should assume that the conditions
of uniform average are satisfied for functions
that arise in the process of constructing higher
approximations in this procedure (Bogolyubov and
Mitropol'skii 1961).) In the first approximation,
such a transformation of variables transforms the
original system into the averaged one.

The condition of uniform average is very important
for the theory. If the limit in [11] exists, but
convergence is nonuniform in x, then the time 
average Xo could be, for example, a discontinuous 
function of x, and the averaged system would not be 
well defined. 


Averaging in Slow-Fast Systems 


Systems of the form [1] are particular cases of the 
systems of the form 


    \dot x = f(x, y, \varepsilon), \qquad \dot y = \varepsilon g(x, y, \varepsilon)    [13]


which are called "slow-fast systems" (or systems
with slow and fast motions, with slow and fast
variables). The generalization of the approach of the
section "Averaging Principle" for these systems is
the following averaging principle of Anosov (1960).
In the system [13], let x ∈ M, y ∈ R^n, where M is a
smooth compact m-dimensional manifold. At ε = 0,
the system for the fast variables x contains the slow
variables y as parameters. Assume that this system
(which is called the "fast system") has a finite smooth
invariant measure μ_y and is ergodic for almost all
values of y. Introduce the averaged system


    \dot Y = \varepsilon G(Y), \qquad G(Y) = \frac{1}{\mu_Y(M)} \int_M g(x, Y, 0) \, d\mu_Y

According to the averaging principle, one should use
the solution Y(t) of the averaged system with initial
condition Y(0) = y(0) for an approximate description of
the slow motion y(t) in the original system. This
averaging principle is justified by the following
Anosov theorem: for any positive ρ the measure
of the set E(ρ, ε) of initial data (from a compact set in
the phase space) such that

    \max_{0 \le t \le 1/\varepsilon} |y(t) - Y(t)| > \rho

tends to 0 as ε → 0.

The particular case when the original system is
a Hamiltonian system depending on a slowly varying
parameter λ = εt, and for almost all values of
λ the motion of the system with λ = const is
ergodic on almost all energy levels, is considered
in Kasuga (1961).

For the case when the fast system has strong mixing
properties, see Bakhtin (2004) and Kifer (2004).

For slow-fast systems, there is also a generalization
of the approach of the previous section that uses
time averaging and the condition of uniform average
(Volosov 1962).


Applications of the Averaging Method 


The averaging method is one of the most productive
methods of perturbation theory, and its applications
are numerous. It is widely used in celestial mechanics
and space flight dynamics for the description of the
evolution of motions of celestial bodies, in plasma
physics and the theory of accelerators for the description
of the motion of charged particles, and in radio
engineering for the description of nonlinear oscillatory
regimes. There are also applications in hydrodynamics,
physics of lasers, optics, acoustics, etc. (see
Arnol'd et al. (1988), Bogolyubov and Mitropol'skii
(1961), Lochak and Meunier (1988), Mitropol'skii
(1971), and Volosov (1962)).




See also: Central Manifolds, Normal Forms; 
Diagrammatic Techniques in Perturbation Theory; 
Hamiltonian Systems: Stability and Instability Theory; 
KAM Theory and Celestial Mechanics; Multiscale 
Approaches; Random Walks in Random Environments; 
Separatrix Splitting; Stability Problems in Celestial 
Mechanics; Stability Theory and KAM. 
Introduction


The idea of topological invariants defined via path
integrals was introduced by A S Schwartz (1977) in a
special case and by E Witten (1988) in its full
power. To formalize this idea, Witten (1988)
introduced the notion of a topological quantum field
theory (TQFT). Such theories, independent of
Riemannian metrics, are rather rare in quantum
physics. On the other hand, they admit a simple
axiomatic description first suggested by M Atiyah
(1989). This description was inspired by G Segal's
(1988) axioms for a two-dimensional conformal
field theory. The axiomatic formulation of TQFTs
makes them suitable for purely mathematical
research combining methods of topology, algebra,
and mathematical physics. Several authors explored
axiomatic foundations of TQFTs (see Quinn (1995)
and Turaev (1994)).


Axioms of a TQFT 


An (n + 1)-dimensional TQFT (V, τ) over a scalar
field k assigns to every closed oriented n-dimensional
manifold X a finite-dimensional vector space
V(X) over k and assigns to every cobordism
(M, X, Y) a k-linear map


    \tau(M) = \tau(M, X, Y) : V(X) \to V(Y)


Here a cobordism (M, X, Y) between X and Y is a
compact oriented (n + 1)-dimensional manifold M
endowed with a diffeomorphism ∂M ≅ X̄ ⊔ Y (the
overline indicates the orientation reversal). All
manifolds and cobordisms are supposed to be
smooth. A TQFT must satisfy the following axioms.


1. Naturality. Any orientation-preserving diffeomorphism
of closed oriented n-dimensional manifolds
f: X → X' induces an isomorphism f_♯: V(X) → V(X').
For a diffeomorphism g between the
cobordisms (M, X, Y) and (M', X', Y'), the following
diagram is commutative:


    \begin{array}{ccc}
    V(X) & \xrightarrow{(g|_X)_\sharp} & V(X') \\
    \downarrow \tau(M) & & \downarrow \tau(M') \\
    V(Y) & \xrightarrow{(g|_Y)_\sharp} & V(Y')
    \end{array}


2. Functoriality. If a cobordism (W, X, Z) is
obtained by gluing two cobordisms (M, X, Y) and
(M', Y', Z) along a diffeomorphism f: Y → Y', then
the following diagram is commutative:


    \begin{array}{ccc}
    V(X) & \xrightarrow{\tau(W)} & V(Z) \\
    \downarrow \tau(M) & & \uparrow \tau(M') \\
    V(Y) & \xrightarrow{f_\sharp} & V(Y')
    \end{array}


3. Normalization. For any n-dimensional manifold
X, the linear map


    \tau([0, 1] \times X) : V(X) \to V(X)


is the identity.


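4. Multiplicativity. There are functorial
isomorphisms


    V(X \sqcup Y) = V(X) \otimes V(Y)
    V(\emptyset) = k


such that the following diagrams are commutative:


    \begin{array}{ccc}
    V((X \sqcup Y) \sqcup Z) & = & (V(X) \otimes V(Y)) \otimes V(Z) \\
    \downarrow & & \downarrow \\
    V(X \sqcup (Y \sqcup Z)) & = & V(X) \otimes (V(Y) \otimes V(Z))
    \end{array}


    \begin{array}{ccc}
    V(X \sqcup \emptyset) & = & V(X) \otimes k \\
    \downarrow & & \downarrow \\
    V(X) & = & V(X)
    \end{array}


Here ⊗ = ⊗_k is the tensor product over k. The
vertical maps are, respectively, the ones induced
by the obvious diffeomorphisms and the standard
isomorphisms of vector spaces.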

5. Symmetry. The isomorphism


    V(X \sqcup Y) \cong V(Y \sqcup X)


induced by the obvious diffeomorphism corresponds
to the standard isomorphism of vector
spaces


    V(X) \otimes V(Y) \cong V(Y) \otimes V(X)


Given a TQFT (V, τ), we obtain an action of the
group of diffeomorphisms of a closed oriented
n-dimensional manifold X on the vector space
V(X). This action can be used to study this group.

An important feature of a TQFT (V, τ) is that it
provides numerical invariants of compact oriented
(n + 1)-dimensional manifolds without boundary.
Indeed, such a manifold M can be considered as a
cobordism between two copies of ∅ so that τ(M) ∈
Hom_k(k, k) = k. Any compact oriented (n + 1)-dimensional
manifold M can be considered as a
cobordism between ∅ and ∂M; the TQFT assigns to
this cobordism a vector τ(M) in Hom_k(k,
V(∂M)) = V(∂M) called the vacuum vector.

The manifold [0, 1] × X, considered as a cobordism
from X̄ ⊔ X to ∅, induces a nonsingular pairing


    V(\overline{X}) \otimes V(X) \to k

We obtain a functorial isomorphism V(X̄) =
V(X)* = Hom_k(V(X), k).

We now outline definitions of several important 
classes of TQFTs. 

If the scalar field k has a conjugation and all the
vector spaces V(X) are equipped with natural
nondegenerate Hermitian forms, then the TQFT
(V, τ) is Hermitian. If k = C is the field of complex
numbers and the Hermitian forms are positive
definite, then the TQFT is unitary.

A TQFT (V, τ) is nondegenerate or cobordism
generated if for any closed oriented n-dimensional
manifold X, the vector space V(X) is generated by
the vacuum vectors derived as above from the
manifolds bounded by X.

Fix a Dedekind domain D ⊂ C. A TQFT (V, τ)
over C is almost D-integral if it is nondegenerate and
there is d ∈ C such that dτ(M) ∈ D for all M with
∂M = ∅. Given an almost integral TQFT (V, τ) and a
closed oriented n-dimensional manifold X, we define
S(X) to be the D-submodule of V(X) generated by all
the vacuum vectors. This module is preserved under
the action of self-diffeomorphisms of X and yields a
finer "arithmetic" version of V(X).

The notion of an (n + 1)-dimensional TQFT over 
k can be reformulated in the categorical language as 
a symmetric monoidal functor from the category of 
n-manifolds and (n + 1)-cobordisms to the category 
of finite-dimensional vector spaces over k. The 
source category is called the (n+ 1)-dimensional 
cobordism category. Its objects are closed oriented 
n-dimensional manifolds. Its morphisms are cobord- 
isms considered up to the following equivalence: 
cobordisms (M, X, Y) and (M', X, Y) are equivalent
if there is a diffeomorphism M → M' compatible
with the diffeomorphisms ∂M ≅ X̄ ⊔ Y ≅ ∂M'.


TQFTs in Low Dimensions 


TQFTs in dimension 0 + 1 = 1 are in one-to-one
correspondence with finite-dimensional vector
spaces. The correspondence goes by associating
with a one-dimensional TQFT (V, τ) the vector
space V(pt), where pt is a point with positive
orientation.

Let (V, τ) be a two-dimensional TQFT. The linear
map τ associated with a pair of pants (a 2-disk with
two holes considered as a cobordism between two
circles S^1 ⊔ S^1 and one circle S^1) defines a
commutative multiplication on the vector space A = V(S^1).
The 2-disk, considered as a cobordism between S^1
and ∅, induces a nondegenerate trace on the algebra
A. This makes A into a commutative Frobenius
algebra (also called a symmetric algebra). This
algebra completely determines the TQFT (V, τ).
Moreover, this construction defines a one-to-one
correspondence between equivalence classes of
two-dimensional TQFTs and isomorphism classes of
finite-dimensional commutative Frobenius algebras
(Kock 2003).
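
The way a Frobenius algebra determines the numerical invariants of closed surfaces can be sketched in code. The sketch relies on a standard fact that is not spelled out in the text above: the invariant of a closed surface of genus g equals the trace applied to the g-th power of the "handle element" h = Σ η^{ij} e_i e_j, where η^{ij} is the inverse of the matrix of the pairing η_ij = trace(e_i e_j). The class and function names below are hypothetical, and the algebra is given by structure constants in a chosen basis.

```python
from fractions import Fraction


def invert(matrix):
    """Gauss-Jordan inverse of a square matrix of Fractions.

    Raises StopIteration if the matrix is singular (degenerate trace).
    """
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class FrobeniusAlgebra:
    """Commutative Frobenius algebra given in a basis e_0, ..., e_{n-1}."""

    def __init__(self, mult, counit, unit):
        # mult[i][j] is the coordinate vector of the product e_i * e_j.
        self.n = len(counit)
        self.mult = [[[Fraction(c) for c in v] for v in row] for row in mult]
        self.counit = [Fraction(c) for c in counit]  # the trace on basis vectors
        self.unit = [Fraction(c) for c in unit]

    def product(self, a, b):
        out = [Fraction(0)] * self.n
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                for k in range(self.n):
                    out[k] += ai * bj * self.mult[i][j][k]
        return out

    def trace(self, a):
        return sum(c * ai for c, ai in zip(self.counit, a))

    def handle_element(self):
        n = self.n
        eta = [[self.trace(self.mult[i][j]) for j in range(n)] for i in range(n)]
        eta_inv = invert(eta)
        h = [Fraction(0)] * n
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    h[k] += eta_inv[i][j] * self.mult[i][j][k]
        return h

    def surface_invariant(self, genus):
        """Invariant of the closed oriented surface of the given genus."""
        h = self.handle_element()
        power = self.unit
        for _ in range(genus):
            power = self.product(power, h)
        return self.trace(power)


# Q[x]/(x^2 - 1), basis (1, x), trace(a + b x) = a.
group_algebra = FrobeniusAlgebra(
    mult=[[[1, 0], [0, 1]], [[0, 1], [1, 0]]], counit=[1, 0], unit=[1, 0])
# Q[x]/(x^2), basis (1, x), trace(a + b x) = b (a non-semisimple example).
dual_numbers = FrobeniusAlgebra(
    mult=[[[1, 0], [0, 1]], [[0, 1], [0, 0]]], counit=[0, 1], unit=[1, 0])

if __name__ == "__main__":
    print([group_algebra.surface_invariant(g) for g in range(4)])
    print([dual_numbers.surface_invariant(g) for g in range(4)])
```

In both examples the genus-one value equals the dimension of the algebra, as it must, since the torus invariant is the dimension of V(S^1); the genus-zero value is the trace of the unit.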

The formalism of TQFTs was to a great extent 
motivated by the three-dimensional case, specifi- 
cally, Witten's Chern-Simons TQFTs. A mathema- 
tical definition of these TQFTs was first given 
by Reshetikhin and Turaev using the theory of 
quantum groups. The Witten-Reshetikhin-Turaev 
three-dimensional TQFTs do not satisfy exactly the 
definition above: the naturality and the functoriality 
axioms only hold up to invertible scalar factors 
called framing anomalies. Such TQFTs are said to 
be projective. In order to get rid of the framing 
anomalies, one has to add extra structures on the 
three-dimensional cobordism category. Usually one 
endows surfaces X with Lagrangians (maximal
isotropic subspaces in H_1(X; R)). For 3-cobordisms,
several competing — but essentially equivalent — 
additional structures are considered in the literature: 
2-framings (Atiyah 1989), p_1-structures (Blanchet
et al. 1995), numerical weights (K Walker, V Turaev).

Large families of three-dimensional TQFTs are
obtained from the so-called modular categories.
The latter are constructed from quantum groups at 
roots of unity or from the skein theory of links. 
See Quantum 3-Manifold Invariants. 


Additional Structures 


The axiomatic definition of a TQFT extends in 
various directions. In dimension 2 it is interesting to 
consider the so-called open-closed theories involving 
1-manifolds formed by circles and intervals and
two-dimensional cobordisms with boundary 
(G Moore, G Segal). In dimension 3 one often 
considers cobordisms including framed links and 
graphs whose components (resp. edges) are labeled 
with objects of a certain fixed category C. In such a 
theory, surfaces are endowed with finite sets of 
points labeled with objects of C and enriched with 
tangent directions. In all dimensions one can study 
manifolds and cobordisms endowed with homotopy 
classes of mappings to a fixed space (homotopy 
quantum field theory, in the sense of Turaev). 
Additional structures on the tangent bundles (spin
structures, framings, etc.) may also be considered
provided the gluing is well defined.


See also: Braided and Modular Tensor Categories; Hopf
Algebras and q-Deformation Quantum Groups; Indefinite
Metric; Quantum 3-Manifold Invariants; Topological
Gravity, Two-Dimensional; Topological Quantum Field
Theory: Overview.


Introduction


The term “axiomatic quantum field theory” sub- 
sumes a collection of research branches of quantum 
field theory analyzing the general principles of 
relativistic quantum physics. The content of the 
results typically is structural and retrospective rather 
than quantitative and predictive. 

The first axiomatic activities in quantum field theory
date back to the 1950s, when several groups started
investigating the notion of scattering and S-matrix in
detail: Lehmann, Symanzik, and Zimmermann 1955
(LSZ-approach); Bogoliubov and Parasiuk 1957, Hepp
and Zimmermann (BPHZ-approach); Haag 1957-59
and Ruelle 1962 (Haag-Ruelle theory). See Scattering,
Asymptotic Completeness and Bound States and
Scattering in Relativistic Quantum Field Theory:
Fundamental Concepts and Tools.

Wightman (1956) analyzed the properties of the 
vacuum expectation values used in these approaches 
and formulated a system of axioms that the vacuum 
expectation values ought to satisfy in general. Together 
with Gårding (1965), he later formulated a system of
axioms in order to characterize general quantum fields 
in terms of operator-valued functionals, and the two 
systems have been found to be equivalent. 

A couple of spectacular theorems such as the PCT
theorem and the spin-statistics theorem have been
obtained in this setting, but no interacting quantum
fields satisfying the axioms have been found so far
(in 1 + 3 spacetime dimensions). So the development
of alternatives and modifications of the setting
moved into the focus of the theory, and the axioms
themselves became objects of research. Their
role as axioms, understood in the common sense,
turned into the role of mere properties of quantum
fields. Today, the term “axiomatic quantum field
theory” is widely avoided for this reason.

In a long list of publications spread over the 
1960s, Araki, Borchers, Haag, Kastler, and others 
worked out an algebraic approach to quantum field 
theory in the spirit of Segal’s “Postulates for general
quantum mechanics” (1947) (see Algebraic Approach
to Quantum Field Theory). 

The Wightman setting was the basis of a frame- 
work into which the causal construction of the 
S-matrix developed by Stückelberg (1951) and
Bogoliubov and Shirkov (1959) has been fitted by 
Epstein and Glaser (1973). The causality principle 
fixes the time-ordered products up to a finite 
number of parameters at each order, which are to 
be put in as the renormalization constants. 

Already in 1949, Dyson had seen that problems in 
the formulation of quantum electrodynamics (QED) 
could be avoided by “just” multiplying the time 
variable and, correspondingly, the energy variable by 
the imaginary unit constant (“Wick rotation”). Schwin- 
ger then investigated time-ordered Green functions of 
QED in this Euclidean setting. This approach was 
formulated in terms of axioms by Osterwalder and 
Schrader (1973, 1975) (see Euclidean Field 
Theory). 

Other extensions of the aforementioned settings
are objects of current research (see Indefinite Metric,
Quantum Field Theory in Curved Spacetime,
Symmetries in Quantum Field Theory of Lower
Spacetime Dimensions, and Thermal Quantum Field
Theory).


Quantum Fields 


Gårding and Wightman characterized operator-valued
quantum fields on the Minkowski spacetime
$\mathbb{R}^{1+3}$ by a couple of axioms. Given additional
assumptions concerning the high-energy behavior,
the Gårding-Wightman fields are in one-to-one
correspondence with algebraic field theories.

Without specifying or presupposing these addi- 
tional assumptions, the axioms will now be for- 
mulated and discussed in detail and compared to the 
corresponding conditions in the algebraic setting. 
Adjoint operators are marked by an asterisk, and 
Einstein’s summation convention is used. 


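Operator-valued functionals. The components of a
field $F$ are an $n$-tuple $F_1 \cdots F_n$ of linear maps that
assign to each test function $\varphi \in C_0^\infty(\mathbb{R}^{1+3})$ linear
operators $F_1(\varphi) \cdots F_n(\varphi)$ in a Hilbert space $\mathcal{H}$ with
domains of definition $D(F_1(\varphi)) \cdots D(F_n(\varphi))$. There
exists a dense subspace $D$ of $\mathcal{H}$ with
$D \subset D(F_\nu(\varphi)) \cap D(F_\nu(\varphi)^*)$ and $F_\nu(\varphi)D \cup F_\nu(\varphi)^*D \subset D$
for all indices $\nu$. Consider $m$ such fields $F^1 \cdots F^m$
with components $F^a_\nu$, $1 \le a \le m$, $1 \le \nu \le n_a$. Assume
there to be an involution $*: (1 \cdots m) \to (1 \cdots m)$ such
that $F^{a^*}_\nu(\varphi) = F^a_\nu(\bar\varphi)^*$, where $\bar\varphi(x) := \overline{\varphi(x)}$.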


Quantum fields cannot be operator-valued functions
on $\mathbb{R}^{1+3}$ if one wants them to exhibit (part of)
the properties to follow. But point fields can be
quadratic forms; typically this is the case for fields in
a Fock space.

For each component $F^a_\nu$ and each open region
$\mathcal{O} \subset \mathbb{R}^{1+3}$, the field operators $F^a_\nu(\varphi)$ with $\mathrm{supp}\,\varphi \subset \mathcal{O}$
generate a $*$-algebra $\mathcal{F}^a_\nu(\mathcal{O})$ of operators defined on
$D$. These operators typically are unbounded, which
is one of the differences with the traditional setting
of the algebraic approach. There a $C^*$-algebra $\mathfrak{A}(\mathcal{O})$
is assigned to each open region $\mathcal{O}$ in such a way
that $\mathcal{O} \subset P$ implies $\mathfrak{A}(\mathcal{O}) \subset \mathfrak{A}(P)$. Each $C^*$-algebra
is a $*$-algebra, but in contrast to a $C^*$-algebra,
a $*$-algebra does not need to be endowed with a
norm. The fundamental observables in quantum
theory are bounded positive operators (typically, but
not always, projections), and these generate a
$C^*$-algebra.

There is no fundamental physical motivation for 
confining the setting to fields with a finite number of 
components, except that it includes most of the 
fields known from “daily life.”


Continuity as a distribution. For all $\Phi, \Psi \in D$, the
linear functionals $T^a_{\Phi,\Psi,\nu}$ on $C_0^\infty(\mathbb{R}^{1+3})$ defined by

$$T^a_{\Phi,\Psi,\nu}(\varphi) := \langle \Phi, F^a_\nu(\varphi)\Psi \rangle$$

are distributions. They can be extended to tempered
distributions.


The Fourier transform of a tempered distribution 
is well defined as a tempered distribution. It is 
mainly due to the importance of Fourier transforma- 
tions that the preceding assumption is convenient. 
Bogoliubov et al. (1975) remark that the assumption 
is not a mere technicality, since it rules out 
nonrenormalizable quantum fields. 


Microcausality (Bose-Fermi alternative). If $\varphi$ and $\psi$
are test functions with spacelike separated support,
then

$$F^a_\nu(\varphi)F^b_\mu(\psi) = \pm F^b_\mu(\psi)F^a_\nu(\varphi).$$

The sign depends on the statistics of the fields; it
is “$-$” if and only if both $F^a$ and $F^b$ are fermion
fields.

Microcausality is closely related to Einstein 
causality. Einstein causality requires that any two 
observables located in spacelike separated regions 
commute in the strong sense, that is, their spectral 
measures commute. But fields with Fermi-Dirac 
statistics are not observables, and not even for Bose- 
Einstein fields with self-adjoint field operators does 
the above condition imply that the spectral projec- 
tions commute, which is the criterion for commen- 
surability. The sign on the right-hand side does, 
however, specify the statistics of the field. 

This is a crucial difference with the algebraic
approach. If $\mathcal{O}$ and $P$ are spacelike separated open
regions and if $A \in \mathfrak{A}(\mathcal{O})$ and $B \in \mathfrak{A}(P)$, then one
assumes, like in the above case, that $AB = BA$
(locality). But being elements of $C^*$-algebras, $A$ and
$B$ are bounded operators (or can be represented
accordingly), so if $A$ and $B$ are self-adjoint, they are,
indeed, commensurable.

Doplicher, Haag, and Roberts (1974) and Buch- 
holz and Fredenhagen (1984) have derived from this 
input of observables a field structure of localized 
particle states, and they showed that the statistics of 
these fields is Bose-Einstein, Fermi-Dirac, or some
corresponding parastatistics (which is, a priori, 
forbidden if one assumes microcausality). 

Recall that the unimodular group $SL(2,\mathbb{C})$ is
isomorphic to the universal covering group of
the restricted Lorentz group $L_+^\uparrow$ (the connected
component containing the unit element). Denote by
$\Lambda: SL(2,\mathbb{C}) \to L_+^\uparrow$ a covering map.
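
A minimal Python sketch of one such covering map may help. It assumes the standard realization in which a spacetime vector $x$ is encoded as the Hermitian matrix $X = x^\mu\sigma_\mu$ (with $\sigma_0$ the identity and $\sigma_1, \sigma_2, \sigma_3$ the Pauli matrices) and $g$ acts by $X \mapsto gXg^*$; this realization, the formula $\Lambda(g)^\mu{}_\nu = \tfrac12\,\mathrm{tr}(\sigma_\mu g \sigma_\nu g^*)$, and all function names are assumptions of the sketch, not notation from the text.

```python
import math

# sigma_0 = identity, sigma_1..3 = Pauli matrices; 2x2 matrices as nested tuples.
SIGMA = (
    ((1, 0), (0, 1)),
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)
ETA = (1.0, -1.0, -1.0, -1.0)  # diagonal of the Minkowski metric

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return tuple(tuple(complex(a[j][i]).conjugate() for j in range(2))
                 for i in range(2))

def covering_map(g):
    """4x4 real matrix with entries tr(sigma_mu g sigma_nu g^dagger) / 2."""
    gd = dagger(g)
    lam = []
    for mu in range(4):
        row = []
        for nu in range(4):
            m = matmul(SIGMA[mu], matmul(g, matmul(SIGMA[nu], gd)))
            row.append(((m[0][0] + m[1][1]) / 2).real)
        lam.append(row)
    return lam

def preserves_metric(lam, tol=1e-9):
    """Check Lambda^T eta Lambda = eta, i.e. that lam is a Lorentz matrix."""
    for a in range(4):
        for b in range(4):
            s = sum(ETA[m] * lam[m][a] * lam[m][b] for m in range(4))
            if abs(s - (ETA[a] if a == b else 0.0)) > tol:
                return False
    return True
```

Since $g$ and $-g$ give the same matrix, the map is two-to-one, which is the sense in which $SL(2,\mathbb{C})$ covers the restricted Lorentz group twice.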


Covariance. There exist strongly continuous unitary
representations $U$ and $T$ of $SL(2,\mathbb{C})$ and
$(\mathbb{R}^{1+3}, +)$, respectively, and representations
$D^1 \cdots D^m$ of $SL(2,\mathbb{C})$ in $\mathbb{C}^{n_1} \cdots \mathbb{C}^{n_m}$, respectively,
such that

$$U(g)F^a_\nu(\varphi)U(g)^* = D^a(g^{-1})_\nu^{\ \mu}\, F^a_\mu(\varphi(\Lambda(g)^{-1}\,\cdot\,))$$

and

$$T(y)F^a_\nu(\varphi)T(y)^* = F^a_\nu(\varphi(\,\cdot - y)),$$

where $D^a(g^{-1})_\nu^{\ \mu}$ are the elements of the matrix
$D^a(g^{-1})$. Dropping coordinate indices, this reads

$$U(g)F^a(\varphi)U(g)^* = D^a(g^{-1})F^a(\varphi(\Lambda(g)^{-1}\,\cdot\,))$$

and

$$T(y)F^a(\varphi)T(y)^* = F^a(\varphi(\,\cdot - y)).$$

The representations $U$ and $T$ generate a representation
of the universal covering of the restricted
Poincaré group.


As it stands, this assumption is a very strong one,
since it manifestly fixes the action of the representation
on the field operators. In the algebraic
approach, the covariance assumption is formulated
more modestly. Namely, it is assumed that
$U(g)\mathfrak{A}(\mathcal{O})U(g)^* = \mathfrak{A}(\Lambda(g)\mathcal{O})$ and $T(y)\mathfrak{A}(\mathcal{O})T(y)^* = \mathfrak{A}(\mathcal{O} + y)$,
leaving open how the representation acts
on the single local observables.


Vacuum vector. There exists a unique (up to a
multiple) vector $\Omega \in D$ that is invariant under the
representations $U$ and $T$ and cyclic with respect to
the algebra $\mathcal{F}(\mathbb{R}^{1+3})$ generated by all field operators
$F(\varphi)$, that is, $\overline{\mathcal{F}(\mathbb{R}^{1+3})\Omega} = \mathcal{H}$.


Spectrum condition. The joint spectrum of the
components of the 4-momentum, i.e., of the generators
of the spacetime translations, has support in
the closed forward light cone $\overline{V}_+$, that is, the set
$\{k : k^2 \ge 0,\ k_0 \ge 0\}$.


The existence of an invariant ground state called 
the vacuum is standard in algebraic quantum field 
theory as well. 
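
The spectrum condition refers to the closed forward light cone, the set of 4-vectors $k$ with $k^2 \ge 0$ and $k_0 \ge 0$. As a small illustration, the membership test can be sketched in Python as follows; the function names and the metric signature $(+,-,-,-)$ are assumptions of the sketch.

```python
def minkowski_square(k):
    """k^2 = k_0^2 - k_1^2 - k_2^2 - k_3^2 for k = (k_0, k_1, k_2, k_3)."""
    return k[0] ** 2 - k[1] ** 2 - k[2] ** 2 - k[3] ** 2

def in_closed_forward_cone(k):
    """True if k^2 >= 0 and k_0 >= 0."""
    return minkowski_square(k) >= 0 and k[0] >= 0

def add(k, p):
    """Componentwise sum of two 4-vectors."""
    return tuple(a + b for a, b in zip(k, p))
```

The vacuum ($k = 0$), lightlike momenta, and momenta on a positive-mass shell all pass the test, and the sum of two admissible momenta is again admissible, as expected for a convex cone.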


N-Point Functions 


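Consider the above fields $F^1 \cdots F^m$. For each $N \in \mathbb{N}$
and each $N$-tuple $(a_1 \cdots a_N)$ of natural numbers $\le m$
(labeling fields), define families $(F^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N})$
and $(w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N})$, indexed by $1 \le \nu_i \le n_{a_i}$, of distributions
on $(\mathbb{R}^{1+3})^N$ by

$$F^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_N) := F^{a_1}_{\nu_1}(\varphi_1) \cdots F^{a_N}_{\nu_N}(\varphi_N)$$

(using the nuclear theorem) and

$$w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\psi) := \langle \Omega, F^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\psi)\Omega \rangle. \qquad [1]$$

These distributions are called the “N-point functions”
of the fields $F^1 \cdots F^m$ and yield the vacuum
expectation values of the theory. It is straightforward
to deduce the following properties from the
Gårding-Wightman axioms.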


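Microcausality (Bose-Fermi alternative). If $\varphi_i$ and
$\varphi_{i+1}$ have spacelike separated supports, then

$$w^{a_1 \cdots a_i a_{i+1} \cdots a_N}_{\nu_1 \cdots \nu_i \nu_{i+1} \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_i \otimes \varphi_{i+1} \otimes \cdots \otimes \varphi_N)$$
$$= \pm\, w^{a_1 \cdots a_{i+1} a_i \cdots a_N}_{\nu_1 \cdots \nu_{i+1} \nu_i \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_{i+1} \otimes \varphi_i \otimes \cdots \otimes \varphi_N),$$

or dropping coordinate indices,

$$w^{a_1 \cdots a_i a_{i+1} \cdots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_i \otimes \varphi_{i+1} \otimes \cdots \otimes \varphi_N)$$
$$= \pm\, w^{a_1 \cdots a_{i+1} a_i \cdots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_{i+1} \otimes \varphi_i \otimes \cdots \otimes \varphi_N).$$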


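Invariance. For all $g \in SL(2,\mathbb{C})$ and $y \in \mathbb{R}^{1+3}$, one has

$$w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_N)$$
$$= D^{a_1}(g^{-1})_{\nu_1}^{\ \mu_1} \cdots D^{a_N}(g^{-1})_{\nu_N}^{\ \mu_N} \times w^{a_1 \cdots a_N}_{\mu_1 \cdots \mu_N}(\Lambda(g)\varphi_1 \otimes \cdots \otimes \Lambda(g)\varphi_N)$$
$$= w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi_1(\,\cdot - y) \otimes \cdots \otimes \varphi_N(\,\cdot - y)),$$

or dropping coordinate indices,

$$w^{a_1 \cdots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_N)$$
$$= (D^{a_1}(g^{-1}) \otimes \cdots \otimes D^{a_N}(g^{-1})) \times w^{a_1 \cdots a_N}(\Lambda(g)\varphi_1 \otimes \cdots \otimes \Lambda(g)\varphi_N)$$
$$= w^{a_1 \cdots a_N}(\varphi_1(\,\cdot - y) \otimes \cdots \otimes \varphi_N(\,\cdot - y)).$$

Here $\Lambda(g)\varphi$ abbreviates the test function $\varphi(\Lambda(g)^{-1}\,\cdot\,)$.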
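By translation invariance, the N-point functions
$w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(x_1 \cdots x_N)$ only depend on the $N - 1$ relative-position
vectors $\xi_1 := x_1 - x_2$, $\xi_2 := x_2 - x_3$, ...,
$\xi_{N-1} := x_{N-1} - x_N$. This means that there are distributions
$W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}$ on $(\mathbb{R}^{1+3})^{N-1}$ related to the N-point
functions by the symbolic condition

$$w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(x_1 \cdots x_N) = W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\xi_1 \cdots \xi_{N-1}).$$

In precise notation, this reads

$$\int w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(x_1 \cdots x_N)\,\varphi(x_1 \cdots x_N)\,dx_1 \cdots dx_N = \int W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\xi_1 \cdots \xi_{N-1})\,\varphi_\xi(\xi_1 \cdots \xi_{N-1})\,d\xi_1 \cdots d\xi_{N-1},$$

where

$$\varphi_\xi(\xi_1 \cdots \xi_{N-1}) := \int \varphi(x_1,\, x_1 - \xi_1,\, x_1 - \xi_1 - \xi_2,\, \ldots,\, x_1 - \xi_1 - \cdots - \xi_{N-1})\,dx_1.$$

The functions $W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}$ are called the Wightman
functions, and they have the following property
because of the spectrum condition of the field.


Spectrum condition. The support of the Fourier
transform of each $W^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}$ is contained in $(\overline{V}_+)^{N-1}$.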
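
The passage from the points $x_1 \cdots x_N$ to the relative-position vectors $\xi_i := x_i - x_{i+1}$ can be sketched in a few lines of Python. The function names are hypothetical, and points are modelled as plain tuples of numbers for illustration.

```python
def relative_vectors(points):
    """xi_i := x_i - x_{i+1} for a list of N points, each given as a tuple."""
    return [tuple(a - b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])]

def translate(points, y):
    """Shift every point by the same vector y."""
    return [tuple(a + b for a, b in zip(p, y)) for p in points]

def reconstruct(x1, xis):
    """Recover x_1 ... x_N from x_1 and the xi_i, using x_{i+1} = x_i - xi_i."""
    points = [tuple(x1)]
    for xi in xis:
        points.append(tuple(a - b for a, b in zip(points[-1], xi)))
    return points
```

The relative vectors are unchanged by a common translation, which is why a translation-invariant N-point function can be regarded as depending on $N - 1$ arguments only; the remaining freedom is the position of $x_1$.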


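The uniqueness of the vacuum vector (up to a
phase) is equivalent to the following condition.


Cluster property. For $N \ge 2$, let $x$ be a spacelike
vector in $\mathbb{R}^{1+3}$, let $L$ be a natural number $< N$, and
let $\varphi$ and $\psi$ be tempered test functions on $(\mathbb{R}^{1+3})^L$
and $(\mathbb{R}^{1+3})^{N-L}$, respectively. Then

$$\lim_{\lambda \to \infty} w^{a_1 \cdots a_N}_{\nu_1 \cdots \nu_N}(\varphi \otimes \psi(\,\cdot - \lambda x)) = w^{a_1 \cdots a_L}_{\nu_1 \cdots \nu_L}(\varphi)\, w^{a_{L+1} \cdots a_N}_{\nu_{L+1} \cdots \nu_N}(\psi).$$
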

On the one hand, these properties have been
deduced from the Gårding-Wightman axioms via
eqn [1]. Conversely, a family of distributions
labeled in the above fashion and satisfying the
above properties may be used to construct a
Gårding-Wightman field theory provided that two
more conditions, which hold for all systems of
N-point functions, are satisfied. This requires
some elementary notation.

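Define the index sets

$$\mathcal{I}_N := \left\{ \binom{a_1 \cdots a_N}{\nu_1 \cdots \nu_N} : 1 \le a_i \le m,\ 1 \le \nu_i \le n_{a_i} \right\}$$

for all $N \in \mathbb{N}$, $\mathcal{I}_0 := \{\emptyset\}$, and $\mathcal{I} := \bigcup_{N \in \mathbb{N}_0} \mathcal{I}_N$. On $\mathcal{I}$ a concatenation
$\circ$ is defined by

$$\binom{a_1 \cdots a_N}{\nu_1 \cdots \nu_N} \circ \binom{b_1 \cdots b_M}{\mu_1 \cdots \mu_M} := \binom{a_1 \cdots a_N\, b_1 \cdots b_M}{\nu_1 \cdots \nu_N\, \mu_1 \cdots \mu_M}$$

and

$$\emptyset \circ \kappa := \kappa \circ \emptyset := \kappa$$

and an involution $*$ by

$$\binom{a_1 \cdots a_N}{\nu_1 \cdots \nu_N}^* := \binom{a_N^* \cdots a_1^*}{\nu_N \cdots \nu_1} \quad \text{and} \quad \emptyset^* := \emptyset.$$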
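
The index bookkeeping can be made concrete with a short Python sketch. It assumes that an index is stored as a pair of tuples (field labels, component labels) and that the involution on field labels is supplied as a dictionary; the names `concat`, `star`, `EMPTY`, and `conj` are hypothetical.

```python
EMPTY = ((), ())  # plays the role of the empty index

def concat(kappa, lam):
    """Concatenation: field labels and component labels are appended."""
    return (kappa[0] + lam[0], kappa[1] + lam[1])

def star(kappa, conj):
    """Involution: reverse the order and apply a -> a* to the field labels.

    `conj` is a dict implementing the involution * on the labels 1..m.
    """
    labels, comps = kappa
    return (tuple(conj[a] for a in reversed(labels)), tuple(reversed(comps)))
```

With these definitions the involution reverses concatenation, $(\kappa \circ \lambda)^* = \lambda^* \circ \kappa^*$, which is the property needed for the involution on the Borchers algebra to be compatible with its product.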


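Define an antilinear involution $*$ on $\mathcal{S}^N :=
\mathcal{S}((\mathbb{R}^{1+3})^N)$ by

$$\psi^*(x_1 \cdots x_N) := \overline{\psi(x_N \cdots x_1)}$$

for each $N \in \mathbb{N}$. Put $\mathcal{S}^0 := \mathbb{C}$ and $z^* := \bar z$ for all
$z \in \mathbb{C}$.

Define $\mathcal{S}_N := \mathcal{S}^N \times \mathcal{I}_N$, and $\mathcal{S} := \bigcup_N \mathcal{S}_N$. For
each $\kappa \in \mathcal{I}_N$, the set $\mathcal{S}^\kappa := \mathcal{S}((\mathbb{R}^{1+3})^N) \times \{\kappa\}$ is a
linear space. On the direct sum $\mathcal{B} := \bigoplus_{\kappa \in \mathcal{I}} \mathcal{S}^\kappa$
define an associative product by

$$(\psi, \kappa)(\chi, \lambda) := (\psi \otimes \chi, \kappa \circ \lambda)$$

and an antilinear involution $*$ by $(\psi, \kappa)^* := (\psi^*, \kappa^*)$.
This endows $\mathcal{B}$ with the structure of a nonabelian
$*$-algebra with unit element $1 = (1, \emptyset)$ (Borchers
algebra).

If one defines $F_\emptyset(z) := z1$, then $w_\emptyset(z) = z$, and the
Wightman functions induce a $\mathbb{C}$-linear functional $\omega$
on $\mathcal{B}$ by

$$\omega((\psi, \kappa)) := w_\kappa(\psi). \qquad [2]$$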


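$\omega$ exhibits the following two properties, which are
the announced additional conditions required for
reconstructing the fields from the N-point functions.


Hermiticity. $\omega(\xi^*) = \overline{\omega(\xi)}$.

Positivity. $\omega(\xi^*\xi) \ge 0$.

To see Hermiticity, compute

$$\omega((\psi^*, \kappa^*)) = \langle \Omega, F_{\kappa^*}(\psi^*)\Omega \rangle = \langle F_\kappa(\psi)\Omega, \Omega \rangle = \overline{\omega((\psi, \kappa))}$$

and use $\mathbb{C}$-linearity to prove the statement for
arbitrary $\xi \in \mathcal{B}$. For positivity, write any $\xi$ as a finite
sum $\xi = (\psi_1, \kappa_1) + \cdots + (\psi_M, \kappa_M)$, and compute

$$\omega(\xi^*\xi) = \omega\Big( \sum_{i,j=1}^M (\psi_i, \kappa_i)^* (\psi_j, \kappa_j) \Big) = \sum_{i,j} \omega\big( (\psi_i^* \otimes \psi_j,\ \kappa_i^* \circ \kappa_j) \big) = \sum_{i,j} w_{\kappa_i^* \circ \kappa_j}(\psi_i^* \otimes \psi_j)$$
$$= \sum_{i,j} \langle \Omega, F_{\kappa_i^* \circ \kappa_j}(\psi_i^* \otimes \psi_j)\Omega \rangle = \sum_{i,j} \langle \Omega, F_{\kappa_i^*}(\psi_i^*) F_{\kappa_j}(\psi_j)\Omega \rangle$$
$$= \sum_{i,j} \langle F_{\kappa_i}(\psi_i)\Omega, F_{\kappa_j}(\psi_j)\Omega \rangle = \Big\| \sum_i F_{\kappa_i}(\psi_i)\Omega \Big\|^2 \ge 0.$$


Theorem 1 (Wightman’s reconstruction theorem).
Let $m$ and $n_1 \cdots n_m$ be natural numbers, let
$\mathcal{I}_0, \mathcal{I}_1, \mathcal{I}_2, \ldots$, and $\mathcal{I}$ be the above index sets, and
let $\mathcal{B}$ be the above Borchers algebra. Let $D^1 \cdots D^m$
be matrix representations of $SL(2,\mathbb{C})$ in $\mathbb{C}^{n_1} \cdots \mathbb{C}^{n_m}$,
respectively.

For each natural number $N$, let $(w_\kappa)_{\kappa \in \mathcal{I}_N}$ be a
family of distributions on $(\mathbb{R}^{1+3})^N$. Suppose the
family $(w_\kappa)_{\kappa \in \mathcal{I}}$ defined this way satisfies microcausality,
covariance, spectrum condition, and the
cluster property. If the linear functional $\omega$ defined
on $\mathcal{B}$ by eqn [2] is Hermitian and positive, then
there is (up to unitary equivalence) a unique family
$F^1 \cdots F^m$ of Gårding-Wightman fields with $n_1 \cdots n_m$
components such that eqn [1] holds.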


The proof uses the GNS construction known from 
the theory of operator algebras. The Borchers 
algebra plays several roles. On the one hand, it is a 
linear space with an inner product. The Hilbert 
space H and the invariant space D of the field theory 
are constructed from this structure. On the other 
hand, the Borchers algebra acts on itself as an 
algebra of linear operators by its own algebra 
multiplication. This is the structure the *-algebra of 
field operators is constructed from. 


Results 


The mathematical and structural analysis of quan- 
tum fields has improved the understanding of 
scattering theory in the different approaches men- 
tioned above; see Bogoliubov et al. (1975) and the 
relevant articles in this encyclopedia. Apart from 
this, the following results deserve to be mentioned. 
Evidently, many others have to be omitted for 
practical reasons. 


PCT Symmetry 


An early famous result was Lüders's proof (1957) 
that all fields in the above setting exhibit PCT 
symmetry, that is, the symmetry under reflections in 
all space and time variables combined with a charge 
conjugation. This symmetry is exhibited by all 
particle reactions observed so far. The proof, like 
several of the main results, made extensive use of the 
fact that the N-point functions are boundary values 
of analytic functions due to the spectrum condition, 
and that a fundamental theorem by Bargmann, Hall, 
and Wightman (1957) yields invariant analytic 
extensions. 


Reeh-Schlieder Theorem 


For each field $F^a$ and each bounded open region
$\mathcal{O} \subset \mathbb{R}^{1+3}$, the vacuum vector is cyclic with respect
to $\mathcal{F}^a(\mathcal{O})$ (Reeh and Schlieder 1961). So excitations
of the vacuum vector by field operators located in $\mathcal{O}$
are not to be considered as state vectors of a particle
localized in $\mathcal{O}$, since they are not perpendicular to
the excitations by field operators located outside $\mathcal{O}$.


Unruh Effect and Modular $P_1$CT Symmetry


In the 1970s, Bisognano and Wichmann (1975, 1976) 
discovered a surprising link of symmetries to the 
intrinsic algebraic structure of quantum fields, which is 
established by the Tomita-Takesaki modular theory 
(see Tomita-Takesaki Modular Theory). Namely, the
unitary operators implementing the Lorentz boosts on
the fields are elements of modular groups. This means 
that a uniformly accelerated observer perceives the 
vacuum as a thermal state with a temperature 
proportional to its acceleration, corresponding to the 
famous Unruh effect. 

In addition, it was shown that $P_1$CT symmetries
(i.e., PCT combined with rotations by the angle $\pi$) are
implemented by modular conjugations (modular $P_1$CT
symmetry). Modular $P_1$CT symmetry is a consequence
of the Unruh effect (Guido and Longo 1995).


Spin and Statistics 


Immediately following Lüders's PCT theorem, the
spin-statistics theorem was proved for the N-point
functions of the Wightman setting (Lüders and
Zumino 1958, Burgoyne 1958, Dell'Antonio 1961).
This was remarkable and widely acknowledged
progress. But as remarked earlier, the confinement to
finite-component fields, which is used in the proof,
cannot be motivated by physical first principles (i.e., in
a truly axiomatic fashion). The representations $D^a$ of
$SL(2,\mathbb{C})$ acting on the components, however, are forced
to be finite dimensional by this assumption, and since
the representations $D^a$ are objects of investigation, a
considerable part of the result is assumed this way
from the outset. What is more, there are examples of
fields with a “wrong” spin-statistics connection and
infinitely many components.

This was one reason to continue working on the 
subject. At the beginning of the 1990s, it was found 
that the spin-statistics theorem can be derived from 
the symmetries discovered by Bisognano and Wich- 
mann, and Unruh. Two approaches not referring to 
the number of internal degrees of freedom have been 
worked out: one assumes the Unruh effect (Guido
and Longo 1995), the other modular $P_1$CT symmetry
(Kuckert 1995, 2005, Kuckert and Lorenzen
2005). The first approach has been generalized to
conformal fields, the second to the case that the 
symmetry group's homogeneous part is not SL(2, C), 
but only SU(2). 

Both approaches can be applied to infinite- 
component fields. They yield existence theorems; a 
distinguished representation is constructed from the 
modular symmetries, and this representation exhib- 
its Pauli’s spin-statistics connection. As mentioned 
before, nothing more can be expected at this level of 
generality. The line of argument works in both the 
algebraic and the Wightman setting. 


A Dynamical Property of the Vacuum 


One can derive the spectrum condition, the
Bisognano-Wichmann symmetries/the Unruh effect, and
covariance from the condition that no (inertial or)
uniformly accelerated observer can extract mechanical
energy from the field in vacuo by means of a
cyclic process (Kuckert 2002).


Interacting Fields 


The examples of interacting quantum fields that fit 
into the above settings live in one or two spatial 
dimensions only, and their relevance for physics 
mainly consists in being such examples. This 
has contributed to some frustration and to doubts 
on whether one is not, in fact, proving theorems on 
pretty empty sets, or in other words, working on 
“the most sophisticated theory of the free field.” 

The computations in quantum field theory are, like
most of the computations in physics, perturbative. In
order to be successful, they need to yield good
agreement with experiment with reasonable computational
effort, that is, by evaluation up to the second
or third order. This asymptotic convergence is more
important than convergence of the series as a whole.
There are low-dimensional examples of interacting
Wightman fields (e.g., $(\varphi^4)_2$; cf. the monograph by
Glimm and Jaffe (1987)), and time will tell whether
four-dimensional interacting Wightman fields exist.
But there is no reason to expect convergence for
general interacting fields; for example, QED does not
fit into the Wightman framework.
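
The distinction between asymptotic and actual convergence can be illustrated with a toy example that is not taken from quantum field theory. The sketch below assumes the classical integral $\int_0^\infty e^{-t}/(1 + xt)\,dt$, whose formal expansion $\sum_n (-1)^n n!\,x^n$ diverges for every $x \ne 0$ but approximates the integral well at low orders for small $x$. The function names, the cutoff, and the step count are choices made for illustration.

```python
import math

def exact_value(x, upper=60.0, steps=60000):
    """Composite Simpson estimate of the integral of exp(-t)/(1 + x t) on [0, upper].

    The tail beyond `upper` is of order exp(-upper) and is neglected.
    """
    h = upper / steps  # `steps` must be even for Simpson's rule

    def f(t):
        return math.exp(-t) / (1.0 + x * t)

    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def partial_sum(x, order):
    """Truncated formal series: sum of (-1)^n n! x^n for n = 0..order."""
    return sum((-1) ** n * math.factorial(n) * x ** n for n in range(order + 1))

def truncation_errors(x, max_order):
    """Absolute error of each truncation order 0..max_order."""
    target = exact_value(x)
    return [abs(partial_sum(x, n) - target) for n in range(max_order + 1)]
```

For $x = 0.1$ the error first shrinks as the order grows and later blows up, so a low-order truncation is useful although the series as a whole has no sum in the ordinary sense.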

The appropriate extension of the Wightman 
setting has been formulated by Epstein and Glaser 
(1973). It defines the S-matrix rather than the field 
itself as a (in general divergent) formal power series 
of operator-valued distributions. 

The above results apply to this somewhat more 
modest setting as well, so the “axiomatic” 
approaches do help in understanding the known 
high-energy physics interactions. This even includes 
gauge theories (see Perturbative Renormalization 
Theory and BRST). The high-precision results of 
QED can be reproduced within this setting, and 
there occur no UV singularities: renormalization 
amounts to the need to extend distributions by 
fixing some parameters, that is, the renormalization 
constants. The infrared problem is circumvented by 
considering the S-matrix as a (position-dependent) 
distribution taking values in the unitary formal 
power series of distributions rather than as a single 
(global) unitary operator (or unitary power series). 


Quantum Energy Inequalities 


Energy densities of Wightman fields admit negative
expectation values (Epstein, Glaser, and Jaffe 1965).
This is in contrast to the positivity conditions that
the energy-momentum tensors of classical general
(and, hence, also special) relativity have to satisfy to
ensure causality. But the conflict can be resolved by
smearing the densities out in space or time, as was
first realized by Ford (1991). The extent to
which the energy density can become negative
depends on the extent to which it is smeared out:
“more smearing means less violation of positivity,”
so the classical positivity conditions are restored at
medium and large scales. There are many ways to
make this principle concrete. Quantum energy
inequalities hold for thermodynamically well-behaved
quantum fields on causally well-behaved
classical spacetime backgrounds.


Introduction


Bäcklund transformations appeared for the first time
in the work of the geometers of the end of the
nineteenth century, for instance, Bianchi, Lie,
Bäcklund, and Darboux, when studying surfaces
of constant curvature. If on a surface in three-dimensional
Euclidean space, the asymptotic directions
are taken as coordinate directions, then the
surface metric may be written as

$$ds^2 = dx^2 + 2\cos(w)\,dx\,dy + dy^2 \qquad [1]$$

where $w(x,y)$ is a function of the surface coordinates
$x, y$. A necessary and sufficient condition for
the surface to be of constant curvature is that $w$
satisfies the nonlinear partial differential equation

$$w_{xy} = \sin(w) \qquad [2]$$

where the subscript denotes partial derivative.
Equation [2] is nowadays called the sine-Gordon
(sG) equation. Bianchi (1879), Lie (1888, 1890,
1893), and Bäcklund (1874) introduced a transformation
which allows one to pass from a solution of
eqn [2] to a new solution, that is, from a surface of
constant curvature to a new one. Starting from the
work of Clairin (1903), this transformation has been
referred to as Bäcklund transformation (BT). The
BT for eqn [2] reads

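$$\tilde w_x = w_x + 2a \sin\left(\frac{\tilde w + w}{2}\right) \qquad [3a]$$

$$\tilde w_y = -w_y + \frac{2}{a} \sin\left(\frac{\tilde w - w}{2}\right) \qquad [3b]$$

where $a$ is a nonzero constant parameter and $\tilde w$ is a
different solution of eqn [2]. It is immediate to prove
by appropriate differentiation of eqns [3] with
respect to $y$ and $x$ that both $w$ and $\tilde w$ must satisfy
eqn [2]. The BT [3] provides a denumerable set of
exact solutions once a solution $w$ is known. Bianchi
showed that four such solutions can be related in an
algebraic way:

$$\tan\left(\frac{\tilde w_{12} - w}{4}\right) = \frac{a_1 + a_2}{a_1 - a_2} \tan\left(\frac{\tilde w_1 - \tilde w_2}{4}\right) \qquad [4]$$

Equation [4] is derived using the permutability
theorem proved by Bianchi in his Ph.D. thesis in
1879:

$$w \xrightarrow{a_1} \tilde w_1 \xrightarrow{a_2} \tilde w_{12}, \qquad w \xrightarrow{a_2} \tilde w_2 \xrightarrow{a_1} \tilde w_{12}$$

where by the diagram

$$w \xrightarrow{a} w'$$

we mean a BT from $w$ to $w'$ with parameter $a$.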
For sG equation [2] a trivial solution is given, for
example, by $w(x,y) = \pi$. Then, from eqn [3a] we get

$$\tilde w(x,y) = 2 \arcsin\left[\frac{1 - e^{-2[ax + \alpha(y)]}}{1 + e^{-2[ax + \alpha(y)]}}\right] \qquad [5]$$

Introducing this result in eqn [3b], we get $\alpha_y = -1/a$.
So, the application of the BT [3] to sG equation gives
the nontrivial solution

$$\tilde w = 4 \arctan\left[\frac{1 - e^{-[ax - y/a]}}{1 + e^{-[ax - y/a]}}\right] \qquad [6]$$
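
The claims of this paragraph can be checked numerically. The sketch below assumes that the BT reads $\tilde w_x = w_x + 2a\sin((\tilde w + w)/2)$, $\tilde w_y = -w_y + (2/a)\sin((\tilde w - w)/2)$, that the seed is $w = \pi$, and that eqn [6] is $\tilde w = 4\arctan[(1 - e^{-s})/(1 + e^{-s})]$ with $s = ax - y/a$; it then estimates the residuals of the sine-Gordon equation and of both BT equations by central finite differences. The function names and step size are choices made for illustration.

```python
import math

def bt_solution(x, y, a):
    """Assumed form of eqn [6]: image of the seed w = pi under the BT with parameter a."""
    s = a * x - y / a
    return 4.0 * math.atan((1.0 - math.exp(-s)) / (1.0 + math.exp(-s)))

def residuals(x, y, a, h=1e-4):
    """Finite-difference residuals of eqns [2], [3a], [3b] for the seed w = pi."""
    seed = math.pi  # constant seed, so w_x = w_y = 0

    def u(p, q):
        return bt_solution(p, q, a)

    u0 = u(x, y)
    u_x = (u(x + h, y) - u(x - h, y)) / (2 * h)
    u_y = (u(x, y + h) - u(x, y - h)) / (2 * h)
    u_xy = (u(x + h, y + h) - u(x + h, y - h)
            - u(x - h, y + h) + u(x - h, y - h)) / (4 * h * h)
    return (
        u_xy - math.sin(u0),                             # sine-Gordon equation [2]
        u_x - 2.0 * a * math.sin((u0 + seed) / 2.0),     # BT equation [3a]
        u_y - (2.0 / a) * math.sin((u0 - seed) / 2.0),   # BT equation [3b]
    )
```

All three residuals vanish up to discretization error at generic points, which is consistent with the relation $\alpha_y = -1/a$ quoted above.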


Clairin (1903) extended the results of Bäcklund to
the case of a generic partial differential equation of
second order,

$$F(x, y, w, w_x, w_y, w_{xx}, w_{xy}, w_{yy}) = 0 \qquad [7]$$

by assuming that

$$\tilde w_x = f, \qquad \tilde w_y = g. \qquad [8]$$

If the compatibility condition of eqns [8],

$$f_y - g_x = 0 \qquad [9]$$

is identically satisfied by eqn [7] for the variable
$\tilde w(x,y)$, then we say that eqns [8] are an
auto-Bäcklund transformation for eqn [7]. In this
case, eqns [8] transform a solution of eqn [7] into a
new solution of the same equation. Thus, eqns [8]
simplify the problem of finding solutions of eqn [7].
Given one solution w(x, y) of eqn [7], the existence 
of a BT reduces the problem of integrating eqn [7] 
into that of solving two first-order ordinary differ- 
ential equations. From this point of view, the 
Cauchy-Riemann relations

$$\tilde w_x = w_y, \qquad \tilde w_y = -w_x \qquad [10]$$

for the Laplace equation

$$w_{xx} + w_{yy} = 0 \qquad [11]$$


are a BT ante litteram (however, without a free 
parameter). 

Consider the case when $\tilde w(x,y)$ satisfies a different
partial differential equation,

$$G(x, y, \tilde w, \tilde w_x, \tilde w_y, \tilde w_{xx}, \tilde w_{xy}, \tilde w_{yy}) = 0 \qquad [12]$$

In this case, one still has a BT, but not an auto-BT.
The best-known cases are when $F_1 = w_y + w_{xxx} +
w w_x$ and $G_1 = \tilde w_y + \tilde w_{xxx} + \tilde w^2 \tilde w_x$, and $F_2 = w_{xy} -
e^w$ and $G_2 = \tilde w_{xy}$ (Lamb 1976). In the first case, the
BT relates the Korteweg-de Vries (KdV) equation to
the modified KdV equation and this transformation
paved the way to the discovery of the complete
integrability of the KdV equation by Gardner et al.
(1967). In the second case, the BT relates the
Liouville equation to the wave equation, and can
be used to solve it completely. Due to the first
example, a non-auto-BT is often called a Miura
transformation.

One can now state an operative definition of BT,
extending the results of Bäcklund and Clairin to
more general equations.


Definition 1. Consider two partial differential
equations of order $m_1$ and $m_2$:

$$F_1(x, u, u^{(1)}, \ldots, u^{(m_1)}) = 0 \qquad [13a]$$

$$F_2(x, \tilde u, \tilde u^{(1)}, \ldots, \tilde u^{(m_2)}) = 0 \qquad [13b]$$

where $x \in \mathbb{R}^n$ and $(u, \tilde u) \in \mathbb{C}^2$, and $u^{(k)}$ is the set of
$k$-order derivatives of $u$. The set of $n$ equations

$$G_i(x, u, u^{(1)}, \ldots, u^{(s_1)}, \tilde u, \tilde u^{(1)}, \ldots, \tilde u^{(s_2)}) = 0, \quad i = 1, \ldots, n \qquad [14]$$

with $s_1 < m_1$ and $s_2 < m_2$, represents the BT of
eqns [13] iff the compatibility of eqns [14] is
identically satisfied on the solutions of eqns [13]
and $G_i$ depends on a set of essential arbitrary
constant parameters.


The Clairin formulation [8] and the classical BT
for the sG [3] are clearly special subcases of this
definition. When a solution of $F_1 = 0$ is known, a
solution of $F_2 = 0$ is obtained by solving a set of
lower-order partial differential equations. By a
proper choice of the BT parameters, once a new 
solution is obtained by solving the BT [14], one can 
use the obtained solution as a starting point to 
construct another one, and so on. In this way, one 
can construct a whole ladder of solutions, a priori a 
denumerable set of solutions. This same construc- 
tion has been applied also to the case of functional 
equations. In particular, it has been considered for 
the case of differential-difference and difference- 
difference equations both for finite (dynamical 
systems (Wojciechowski 1982)) and infinite lattices 
(Toda 1989). 

In the case when $F_1$ and $F_2$ represent the same
equation, $s_1 = s_2 = 1$ and the BTs $G_i = 0$ are linear in
$\tilde u$, then Definition 1 is strictly related to the notion
of nonclassical symmetry or conditional symmetry
(Levi and Winternitz 1989, Olver 1993), an extension
of the concept of Lie symmetry used to reduce
and integrate a differential equation. In the case of
the nonclassical symmetries, the known solution $u$ is
included in the arbitrary $x$-dependent coefficients of
the transformation. In this case, the BT is just a way
to construct an explicit solution of the differential
equation [7].

Definition 1 is often too general to be able to get 
explicit results. It is constructive for any partial 
differential equation, linear or nonlinear, but if one 
is not able to get a nontrivial BT this does not 
mean that a BT does not exist. As noted later, the 
existence of an auto-BT is associated to the 
existence of an infinity of symmetries, and this is 
a condition for the exact integrability of eqn [13] 
(Fokas 1980, Ibragimov and Shabat 1980). So, the 
existence of a BT is closely related to the integr- 
ability of eqn [13]. 


Bäcklund via Integrability


One can derive the BT from the integrability
properties of eqn [13a]. Equation [13a] is said to
be integrable if it can be written as the compatibility
condition of an overdetermined system of linear
partial differential equations for an auxiliary function
depending on a free parameter belonging to the
complex plane $\mathbb{C}$. The prototype of such a situation
is given by the Lax pair for the KdV equation

$$u_t + u_{xxx} - 6uu_x = 0 \qquad [15]$$

introduced by Lax (1968):

$$L\psi = k^2\psi, \qquad L = -\partial_{xx} + u(x,t) \qquad [16a]$$

$$\psi_t = -M\psi, \qquad M = 4\partial_{xxx} - 3(u\partial_x + \partial_x u) \qquad [16b]$$

where $k$ is a free parameter and $\psi = \psi(x,t;k)$. As eqn
[16a] is nothing else but the stationary Schrödinger
equation, the function $\psi$ can be interpreted as a
wave function, and $k^2$ is the spectral parameter
corresponding to the potential $u(x,t)$. The condition
for the existence of a solution $\psi$ of the overdetermined
system of eqns [16] is given by the
operator equation

$$L_t = [L, M] \qquad [17]$$

the so-called Lax equation. In the case of
asymptotically bounded potentials, eqn [16a]
defines the spectrum uniquely. Introducing the
following asymptotic boundary conditions for the
wave function $\psi$,

$$\psi(x,t;k) \to T(k,t)\,e^{-ikx} \quad (x \to -\infty), \qquad \psi(x,t;k) \to e^{-ikx} + R(k,t)\,e^{ikx} \quad (x \to +\infty) \qquad [18]$$

where $R(k,t)$ and $T(k,t)$ are, respectively, the
reflection and the transmission coefficient, the
spectrum is defined in the complex plane of
the variable $k$ by

$$S[u] = \{R(k,t),\ -\infty < k < \infty;\ p_j,\ c_j(t),\ j = 1, 2, \ldots, N\} \qquad [19]$$

where $p_j$ are the bound state parameters corresponding
to isolated singularities of the reflection
coefficients on the imaginary positive $k$-axis, corresponding
to a solution $\phi_j(x,t;p_j)$ of the spectral
problem vanishing for $x \to -\infty$ and such that

im. leet E ba] = 1 [20] 
and c, are some functions of t related to the residues 
of R(k,t) at the poles p,. There is a one-to-one 
correspondence between the evolution of the poten- 
tial u(x,t) in eqn [15] and that of the spectrum S[u] 
of the Schródinger spectral problem [16a]. In parti- 
cular, for the KdV, taking into account eqn [16b], 
the evolution of the reflection coefficient R(k,t) is 
given by 


dR(k, t) 
dt 


= Bik? R(k, t) [21] 
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In eqn [21] and henceforth, d/dt denotes the total 
derivative with respect to f. 
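
As a worked check of the Lax equation [17], the commutator can be expanded term by term. The sketch below assumes the operators read $L = -\partial_{xx} + u$ and $M = 4\partial_{xxx} - 3(u\partial_x + \partial_x u)$, and uses $u\partial_x + \partial_x u = 2u\partial_x + u_x$; the grouping of terms is chosen here for illustration only.

```latex
\begin{aligned}
M &= 4\partial_x^3 - 6u\,\partial_x - 3u_x, \qquad L = -\partial_x^2 + u,\\[2pt]
[-\partial_x^2,\,-6u\,\partial_x - 3u_x]
  &= 12u_x\,\partial_x^2 + 12u_{xx}\,\partial_x + 3u_{xxx},\\
[u,\,4\partial_x^3]
  &= -12u_x\,\partial_x^2 - 12u_{xx}\,\partial_x - 4u_{xxx},\\
[u,\,-6u\,\partial_x - 3u_x]
  &= 6uu_x,\\[2pt]
[L, M] &= -u_{xxx} + 6uu_x .
\end{aligned}
```

Since $L_t = u_t$ is a multiplication operator, $L_t = [L, M]$ reduces to $u_t + u_{xxx} - 6uu_x = 0$: all differential-operator terms cancel and only the KdV equation [15] remains.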

In the following, for simplicity and concreteness,
all the results on the BT will be derived for the
KdV equation. Similar results have been obtained in
the literature for many classes of integrable
partial differential equations in two and three
dimensions and for differential-difference and
difference-difference equations. For a partial
review of the available recent literature on
the subject, see Rogers and Shadwick (1982) and
Coley et al. (2001).

A more general way of introducing the nonlinear
partial differential equation as the compatibility
condition of an overdetermined system of linear
equations has been provided by Zakharov and
Shabat (1979) with the dressing method (DM). In
the DM, the differential equations [16] are
replaced by a matrix system of linear equations

$$\Psi_x = U(u(x,t),k)\,\Psi \qquad [22a]$$

$$\Psi_t = V(u(x,t),k)\,\Psi \qquad [22b]$$

where $\Psi = \Psi(x,t;k)$ and $U$ and $V$ are matrix
functions. The existence of a nonsingular solution
of the system of linear equations [22] requires
that the matrix functions $U$ and $V$ satisfy the
equation

$$U_t - V_x + [U, V] = 0 \qquad [23]$$

often called the zero-curvature condition. The KdV
equation [15] in the DM is obtained by choosing

$$U(u(x,t),k) = \begin{pmatrix} ik & u(x,t) \\ 1 & -ik \end{pmatrix}, \qquad V(u(x,t),k) = \begin{pmatrix} u_x + 2iku + 4ik^3 & 2u(u + 2k^2) - 2iku_x - u_{xx} \\ 2u + 4k^2 & -u_x - 2iku - 4ik^3 \end{pmatrix} \qquad [24]$$


The existence of an auto-BT implies the existence
of a differential equation (see Definition 1) which
relates two solutions of the same nonlinear equation.
The new solution $\tilde u(x,t)$ of eqn [15] will be
associated with a different Lax operator and a
different spectral problem (but of the same
operational form)

$$\tilde L = -\partial_{xx} + \tilde u(x,t) \qquad [25a]$$

$$\tilde L\tilde\psi = k^2\tilde\psi \qquad [25b]$$

The existence of a relation between the potentials
$u(x,t)$ and $\tilde u(x,t)$ thus implies that there must be a
$(u,\tilde u;k)$-dependent operator $D$ such that

$$\tilde\psi = D\psi \qquad [26]$$

The compatibility of eqns [16a], [25b], and [26]
implies that $\tilde L D\psi = DL\psi$, that is,

$$\tilde L D = DL \qquad [27]$$

Equation [27] is the auto-BT in the Lax formalism.
If $L$ and $\tilde L$ are two different spectral problems
related to two different nonlinear partial differential
equations, then eqn [27] will provide a Miura
transformation. In the DM, the requirement of the
existence of a BT is given again by eqn [26] with $\psi$
and $\tilde\psi$ replaced by $\Psi$ and $\tilde\Psi$ and the operator $D$
replaced by a matrix function $D$. The BT in the
DM is given by

$$D_x = U(\tilde u(x,t),k)\,D - D\,U(u(x,t),k) \qquad [28a]$$

$$D_t = V(\tilde u(x,t),k)\,D - D\,V(u(x,t),k) \qquad [28b]$$

In the particular case of the Hilbert-Riemann
problem with zeros, providing the soliton solutions,
the matrix $D$ can be expressed as a function of $\Psi$. In
this way, one derives the Moutard or Darboux
transformation (DT) (Moutard 1878, Levi et al.
1984), the most efficient way to get soliton solutions
of the nonlinear partial differential equation.

Given a linear ordinary differential equation for
the unknown $\psi$, depending on a set of arbitrary
functions $u(x)$ and parameters $k$, the DT provides a
discrete transformation which leaves the equation
invariant. In the particular case of the KdV equation
associated with the stationary Schrödinger spectral
problem [16a], we have

$$\tilde u(x,t) = u(x,t) - 2\,(\log F(x,t))_{xx} \qquad [29a]$$

$$\tilde\psi(x,t;k) = \psi_x(x,t;k) - \frac{F_x(x,t)}{F(x,t)}\,\psi(x,t;k) \qquad [29b]$$

where the intermediate wave function

$$F(x,t) = \psi(x,t;k = ip) + a\,\psi(x,t;k = -ip)$$

is a linear combination of the Jost solutions of the
Schrödinger spectral problem with $p$ a real
parameter and $a$ an arbitrary constant. If one looks for
an equation involving only the potentials $u$ and $\tilde u$,
from eqns [29], one gets the BT for the KdV
equation. Given a trivial solution of the KdV
equation, together with the corresponding solution
of the spectral problem, eqn [29a] provides a new
solution of the KdV, while eqn [29b] gives a new
solution of the spectral problem. This procedure can
be carried out recursively and gives a ladder of
explicit solutions for the KdV equation.
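
The first rung of this ladder can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the trivial solution $u = 0$, reads the Darboux formula as $\tilde u = u - 2(\log F)_{xx}$ with $F = \psi(k = ip) + a\,\psi(k = -ip)$, and uses the vacuum wave function $\psi = \exp(-ikx - 4ik^3t)$ implied by $\psi_t = -M\psi$ with $M = 4\partial_{xxx}$. The function names (`F`, `dressed_u`, `kdv_residual`) are hypothetical, and the KdV residual is estimated with plain central differences rather than computed exactly.

```python
import math

def F(x, t, p, a):
    """Intermediate wave function built on the trivial solution u = 0.

    With u = 0 the Lax pair is solved by psi = exp(-i k x - 4 i k^3 t);
    at k = +i p and k = -i p this gives exp(theta) and exp(-theta),
    where theta = p x - 4 p^3 t.
    """
    theta = p * x - 4.0 * p**3 * t
    return math.exp(theta) + a * math.exp(-theta)

def dressed_u(x, t, p=1.0, a=1.0):
    """New potential u~ = u - 2 (log F)_xx for u = 0.

    Differentiating log F twice by hand gives (log F)_xx = 4 a p^2 / F^2.
    """
    return -8.0 * a * p**2 / F(x, t, p, a) ** 2

def kdv_residual(u, x, t, h=1e-3):
    """Central-difference estimate of u_t + u_xxx - 6 u u_x at (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2.0 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
    u_xxx = (u(x + 2 * h, t) - 2.0 * u(x + h, t)
             + 2.0 * u(x - h, t) - u(x - 2 * h, t)) / (2.0 * h**3)
    return u_t + u_xxx - 6.0 * u(x, t) * u_x

if __name__ == "__main__":
    # Print the dressed potential and the (small) KdV residual at a few points.
    for x in (-2.0, 0.0, 0.5, 3.0):
        print(x, dressed_u(x, 0.2), kdv_residual(dressed_u, x, 0.2))
```

For $a = 1$ the dressed potential is $-2p^2\,\mathrm{sech}^2(px - 4p^3t)$, the one-soliton solution of eqn [15]; the residual is nonzero only because of the finite-difference step.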

The DM is a particularly simple setting in which
one can derive DTs. In fact, expressing the matrix
$D$ in terms of $\Psi$, eqn [28a] gives a relation between
the potentials of the type given by eqn [29a], while
eqn [26] gives eqn [29b]. Depending on the form of
the matrix $D$ in terms of $k$, one can introduce more
parameters in the DT. The classical DT [29]
depends on just one parameter; however, in the
case of the Schrödinger spectral problem [16a], one
can also have DTs depending on two parameters
(TDTs).

A more general DT, which can provide solutions
even when the initial solution is not bounded
asymptotically, can be obtained for many equations
and, in particular, also for the KdV equation. This is
obtained in a particular limit of the TDT when the
parameters coincide (Levi 1988) and it is often
referred to as the binary DT (Matveev and Salle 1991).
The binary DT for the KdV is given by

$$\tilde u(x,t) = u(x,t) - 2\,(\log F(x,t))_{xx} \qquad [30a]$$


7 1 F(x,t) - 
(x, t;k) = wa G =" uen) Vx, t; R) 


30b] 


where $\mu$ is a value of $k$ for which the function
$\psi(x,t;k)$ is asymptotically bounded at $+\infty$ and the
function $F(x,t)$ is given by

$$F(x,t) = 1 + \rho\int_x^{+\infty}\psi^2(y,t;\mu)\,dy \qquad [31]$$

with $\rho$ an arbitrary constant. The corresponding BT
obtained by eliminating the function $F$ from eqns [30]
reads


" i a 
q xx — dxx = -314 t " 


= [Gx + qx — 2g(x) + 2u](q — q) 
1 (qx — au 
二 一 一 一 一 一 一 32. 
i 3-5 [32] 


where $q = \int_x^{+\infty}u_0(y,t)\,dy$ with $u_0(x,t) = u(x,t) - g(x)$
the asymptotically bounded part of $u(x,t)$
and $g(x)$ its asymptotic behavior, and
$\tilde q = \int_x^{+\infty}\tilde u_0(y,t)\,dy$ with $\tilde u_0(x,t) = \tilde u(x,t) - g(x)$.

Once the Lax operator $L$ is given, we can obtain
in a constructive way the operators $M$ which
give the admissible nonlinear partial differential
equations and the operators $D$ which give the
admissible BT. A technique to do so is provided by
the so-called Lax technique introduced by Bruschi
and Ragnisco (1980a-c). Using the Lax technique,
we can easily obtain the nonlinear partial
differential equations and BT associated with the Lax
operator [16a] both in the isospectral and
nonisospectral case (when $k_t = 0$ and when $k_t \neq 0$)
and the corresponding evolution of the spectrum.
We have


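$$u_t = f(\mathcal{L},t)\,u_x + g(\mathcal{L},t)\,[xu_x + 2u] \qquad [33a]$$

$$k_t = k\,g(-4k^2,t), \qquad \frac{dR(k,t)}{dt} = 2ik\,f(-4k^2,t)\,R(k,t) \qquad [33b]$$

$$F(\Lambda)(\tilde u - u) + G(\Lambda)\,\Gamma\cdot 1 = 0 \qquad [33c]$$

$$\tilde R(k,t) = \frac{F(-4k^2) - 2ik\,G(-4k^2)}{F(-4k^2) + 2ik\,G(-4k^2)}\,R(k,t) \qquad [33d]$$

where the functions $f$, $g$, $F$, and $G$ are entire
functions of their first argument and the recursive
operators $\mathcal{L}$, $\Lambda$, and $\Gamma$ are given by

$$\mathcal{L}f(x) = f_{xx}(x) - 4u(x,t)f(x) + 2u_x(x,t)\int_x^{+\infty}f(y)\,dy \qquad [34a]$$

$$\Lambda f(x) = f_{xx}(x) - 2[\tilde u(x,t) + u(x,t)]f(x) + \Gamma\int_x^{+\infty}f(y)\,dy \qquad [34b]$$

$$\Gamma f(x) = [\tilde u_x(x,t) + u_x(x,t)]f(x) + [\tilde u(x,t) - u(x,t)]\int_x^{+\infty}[\tilde u(y,t) - u(y,t)]f(y)\,dy \qquad [34c]$$

In the limit when $\tilde u \to u$ the operator $\Lambda \to \mathcal{L}$. A BT
is obtained by choosing the functions $F$ and $G$ in
eqn [33c]. The simplest BT is obtained by setting
$F = \sigma$ and $G = 1$:

$$\tilde v_x + v_x + (\tilde v - v)\left[\sigma + \tfrac{1}{2}(\tilde v - v)\right] = 0 \qquad [35]$$

with $u(x,t) = -v_x(x,t)$, where $\sigma$ is the Bäcklund
parameter. By combining BTs of the form
[35] with different parameters as in eqn [5], we get
the permutability theorem for the KdV BTs: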


a r (c1 + o2)|v' — v| 
oe c 


Its proof is immediate from the point of view of the
spectrum.


Bäcklund and Symmetries


A symmetry of the nonlinear equation [15] is given 
by a flow commuting with it, that is, by an 
equation 


ee pye a) [37] 


where $\epsilon$ is the group parameter, $u = u(x,t;\epsilon)$, and the
$\epsilon$ derivative of [15] is zero on its set of solutions.
A group transformation is obtained by integrating it.
Usually this is possible only when eqn [37] is a
quasilinear partial differential equation of the first
order. Taking into account the evolution of the
spectrum of the KdV equation [15], it is easy to
prove that its symmetries are given by

$$u_\epsilon = \sum_{n=0}^{+\infty}\alpha_n\,\mathcal{L}^n u_x + \sum_{n=0}^{+\infty}\beta_n\,\mathcal{L}^n[xu_x + 2u] \qquad [38]$$

where $\alpha_n$ and $\beta_n$ are a set of constant parameters.
For each choice of the parameters $\alpha_n$ and $\beta_n$,
one gets a symmetry of the KdV equation [15].
With eqn [38] one can associate the following
evolution of the reflection coefficient $R(k,t;\epsilon)$:

$$\frac{dR}{d\epsilon} = 2ik\sum_{n=0}^{+\infty}\alpha_n(-4k^2)^n\,R \qquad [39]$$

and of the spectral parameter $k$

$$k_\epsilon = k\sum_{n=0}^{+\infty}\beta_n(-4k^2)^n \qquad [40]$$

As $-\tfrac{1}{2}\mathcal{L}\,1 = xu_x + 2u$, one can add to the
symmetries [38] the exceptional one (which has no
spectral counterpart as $u_\epsilon$ is not bounded
asymptotically):

$$u_\epsilon = 1 + 6tu_x \qquad [41]$$


By a natural choice of the constant parameters
$\alpha_n$ and $\beta_n$, one can define two infinite series
of symmetries. The first one is obtained by choosing
$\beta_n = 0$ and $\alpha_n = \delta_{n,m}$ with $m = 1, 2, \ldots, \infty$ and can
be denoted as the isospectral series, as $k_\epsilon = 0$. This is
formed by commuting symmetries. The second one
is given by $\alpha_n = 0$ and $\beta_n = \delta_{n,m}$ with $m = 1, 2, \ldots, \infty$
and can be denoted as the nonisospectral series, as
$k_\epsilon \neq 0$. The nonisospectral symmetries have a
nonzero commutation relation among themselves
and with the isospectral ones.

Except for a few Lie point symmetries (given by
eqn [41] and by those members of the series [38] in
which only $\beta_0$, $\alpha_0$, or $\alpha_1$ is nonzero), they
are all generalized symmetries (Olver 1993). By
analyzing their spectrum, it is easy to prove that the
choice [38] is such that they are all independent. For
the isospectral class, the evolution of the spectrum is
simple and can be integrated to provide the group
transformation of the spectrum

$$R(k,t;\epsilon) = R(k,t)\,\exp\left[2ik\epsilon\sum_{n=0}^{+\infty}\alpha_n(-4k^2)^n\right] \qquad [42]$$

Let us now consider the simplest BT obtained by
choosing, in eqn [33c], $F(\Lambda) = \sigma$ and $G(\Lambda) = 1$, where
$\sigma$ is an arbitrary parameter. In the spectral space, this
corresponds to the following change of the spectrum:

$$\tilde R(k,t) = \frac{\sigma - 2ik}{\sigma + 2ik}\,R(k,t) \qquad [43]$$

Defining $\tilde R(k,t) = R(k,t;\epsilon)$, eqn [42] is equal to
eqn [43] iff

$$\alpha_n = -\frac{2}{\epsilon\,\sigma^{2n+1}(2n+1)}, \qquad n = 0, 1, \ldots, \infty \qquad [44]$$

So we need an infinite number of symmetries to
be able to reconstruct the change of the spectrum
given by the BT. This shows that the existence of a BT
is strictly connected to the existence of an infinity of
symmetries, which is a condition for the exact
integrability of the nonlinear partial differential
equation (Fokas 1980, Ibragimov and Shabat 1980).
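
The need for infinitely many symmetries can be seen numerically. The sketch below assumes the readings $R(k,t;\epsilon) = R(k,t)\exp[2ik\epsilon\sum_n\alpha_n(-4k^2)^n]$ for the group transformation, $(\sigma - 2ik)/(\sigma + 2ik)$ for the BT factor, and $\epsilon\alpha_n = -2/[(2n+1)\sigma^{2n+1}]$ for the matching coefficients; these sign conventions are an assumption of this example, and the function names are hypothetical. The series converges only for $|2k/\sigma| < 1$.

```python
import cmath

def bt_factor(k, sigma):
    """Multiplicative change of the reflection coefficient under the simplest BT."""
    return (sigma - 2j * k) / (sigma + 2j * k)

def symmetry_factor(k, sigma, n_terms):
    """Same factor rebuilt from the first n_terms isospectral symmetries.

    Uses epsilon * alpha_n = -2 / ((2n + 1) * sigma**(2n + 1)), so that the
    exponent is the Taylor series of log((sigma - 2ik)/(sigma + 2ik)).
    """
    series = sum(
        -2.0 / ((2 * n + 1) * sigma ** (2 * n + 1)) * (-4.0 * k * k) ** n
        for n in range(n_terms)
    )
    return cmath.exp(2j * k * series)

if __name__ == "__main__":
    k, sigma = 0.3, 2.0
    exact = bt_factor(k, sigma)
    # The mismatch shrinks as more symmetries are included but never
    # vanishes for a finite number of terms.
    for n_terms in (1, 2, 5, 40):
        print(n_terms, abs(symmetry_factor(k, sigma, n_terms) - exact))
```

Any finite truncation leaves a residual mismatch, which is the numerical counterpart of the statement that the BT cannot be generated by finitely many symmetries.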


Discretization via Bäcklund


BTs, apart from providing classes of exact solutions
to nonlinear equations, play a very important role in
the discretization of partial differential equations. As
noted earlier, an auto-BT is a differential relation
between two different solutions of the same
nonlinear partial differential equation. If it is assumed
that the new solution $\tilde u$ is just the old solution $u$
computed at a different point of a lattice, then the
BT becomes a differential-difference equation
(Chiu and Ladik 1977, Levi and Benguria 1980).
This can be carried out also at the level of the
associated compatibility condition, and in this
way one is also able to obtain its Lax pair. This
demonstrates the integrability of the
differential-difference equation

$$v(n+1,t)_t + v(n,t)_t + [v(n+1,t) - v(n,t)]\left(\sigma + \tfrac{1}{2}[v(n+1,t) - v(n,t)]\right) = 0 \qquad [45]$$


which is an integrable differential-difference
approximation to the KdV equation, or

$$w(n+1,t)_t = w(n,t)_t + 2a\sin\left[\frac{w(n+1,t) + w(n,t)}{2}\right] \qquad [46]$$

an integrable differential-difference approximation
to the sG equation (Hirota 1977, Orfanidis 1978).

As the nonlinear superposition formulas are
purely algebraic relations involving potentials
associated with integrable nonlinear partial differential
equations, one can interpret them as
difference-difference equations. In the case of the sG equation
from eqn [7], we have

$$w_{n+1,m+1} - w_{n,m} = 4\arctan\left[\frac{a_1 + a_2}{a_1 - a_2}\tan\left(\frac{w_{n+1,m} - w_{n,m+1}}{4}\right)\right] \qquad [47]$$

where the four solutions related by eqn [7] are
identified with $w_{n,m}$, $w_{n+1,m}$, $w_{n,m+1}$, and
$w_{n+1,m+1}$. In a similar manner,
from [36], one gets


ae callis (oy T CJ Watta = Fam [48] 
03 —253-t 2 LU P- Un. m41] 


The continuous limit of eqn [47], obtained by setting 
x— en and y= em and choosing 


a1 €1€2 
a? 4 


gives back eqn [2] (Rogers and Schief 1997). It is 
worth mentioning that one can also use known 
nonlinear lattice equations to construct BT for 
nonlinear partial differential equations (Levi 1981). 


See also: Integrable Systems and Discrete Geometry; 
Integrable Systems: Overview; Painlevé Equations; 
Solitons and Kac-Moody Lie Algebras; Toda Lattices.


Introduction


The Batalin-Vilkovisky formalism for quantizing
gauge theories has a long history of development. It
begins with the Faddeev-Popov procedure for
quantizing Yang-Mills theory, involving the
Faddeev-Popov ghost fields (Faddeev and Popov 1967). It
continued with the discovery of BRST symmetry by
Becchi et al. (1976). Then Zinn-Justin (1975)
introduced sources for these transformations, and
a symmetric structure in the space of fields and
sources, in his study of the renormalizability of these
theories. Finally, Batalin and Vilkovisky (1981)
systematized and generalized these developments.
A more detailed account of this history can be
found in Gomis et al. (1994), where many worked
examples of the Batalin-Vilkovisky formalism are
given. At the present time, it is the most general
treatment available. Alexandrov, Kontsevich, Schwarz,
and Zaboronsky (AKSZ 1997) have presented a
geometric interpretation for the case in which the
action is topologically invariant.


Structure of the Set of Gauge 
Transformations 


Consider a system whose dynamics is governed by
a classical action $S[\phi^i]$ which depends on the
fields $\phi^i(x)$, $i = 1,\ldots,n$. We employ a compact
notation in which the multi-index $i$ may denote
the various fields involved, the discrete indices on
which they depend, and the dependence on the
spacetime variables as well. The generalized
summation convention then means that a
repeated index may denote not only a sum over
discrete variables, but also integration over
the spacetime variables. $\epsilon_i = \epsilon(\phi^i)$ denotes the
Grassmann parity of the fields. Fields with $\epsilon_i = 0$
are called bosonic, those with $\epsilon_i = 1$ fermionic. The
graded commutation rule is

$$\phi^i(x)\,\phi^j(y) = (-1)^{\epsilon_i\epsilon_j}\,\phi^j(y)\,\phi^i(x) \qquad [1]$$

For a gauge theory the action is invariant under a set
of gauge transformations with infinitesimal form

$$\delta\phi^i = R^i_\alpha\,\varepsilon^\alpha, \qquad \alpha = 1 \text{ or } 2 \text{ or } \ldots\, m \qquad [2]$$

The $\varepsilon^\alpha$ are the infinitesimal gauge parameters and
$R^i_\alpha$ the generators of the gauge transformations.
When $\epsilon_\alpha = \epsilon(\varepsilon^\alpha) = 0$ we have an ordinary symmetry;
when $\epsilon_\alpha = 1$ the equation is characteristic of a
supersymmetry. The Grassmann parity of $R^i_\alpha$ is
$\epsilon(R^i_\alpha) = \epsilon_i + \epsilon_\alpha \pmod 2$.

A subscript after a comma denotes the right
derivative with respect to the corresponding field,
that is, the field is to be commuted to the far right
and then dropped. The field equations may then be
written as

$$S_{0,i} = 0 \qquad [3]$$

where $S_0$ is the classical action. Let $\Sigma$ denote the
surface in function space on which the field
equations are satisfied:

$$S_{0,i}\big|_\Sigma = 0 \qquad [4]$$

If the gauge transformations are "independent"
on-shell, that is,

$$\operatorname{rank} R^i_\alpha\big|_\Sigma = m \qquad [5]$$

the gauge theory is said to be "irreducible." We
assume here that this is the case. When it is not, the
theory is "reducible." For details of the treatment in
that case, see Gomis et al. (1994). The
classical solutions are $\phi_0 \in \Sigma$.

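The Noether identities are

$$S_{0,i}\,R^i_\alpha = 0 \qquad [6]$$

The general solution to the Noether identity
$S_{0,i}\lambda^i = 0$ is

$$\lambda^i = R^i_\alpha T^\alpha + S_{0,j}E^{ji} \qquad [7]$$

The commutator of two gauge transformations is

$$[\delta_1,\delta_2]\phi^i = \left(R^i_{\alpha,j}R^j_\beta - (-1)^{\epsilon_\alpha\epsilon_\beta}R^i_{\beta,j}R^j_\alpha\right)\varepsilon_1^\beta\varepsilon_2^\alpha \qquad [8]$$

Since this commutator is a symmetry of the action, it
satisfies the Noether identity

$$S_{0,i}\left(R^i_{\alpha,j}R^j_\beta - (-1)^{\epsilon_\alpha\epsilon_\beta}R^i_{\beta,j}R^j_\alpha\right) = 0 \qquad [9]$$

which by eqn [7] implies that

$$R^i_{\alpha,j}R^j_\beta - (-1)^{\epsilon_\alpha\epsilon_\beta}R^i_{\beta,j}R^j_\alpha = R^i_\gamma T^\gamma_{\alpha\beta} + S_{0,j}E^{ji}_{\alpha\beta} \qquad [10]$$

Equations [8] and [10] lead to the following
condition:

$$[\delta_1,\delta_2]\phi^i = \left(R^i_\gamma T^\gamma_{\alpha\beta} + S_{0,j}E^{ji}_{\alpha\beta}\right)\varepsilon_1^\beta\varepsilon_2^\alpha \qquad [11]$$

The tensors $T^\gamma_{\alpha\beta}$ are called the structure constants of the
gauge algebra, although they depend, in general, on
the fields of the theory. When $E^{ji}_{\alpha\beta} = 0$, the gauge
algebra is said to be "closed," otherwise it is "open."
Equation [11] defines a Lie algebra if the algebra is
closed and the $T^\gamma_{\alpha\beta}$ are independent of the fields.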

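The gauge tensors have the following graded
symmetry properties:

$$T^\gamma_{\alpha\beta} = -(-1)^{\epsilon_\alpha\epsilon_\beta}\,T^\gamma_{\beta\alpha}, \qquad E^{ij}_{\alpha\beta} = -(-1)^{\epsilon_i\epsilon_j}\,E^{ji}_{\alpha\beta} = -(-1)^{\epsilon_\alpha\epsilon_\beta}\,E^{ij}_{\beta\alpha} \qquad [12]$$

The Grassmann parities are

$$\epsilon(T^\gamma_{\alpha\beta}) = \epsilon_\alpha + \epsilon_\beta + \epsilon_\gamma \pmod 2 \qquad [13]$$

and

$$\epsilon(E^{ij}_{\alpha\beta}) = \epsilon_i + \epsilon_j + \epsilon_\alpha + \epsilon_\beta \pmod 2 \qquad [14]$$

Various restrictions are imposed by the Jacobi
identity

$$\sum_{\mathrm{cyclic}(123)}[\delta_1,[\delta_2,\delta_3]] = 0 \qquad [15]$$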
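These restrictions are

$$\sum_{\mathrm{cyclic}(123)}\left(R^i_\delta A^\delta_{\alpha\beta\gamma} - S_{0,j}B^{ji}_{\alpha\beta\gamma}\right)\varepsilon_1^\gamma\varepsilon_2^\beta\varepsilon_3^\alpha = 0 \qquad [16]$$

where

$$3A^\delta_{\alpha\beta\gamma} \equiv \left(T^\delta_{\alpha\beta,k}R^k_\gamma - T^\delta_{\alpha\eta}T^\eta_{\beta\gamma}\right) + (-1)^{\epsilon_\alpha(\epsilon_\beta+\epsilon_\gamma)}\left(T^\delta_{\beta\gamma,k}R^k_\alpha - T^\delta_{\beta\eta}T^\eta_{\gamma\alpha}\right) + (-1)^{\epsilon_\gamma(\epsilon_\alpha+\epsilon_\beta)}\left(T^\delta_{\gamma\alpha,k}R^k_\beta - T^\delta_{\gamma\eta}T^\eta_{\alpha\beta}\right)$$

and

$$3B^{ji}_{\alpha\beta\gamma} \equiv \left(E^{ji}_{\alpha\beta,k}R^k_\gamma - E^{ji}_{\alpha\delta}T^\delta_{\beta\gamma} - (-1)^{\epsilon_i\epsilon_\alpha}R^j_{\alpha,k}E^{ki}_{\beta\gamma} + (-1)^{\epsilon_j(\epsilon_i+\epsilon_\alpha)}R^i_{\alpha,k}E^{kj}_{\beta\gamma}\right) + (-1)^{\epsilon_\alpha(\epsilon_\beta+\epsilon_\gamma)}(\alpha\to\beta,\ \beta\to\gamma,\ \gamma\to\alpha) + (-1)^{\epsilon_\gamma(\epsilon_\alpha+\epsilon_\beta)}(\alpha\to\gamma,\ \beta\to\alpha,\ \gamma\to\beta)$$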
As in the familiar Faddeev-Popov procedure, it is
useful to introduce ghost fields $C^\alpha$ with opposite
Grassmann parities to the gauge parameters $\varepsilon^\alpha$:

$$\epsilon(C^\alpha) = \epsilon_\alpha + 1 \pmod 2 \qquad [17]$$


and to replace the gauge parameters by ghost fields. 
One must then modify the graded symmetry proper- 
ties of the gauge structure tensors according to 


Tasas. 下 人 Eee， [19] 
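The Noether identities then take the form

$$S_{0,i}\,R^i_\alpha C^\alpha = 0 \qquad [19]$$

and the structure relations [10] become

$$\left(2R^i_{\alpha,j}R^j_\beta - R^i_\gamma T^\gamma_{\alpha\beta} + S_{0,j}E^{ji}_{\alpha\beta}\right)C^\beta C^\alpha = 0 \qquad [20]$$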


Introducing the Antifields 

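We incorporate the ghost fields into the field set
$\Phi^A = \{\phi^i, C^\alpha\}$, where $i = 1,\ldots,n$ and $\alpha = 1,\ldots,m$.
Clearly $A = 1,\ldots,N$, where $N = n + m$. One then
further increases the set by introducing an antifield
$\Phi^*_A$ for each field $\Phi^A$. The Grassmann parity of the
antifields is

$$\epsilon(\Phi^*_A) = \epsilon(\Phi^A) + 1 \pmod 2 \qquad [21]$$

Each field is assigned a ghost number, with

$$\mathrm{gh}[\phi^i] = 0, \qquad \mathrm{gh}[C^\alpha] = 1, \qquad \mathrm{gh}[\Phi^*_A] = -\mathrm{gh}[\Phi^A] - 1 \qquad [22]$$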

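In the space of fields and antifields, the antibracket
is defined by

$$(X,Y) = \frac{\partial_r X}{\partial\Phi^A}\frac{\partial_l Y}{\partial\Phi^*_A} - \frac{\partial_r X}{\partial\Phi^*_A}\frac{\partial_l Y}{\partial\Phi^A} \qquad [23]$$

where $\partial_r$ denotes the right, $\partial_l$ the left derivative. The
antibracket is graded antisymmetric:

$$(X,Y) = -(-1)^{(\epsilon_X+1)(\epsilon_Y+1)}\,(Y,X) \qquad [24]$$

It satisfies a graded Jacobi identity

$$((X,Y),Z) + (-1)^{(\epsilon_X+1)(\epsilon_Y+\epsilon_Z)}((Y,Z),X) + (-1)^{(\epsilon_Z+1)(\epsilon_X+\epsilon_Y)}((Z,X),Y) = 0 \qquad [25]$$

It is a graded derivation

$$(X,YZ) = (X,Y)Z + (-1)^{\epsilon_Y\epsilon_Z}(X,Z)Y, \qquad (XY,Z) = X(Y,Z) + (-1)^{\epsilon_X\epsilon_Y}Y(X,Z) \qquad [26]$$

It has ghost number

$$\mathrm{gh}[(X,Y)] = \mathrm{gh}[X] + \mathrm{gh}[Y] + 1 \qquad [27]$$

and Grassmann parity

$$\epsilon((X,Y)) = \epsilon(X) + \epsilon(Y) + 1 \pmod 2 \qquad [28]$$

For bosonic fields

$$(B,B) = 2\,\frac{\partial_r B}{\partial\Phi^A}\frac{\partial_l B}{\partial\Phi^*_A} \qquad [29]$$

for fermionic fields

$$(F,F) = 0 \qquad [30]$$

and for any $X$

$$((X,X),X) = 0 \qquad [31]$$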

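If one groups the fields and the antifields together
into the set

$$z^a = \{\Phi^A, \Phi^*_A\}, \qquad a = 1,\ldots,2N \qquad [32]$$

then the antibracket is seen to define a symplectic
structure on the space of fields and antifields

$$(X,Y) = \frac{\partial_r X}{\partial z^a}\,\zeta^{ab}\,\frac{\partial_l Y}{\partial z^b} \qquad [33]$$

with

$$\zeta^{ab} = (z^a, z^b) = \begin{pmatrix} 0 & \delta^A_B \\ -\delta^A_B & 0 \end{pmatrix} \qquad [34]$$

The antifields can be thought of as conjugate
variables to the fields, since

$$(\Phi^A, \Phi^*_B) = \delta^A_B \qquad [35]$$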


The Classical Master Equation 


Let $S[\Phi^A,\Phi^*_A]$ be a functional of the fields and
antifields with the dimension of an action, vanishing
ghost number and even Grassmann parity. The
equation

$$(S,S) = 2\,\frac{\partial_r S}{\partial\Phi^A}\frac{\partial_l S}{\partial\Phi^*_A} = 0 \qquad [36]$$

is the classical master equation. Solutions of the
classical master equation with suitable boundary
conditions turn out to be generating functionals for
the gauge structure of the theory. $S$ is also the
starting point for the quantization. One denotes by
$\Sigma$ the subspace of stationary points of the action in
the space of fields and antifields:

$$\Sigma = \left\{ z^a \;\middle|\; \frac{\partial S}{\partial z^a} = 0 \right\} \qquad [37]$$

Given a classical solution $\phi_0$ of $S_0$, one stationary
point is

$$\phi^i = \phi^i_0, \qquad C^\alpha = 0, \qquad \Phi^*_A = 0 \qquad [38]$$


250 Batalin-Vilkovisky Quantization 


An action which satisfies the classical master
equation has its own set of invariances:

$$\frac{\partial_r S}{\partial z^a}\,R^a_b = 0 \qquad [39]$$

with

$$R^a_b = \zeta^{ac}\,\frac{\partial_l\partial_r S}{\partial z^c\,\partial z^b} \qquad [40]$$

This equation implies

$$R^a_c R^c_b\big|_\Sigma = 0 \qquad [41]$$

One says that $R^a_b$ is nilpotent on-shell. A nilpotent
$2N \times 2N$ matrix has rank $\le N$. Let $r$ be the rank of
the hessian of $S$ at the stationary point:

$$r = \operatorname{rank}\left.\frac{\partial_l\partial_r S}{\partial z^a\,\partial z^b}\right|_\Sigma \qquad [42]$$

We then have $r \le N$. The relevant solutions of the
classical master equation are those for which $r = N$.
In this case the number of independent gauge
invariances of the type in eqn [39] equals the number
of antifields. When at a later stage the gauge is fixed,
the nonphysical antifields are eliminated.

To ensure the correct classical limit, the proper
solution must contain the classical action $S_0$ in the
sense that

$$S[\Phi^A,\Phi^*_A]\big|_{\Phi^*_A = 0} = S_0[\phi^i] \qquad [43]$$


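The action $S[\Phi^A,\Phi^*_A]$ can be expanded in a series in
the antifields, while maintaining vanishing ghost
number and even Grassmann parity:

$$S[\Phi,\Phi^*] = S_0 + \phi^*_i R^i_\alpha C^\alpha + C^*_\alpha\,\tfrac{1}{2}T^\alpha_{\beta\gamma}(-1)^{\epsilon_\beta}C^\gamma C^\beta + \tfrac{1}{4}\phi^*_i\phi^*_j(-1)^{\epsilon_i}E^{ji}_{\alpha\beta}(-1)^{\epsilon_\alpha}C^\beta C^\alpha + \ldots \qquad [44]$$

When this is inserted into the classical master
equation, one finds that this equation implies the
gauge structure of the classical theory.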


Gauge Fixing and Quantization 


Equation [39] shows that the action $S$ still possesses
gauge invariances, and hence is not yet suitable for
quantization via the path integral approach: a
gauge-fixing procedure is necessary. In the
Batalin-Vilkovisky approach the gauge is fixed, and the
antifields eliminated, by use of a gauge-fixing
fermion $\Psi$ which has Grassmann parity $\epsilon(\Psi) = 1$
and $\mathrm{gh}[\Psi] = -1$. It is a functional of the fields $\Phi^A$
only; its relation to the antifields is

$$\Phi^*_A = \frac{\partial\Psi}{\partial\Phi^A} \qquad [45]$$

We define a surface in functional space

$$\Sigma_\Psi = \left\{ (\Phi^A,\Phi^*_A) \;\middle|\; \Phi^*_A = \frac{\partial\Psi}{\partial\Phi^A} \right\} \qquad [46]$$

so that for any functional $X[\Phi,\Phi^*]$

$$X\big|_{\Sigma_\Psi} = X\left[\Phi, \frac{\partial\Psi}{\partial\Phi}\right] \qquad [47]$$


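To construct a gauge-fixing fermion $\Psi$ of ghost
number $-1$, one must again introduce additional
fields. The simplest choice utilizes a trivial pair
$\bar C_\alpha$, $\bar\pi_\alpha$ with

$$\epsilon(\bar C_\alpha) = \epsilon_\alpha + 1, \qquad \epsilon(\bar\pi_\alpha) = \epsilon_\alpha, \qquad \mathrm{gh}[\bar C_\alpha] = -1, \qquad \mathrm{gh}[\bar\pi_\alpha] = 0 \qquad [48]$$

The fields $\bar C_\alpha$ are the Faddeev-Popov antighosts.
Along with these fields we include the corresponding
antifields $\bar C^{*\alpha}$, $\bar\pi^{*\alpha}$. Adding the term $\bar\pi_\alpha\bar C^{*\alpha}$ to the
action $S$ does not spoil its properties as a proper
solution to the classical master equation, and one
gets the nonminimal action

$$S^{\mathrm{non}} = S + \bar\pi_\alpha\bar C^{*\alpha} \qquad [49]$$

The simplest possibility for $\Psi$ is

$$\Psi = \bar C_\alpha\,\chi^\alpha(\phi) \qquad [50]$$

where $\chi^\alpha$ are the gauge-fixing conditions for the
fields $\phi$. The gauge-fixed action is denoted by

$$S_\Psi = S^{\mathrm{non}}\big|_{\Sigma_\Psi} \qquad [51]$$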


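Quantization is performed using the path integral
to calculate a correlation function $X$, with the
constraint [45] implemented by a $\delta$-function:

$$I_\Psi(X) = \int D\Phi\,D\Phi^*\;\delta\!\left(\Phi^*_A - \frac{\partial\Psi}{\partial\Phi^A}\right)\exp\!\left(\frac{i}{\hbar}W[\Phi,\Phi^*]\right)X[\Phi,\Phi^*] \qquad [52]$$

Here $W$ is the quantum action, which reduces to $S$ in
the limit $\hbar \to 0$. An admissible $\Psi$ leads to
well-defined propagators when the path integral is
expressed as a perturbation series expansion.

The results of a calculation should be independent
of the gauge fixing. Consider the integrand in eqn
[52],

$$I[\Phi,\Phi^*] = \exp\!\left(\frac{i}{\hbar}W[\Phi,\Phi^*]\right)X[\Phi,\Phi^*] \qquad [53]$$

Under an infinitesimal change in $\Psi$

$$I_{\Psi+\delta\Psi}(X) - I_\Psi(X) = \int D\Phi\;\Delta I\;\delta\Psi \qquad [54]$$

where the Laplacian $\Delta$ is

$$\Delta = (-1)^{\epsilon_A+1}\,\frac{\partial_r}{\partial\Phi^A}\frac{\partial_r}{\partial\Phi^*_A} \qquad [55]$$

Obviously, the integral $I_\Psi(X)$ is independent of $\Psi$ if
$\Delta I = 0$. For $X = 1$ one gets the requirement

$$\Delta\exp\!\left(\frac{i}{\hbar}W\right) = \exp\!\left(\frac{i}{\hbar}W\right)\left(\frac{i}{\hbar}\Delta W - \frac{1}{2\hbar^2}(W,W)\right) = 0 \qquad [56]$$

The formula

$$\tfrac{1}{2}(W,W) = i\hbar\,\Delta W \qquad [57]$$

is the quantum master equation. A gauge-invariant
correlation function satisfies

$$(X,W) = i\hbar\,\Delta X \qquad [58]$$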

The terms of higher order in $\hbar$ by which the
quantum action $W$ may differ from the solution $S$ of
the classical master equation correspond to the
counter-terms of the renormalizable gauge theory if

$$\Delta S = 0 \qquad [59]$$

One must, of course, use a regularization scheme
which respects the symmetries of the theory. For
$W = S + O(\hbar)$ the quantum master equation [57]
reduces in this case to the classical master equation

$$(S,S) = 0 \qquad [60]$$

Hence, up to possible counter-terms, one may
simply choose $W = S$.

To implement the gauge fixing, one uses for the
action $W = S^{\mathrm{non}}$. For the path integral $Z = I_\Psi(X = 1)$,
the integration over the antifields in eqn [52] is
performed by using the $\delta$-function. The result is

$$Z = \int D\Phi\,\exp\!\left(\frac{i}{\hbar}S_\Psi\right) \qquad [61]$$


Geometrical Interpretation of Topological 
Field Theories 


The Batalin-Vilkovisky formalism for topological 
field theories has been given a geometrical inter- 
pretation by AKSZ (1997). 

A supermanifold equipped with an odd vector
field $Q$ satisfying $Q^2 = 0$ is called a Q-manifold. A
Q-manifold provided with an odd symplectic structure
$\omega$ (P-structure) is called a QP-manifold if the
odd symplectic structure is Q-invariant, that is,
$L_Q\omega = 0$. Every solution to the classical master
equation determines a QP-structure on $M$ and vice
versa. The geometric object corresponding to a
classical mechanical system in the Batalin-Vilkovisky
formalism is a QP-manifold.

The nondegenerate closed 2-form $\omega$ is written as

$$\omega = dz^a\,\omega_{ab}\,dz^b \qquad [62]$$

where $z^a$ are local coordinates in the supermanifold
$M$. For functions on $M$, an (odd) Poisson bracket is
defined as in eqn [33], where $\omega^{ab}$ stands for the
inverse matrix of $\omega_{ab}$. An even function $S$ on $M$
satisfies the classical master equation if $(S,S) = 0$.
The correspondence between vector fields and
functions on $M$ is given by $K_F G = (G,F)$, where $K_F$
is the vector field, $F$ the given function, and $G$ an
arbitrary function. The function $F$ is called the
Hamiltonian of the vector field $K_F$.

Geometrically, equivalent QP-manifolds describe
the same physics. In particular, one can consider
an even Hamiltonian vector field $K_F$ corresponding
to an odd function $F$. This vector field determines
an infinitesimal transformation preserving the P-structure.
It transforms a solution $S$ to the classical master
equation into the physically equivalent solution
$S + \epsilon(S,F)$, where $\epsilon$ is an infinitesimally small
parameter.

A submanifold $L$ of a P-manifold $M$ is called a
Lagrangian submanifold if the restriction of the
form $\omega$ to $L$ vanishes. In the particular case when
$M = \Pi T^*N$ (the cotangent bundle to $N$ with reversed
parity of fibres) with standard P-structure, one can
construct many examples of Lagrangian submanifolds
in the following way. Fix an odd function $\Psi$ on
$N$, the gauge fermion. The submanifold $L_\Psi \subset M$
determined by the equation

$$\xi_a = \frac{\partial\Psi}{\partial x^a} \qquad [63]$$

where $\{x^a, \xi_a\}$ are coordinates corresponding to the
identification $M = \Pi T^*N$, will be a Lagrangian
submanifold of $M$.

The P-manifold $M$ in the neighborhood of $L$ can
be identified with $\Pi T^*L$. In other words, one can
find a neighborhood $U$ of $L$ in $M$ and a
neighborhood $V$ of $L$ in $\Pi T^*L$ such that there exists an
isomorphism of P-manifolds $U$ and $V$ leaving $L$
intact. Using this isomorphism, a function $\Psi$ defined
on a Lagrangian submanifold $L \subset M$ determines
another Lagrangian submanifold $L_\Psi \subset M$.

Consider a solution $S$ to the classical master
equation on $M$. In the Batalin-Vilkovisky formalism
we have to restrict $S$ to a Lagrangian submanifold
$L \subset M$; then the quantization of $S$ can be performed
by integration of $\exp(iS/\hbar)$ over $L$. One may
construct an odd vector field $Q$ on $L$ in such a
way that the functional $S$ restricted to $L$ is
Q-invariant. This invariance is BRST invariance.

AKSZ apply these geometric constructions to obtain
in a natural way the action functionals of
two-dimensional sigma-models (Witten 1998) and to
show that the Chern-Simons theory (Axelrod and
Singer 1991) in the Batalin-Vilkovisky formalism arises as
a sigma-model with target space $\Pi\mathcal{G}$, where $\mathcal{G}$ stands
for a Lie algebra and $\Pi$ denotes parity inversion.


The Poisson-Sigma Model 


The quantization of the Poisson-sigma model was
performed by Hirshfeld and Schwarzweller (2000)
and by Cattaneo and Felder (2001). The
Poisson-sigma model is the simplest topological field theory
in two dimensions. It is a field theory on a
two-dimensional world sheet without boundary (Schaller
and Strobl 1994). It involves a set of bosonic scalar
fields, which can be seen as a set of maps
$X^i: M \to N$, where $N$ is a Poisson manifold. In
addition, one has a 1-form $A$ on the world sheet $M$
which takes values in $T^*(N)$; for coordinates $x^\mu$ on $M$
we have $A = A_{\mu i}\,dx^\mu \wedge dX^i$. Its action is

$$S_0[X,A] = \int_M \mu\,\epsilon^{\mu\nu}\left(A_{\mu i}\,\partial_\nu X^i + P^{ij}(X)\,A_{\mu i}A_{\nu j}\right) \qquad [64]$$

where $\epsilon^{\mu\nu}$ is the antisymmetric tensor and $\mu$ is the
volume form on $M$. The gauge transformations of
the model are

$$\delta X^i = P^{ij}(X)\,\varepsilon_j, \qquad \delta A_{\mu i} = D^j_{\mu i}\,\varepsilon_j \qquad [65]$$

where $D^j_{\mu i} = \partial_\mu\delta^j_i + P^{kj}{}_{,i}A_{\mu k}$. The equations of motion
are

$$\epsilon^{\mu\nu}D^j_{\mu i}A_{\nu j} = 0 \qquad [66]$$

and

$$\epsilon^{\mu\nu}\left(\partial_\nu X^i + P^{ij}A_{\nu j}\right) = \epsilon^{\mu\nu}D_\nu X^i = 0 \qquad [67]$$


The gauge algebra is given by 
[6(e1), 6(€2)]X’ = P'(P"" je1n€2m) 
[6(£1),6(£2)]A,; = D i jElnE2m) [68] 
- (€P D,X!)e,,, P"" 


Ji& 1n€2m 


In our general notation the generators of the gauge
transformations $R$ are here $P^{ij}$ and $D^j_{\mu i}$. The gauge
tensors $T$ and $E$ are $P^{ij}{}_{,k}$ and $\epsilon_{\mu\nu}P^{ij}{}_{,kl}$. The
higher-order gauge tensors $A$ and $B$ vanish.

The ghost fields are again denoted by $C_i$. The
Noether identities are then

J u( e" D} Anj P“ + (&""D, X*)DL;) Cy —0 [69] 

M 


pu 


Considering the commutator of two gauge transfor- 
mations leads to (see eqns [8]-[11]) 


f p(2P™ ; p! E PP Gn si 
M 
f n(2te*ibj, + Pm yA, P) 70] 

M 

-De + (&" D,X)e,, P" ) is, esf) 
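The Jacobi identity is

$$P^{ij}{}_{,m}P^{mk}\,C_iC_jC_k = 0 \qquad [71]$$

The fields and antifields of the model are

$$\Phi^A = \{A_{\mu i}, X^i, C_i\} \quad\text{and}\quad \Phi^*_A = \{A^{*\mu i}, X^*_i, C^{*i}\} \qquad [72]$$

The extended action is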
Sz £ m ("nx 十 P"(X)A,;A,;) 

HAD C, + Xt OOC + ; CP, (X)C;C, 


ni 


+ ZAM Ac, PH (C, ci) [73] 


The gauge-fixing conditions are taken to be of the
form $\chi_i(A,X)$, so that the gauge fermion [50] becomes
$\Psi = \bar C^i\chi_i(A,X)$. The antifields are then fixed to be


ris Z Oxj(A, X) 
Abi E C; ð Aig 
m 7 Oxj(A, X) 
Cc; =0 
The gauge-fixed action is 
Sw =| p(t” (Að Xi EE P"(X)A,;A,;) 
Ox4(A, X) Ox, (A, X) pi 
k j k Y. 
FOU a aer PS 
La ONm(A, X) an OXn(ALX) — su 
at os a (ip Spero gee Ee al yP a CX 
id E OA yi d 9A,j Lai on 
x CRCI 十 元 Xi(A， x) [75] 


Now consider different gauge conditions:


1. First, the Landau gauge for the gauge potential,
$\chi_i = \partial^\mu A_{\mu i}$, so that the gauge fermion becomes
$\Psi = \bar C^i\partial^\mu A_{\mu i}$. The antifields are fixed to be

$$A^{*\mu i} = -\partial^\mu\bar C^i, \qquad X^*_i = C^{*i} = 0, \qquad \bar C^*_i = \partial^\mu A_{\mu i} \qquad [76]$$

For this gauge choice the gauge-fixed action is


Sg = | n (aax + P (X)A pA) + CO D' CG; 
M 


jii 


1 ey 
+7 (0" Ci O je P (X) 


x GG - (Aj) ) 77 


Translating this action into the notation of Cattaneo 
and Felder, one sees that it is exactly the expression 
they use to derive the perturbation series. 

2. Now consider the temporal gauge $\chi_i = A_{0i}$. The
gauge fermion is given by $\Psi = \bar C^iA_{0i}$. The
antifields are fixed to

$$A^{*0i} = \bar C^i, \qquad A^{*1i} = 0, \qquad X^*_i = C^{*i} = 0, \qquad \bar C^*_i = A_{0i} \qquad [78]$$

The gauge-fixed action is
Sy = 人 ue" (A, dX + PX) A, As) 
JM 
+ CD; — 2 (Av)) [79] 


3. Finally consider the Schwinger-Fock gauge
$\chi_i = x^\mu A_{\mu i}$. Then the antifields are fixed to be

$$A^{*\mu i} = x^\mu\bar C^i, \qquad X^*_i = C^{*i} = 0, \qquad \bar C^*_i = x^\mu A_{\mu i} \qquad [80]$$

For this gauge choice the gauge-fixed action is
Sy = f ue" (A,08,X* + P(X)A,A,) 
M 


jul 


+ Cix” D} Ci — #(0"A,)) i81]


Notice that in the noncovariant gauges 2 and 3 the
action simplifies, in that the term which arose
because of the nonclosed nature of the gauge algebra
vanishes.


See also: BF Theories; BRST Quantization; Constrained
Systems; Graded Poisson Algebras; Operads;
Perturbative Renormalization Theory and BRST;
Supermanifolds; Topological Sigma Models.


Introduction


The Bethe ansatz is a particular form of wave function
introduced in the diagonalization of the Heisenberg
spin chain. It underpins the majority of exactly solved
models in statistical mechanics and quantum field
theory. At the heart of the Bethe ansatz is the way in
which multibody interactions factor into two-body
interactions. The Bethe ansatz is thus intimately
entwined with the theory of integrability.

The way in which the Bethe ansatz works is best 
understood by working through an explicit hands-on 
example. The canonical example is the isotropic 
antiferromagnetic Heisenberg Hamiltonian 


$$H = \sum_{j=1}^{L-1} h_{j,j+1} + h_{L,1}, \qquad h_{ij} = \tfrac{1}{2}\,(\boldsymbol{\sigma}_i \cdot \boldsymbol{\sigma}_j + 1) \qquad [1]$$
where $\boldsymbol{\sigma} = (\sigma^x, \sigma^y, \sigma^z)$ are Pauli matrices and L is the
length of the chain. Periodic boundary conditions are 
imposed. However, open boundary conditions may 
also be treated, along with the addition of magnetic 
bulk and boundary fields. The z-components of each 
of the spins are either up or down. Since the 
z-component of the total spin commutes with the 
Hamiltonian, the total number n of up spins serves as a 
good quantum number. A state of the system can 
therefore be conveniently described in terms of the 
coordinates of all the up spins. Denote these coordinates
by $x_j$, with $1 \le x_j \le L$. The quantum number n
ensures that the Hamiltonian decomposes into L + 1
sectors, each of size $\binom{L}{n}$. The antiferromagnetic
ground state occurs in the largest sector. 

The normalization of the Hamiltonian [1] is such 
that its action is that of the permutation operator: 


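$$h|{+}{+}\rangle = |{+}{+}\rangle, \quad h|{+}{-}\rangle = |{-}{+}\rangle, \quad h|{-}{+}\rangle = |{+}{-}\rangle, \quad h|{-}{-}\rangle = |{-}{-}\rangle \qquad [2]$$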


Diagonalization of Sectors 


One can address the diagonalization of the sectors 
for various cases. 


Case 1: n=0 


Consider the case with all spins down. The
eigenstate is $\Psi = |{-}{-}\cdots{-}\rangle$, with $H\Psi = L\Psi$ and,
thus, E = L is the trivial solution.


Case 2: n = 1


There are L states, with

$$\Psi = \sum_{x=1}^{L} a(x)\, |\psi(x)\rangle \qquad [3]$$

where $|\psi(x)\rangle$ is the state with an up spin at site x.
The aim is to find the amplitudes a(x). It is clear
that

$$H|\psi(x)\rangle = (L - 2)\,|\psi(x)\rangle + |\psi(x - 1)\rangle + |\psi(x + 1)\rangle \qquad [4]$$
in the bulk (away from either boundary). Insertion
of [3] into $H\Psi = E\Psi$ gives

$$E\,a(x) = (L - 2)\,a(x) + a(x - 1) + a(x + 1) \qquad [5]$$

Substitution of spin waves $a(x) = e^{ikx}$ gives

$$E = L - 2 + 2\cos k \qquad [6]$$

The boundary conditions are such that a(0) = a(L)
and a(L + 1) = a(1); either gives $e^{ikL} = 1$, from which
the L values of k follow.
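
The one-magnon result can be checked numerically. The sketch below is an illustration only: it assumes the recursion [5] with periodic wrap-around, $E\,a(x) = (L-2)\,a(x) + a(x-1) + a(x+1)$, and tests whether the spin wave $a(x) = e^{ikx}$ with $k = 2\pi m/L$ satisfies it with $E = L - 2 + 2\cos k$. The function name `one_magnon_residual` and the chain length are hypothetical choices, not part of the original text.

```python
import cmath
import math


def one_magnon_residual(L, m):
    """Return (E, max |(H a)(x) - E a(x)|) for a(x) = exp(ikx), k = 2*pi*m/L."""
    k = 2 * math.pi * m / L
    energy = L - 2 + 2 * math.cos(k)                      # eq. [6]
    a = [cmath.exp(1j * k * x) for x in range(1, L + 1)]  # a(1), ..., a(L)
    worst = 0.0
    for x in range(L):
        left = a[(x - 1) % L]    # periodic closure: a(0) = a(L)
        right = a[(x + 1) % L]   # periodic closure: a(L + 1) = a(1)
        h_a = (L - 2) * a[x] + left + right               # right-hand side of [5]
        worst = max(worst, abs(h_a - energy * a[x]))
    return energy, worst


if __name__ == "__main__":
    L = 8  # illustrative chain length
    for m in range(L):
        energy, worst = one_magnon_residual(L, m)
        print(m, round(energy, 6), worst)
```

Each of the L allowed momenta gives a residual at the level of rounding error, and m = 0 reproduces E = L, degenerate with the all-down state.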


Case 3: n=2 


Here the wave function can be written in terms of 
the two flipped spins as 


$$\Psi = \sum_{x<y} a(x, y)\, |\psi(x, y)\rangle \qquad [7]$$
It is to be emphasized that one is working in the
region with x < y. There are two cases to consider:
(1) y > x + 1 and (2) y = x + 1. Consider the
interactions in the bulk. For (1) the action of the
Hamiltonian implies
$$E\,a(x, y) = (L - 4)\,a(x, y) + a(x - 1, y) + a(x + 1, y) + a(x, y - 1) + a(x, y + 1) \qquad [8]$$
and for (2)
$$E\,a(x, x + 1) = (L - 2)\,a(x, x + 1) + a(x - 1, x + 1) + a(x, x + 2) \qquad [9]$$
The compatibility of these two equations (set y = x + 1
in [8] and subtract [9]) requires that
$$2a(x, x + 1) = a(x, x) + a(x + 1, x + 1) \qquad [10]$$


which is known as the “collision” or “meeting” 
condition. 

Some adjustments need to be made for spins
which get flipped at the boundaries. Looking at
[8] and [9] with x = 1 and y = L, it is evident that
one can take


a(y,x + L) = a(x, y) [11] 


to restore the original ordering. The terms which 
arise involve up spins at sites 0 and L+1. This 
illustrates the periodic boundary condition. 

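We now assume (the Bethe ansatz) that

$$a(x, y) = A_{12}\, e^{ik_1 x} e^{ik_2 y} + A_{21}\, e^{ik_2 x} e^{ik_1 y} \qquad [12]$$
Substitution of the ansatz [12] into [8] gives
$$E = L - 4 + 2\cos k_1 + 2\cos k_2 \qquad [13]$$
Substitution of [12] into [10] gives

$$\frac{A_{12}}{A_{21}} = -\frac{1 - 2e^{ik_1} + e^{i(k_1 + k_2)}}{1 - 2e^{ik_2} + e^{i(k_1 + k_2)}} \qquad [14]$$
The three relations [11], [12], and [14] give the
Bethe equations

$$e^{ik_1 L} = \frac{A_{12}}{A_{21}} \quad \text{and} \quad e^{ik_2 L} = \frac{A_{21}}{A_{12}} \qquad [15]$$

which are to be solved for $k_1$ and $k_2$. Note that
$e^{i(k_1 + k_2)L} = 1$.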
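
A small worked case may help here. The sketch below takes L = 4 and the momenta $k_1 = 2\pi/3$, $k_2 = -2\pi/3$ (an illustrative choice, not taken from the text), builds the two-magnon amplitudes $a(x,y) = A_{12} e^{ik_1 x} e^{ik_2 y} + A_{21} e^{ik_2 x} e^{ik_1 y}$ with the ratio $A_{12}/A_{21}$ of [14], and checks directly that the resulting state is an eigenvector of the sum of nearest-neighbour permutations on the periodic ring with $E = L - 4 + 2\cos k_1 + 2\cos k_2$. The helper names `amplitude` and `apply_hamiltonian` are hypothetical.

```python
import cmath
import math

L = 4                                         # ring length (illustrative)
k1, k2 = 2 * math.pi / 3, -2 * math.pi / 3    # trial momenta (illustrative)
z1, z2 = cmath.exp(1j * k1), cmath.exp(1j * k2)

# Amplitude ratio [14], written as A12/A21 = -s21/s12
# with s_ij = 1 - 2 z_j + z_i z_j.
s12 = 1 - 2 * z2 + z1 * z2
s21 = 1 - 2 * z1 + z1 * z2
A12, A21 = s21, -s12


def amplitude(x, y):
    """Bethe ansatz [12] for up spins at sites x < y."""
    return A12 * z1**x * z2**y + A21 * z2**x * z1**y


def apply_hamiltonian(psi):
    """Apply H = sum over bonds of the permutation operator (periodic ring)."""
    out = {}
    for config in psi:
        total = 0j
        for j in range(1, L + 1):
            a, b = j, j % L + 1               # bond (j, j+1), site L+1 is site 1
            swapped = {b if s == a else a if s == b else s for s in config}
            total += psi[tuple(sorted(swapped))]
        out[config] = total
    return out


psi = {(x, y): amplitude(x, y)
       for x in range(1, L + 1) for y in range(x + 1, L + 1)}
energy = L - 4 + 2 * math.cos(k1) + 2 * math.cos(k2)           # eq. [13]
h_psi = apply_hamiltonian(psi)
residual = max(abs(h_psi[c] - energy * psi[c]) for c in psi)
bethe_residual = max(abs(z1**L - A12 / A21), abs(z2**L - A21 / A12))  # eq. [15]
print(energy, residual, bethe_residual)
```

For this choice both residuals vanish to rounding error, so the pair $(k_1, k_2)$ solves [15] and the state is an exact eigenvector.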


Case 4: n = 3


The full power of the Bethe ansatz method becomes 
evident for three particles. Here 


$$\Psi = \sum_{x<y<z} a(x, y, z)\, |\psi(x, y, z)\rangle \qquad [16]$$

There are several cases to consider:
1. y > x + 1 and z > y + 1, where
$$E\,a(x, y, z) = (L - 6)\,a(x, y, z) + a(x \pm 1, y, z) + a(x, y \pm 1, z) + a(x, y, z \pm 1) \qquad [17]$$
By $a(x \pm 1, y, z)$, we mean a(x + 1, y, z) + a(x - 1, y, z), etc.
2. y = x + 1 and z > y + 1, with
$$E\,a(x, x + 1, z) = (L - 4)\,a(x, x + 1, z) + a(x - 1, x + 1, z) + a(x, x + 2, z) + a(x, x + 1, z \pm 1) \qquad [18]$$
3. y > x + 1 and z = y + 1, where
$$E\,a(x, y, y + 1) = (L - 4)\,a(x, y, y + 1) + a(x \pm 1, y, y + 1) + a(x, y - 1, y + 1) + a(x, y, y + 2) \qquad [19]$$
4. y = x + 1 and z = y + 1, for which
$$E\,a(x, x + 1, x + 2) = (L - 2)\,a(x, x + 1, x + 2) + a(x - 1, x + 1, x + 2) + a(x, x + 1, x + 3) \qquad [20]$$


Again, we must ensure that these equations are 
compatible. This involves comparison of the last 
three equations with [17]. The three equations to be 
satisfied are 


$$2a(x, x + 1, z) = a(x, x, z) + a(x + 1, x + 1, z) \qquad [21]$$
$$2a(x, y, y + 1) = a(x, y, y) + a(x, y + 1, y + 1) \qquad [22]$$
$$4a(x, x + 1, x + 2) = a(x, x, x + 2) + a(x, x + 1, x + 1) + a(x, x + 2, x + 2) + a(x + 1, x + 1, x + 2) \qquad [23]$$

But note that setting z = x + 2 in [21] and y = x + 1
in [22] leads to [23] being automatically satisfied.
We are thus left with only two equations [21] and
[22]. Note the similarity between these two equations
and the meeting condition [10] for the n = 2
case.
In this case the Bethe ansatz is
$$a(x, y, z) = A_{123}\, z_1^x z_2^y z_3^z + A_{132}\, z_1^x z_3^y z_2^z + A_{213}\, z_2^x z_1^y z_3^z + A_{231}\, z_2^x z_3^y z_1^z + A_{321}\, z_3^x z_2^y z_1^z + A_{312}\, z_3^x z_1^y z_2^z \qquad [24]$$

in which $z_j = e^{ik_j}$. This is a sum over the 3!
permutations of the integers 1, 2, 3. Inserting this
ansatz into [17] gives

$$E = L - 6 + 2(\cos k_1 + \cos k_2 + \cos k_3) \qquad [25]$$

To determine the $k_j$, it is convenient to define

$$s_{ij} = 1 - 2z_j + z_i z_j \qquad [26]$$


Substitution of [24] into the meeting conditions [21] 
and [22] then gives 


$$s_{12}A_{123} + s_{21}A_{213} + s_{13}A_{132} + s_{31}A_{312} + s_{23}A_{231} + s_{32}A_{321} = 0 \qquad [27]$$
$$s_{23}A_{123} + s_{32}A_{132} + s_{13}A_{213} + s_{31}A_{231} + s_{21}A_{321} + s_{12}A_{312} = 0 \qquad [28]$$


These equations are assumed to be satisfied in 
permutation pairs, that is, 


$$s_{12}A_{123} + s_{21}A_{213} = 0, \qquad s_{23}A_{123} + s_{32}A_{132} = 0, \quad \text{etc.} \qquad [29]$$


Up to an overall constant, the relations [27] and [28] 
are satisfied by 


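$$\begin{aligned}
A_{123} &= s_{21}s_{31}s_{32}, & A_{132} &= -s_{31}s_{21}s_{23} \\
A_{312} &= s_{13}s_{23}s_{21}, & A_{321} &= -s_{23}s_{13}s_{12} \\
A_{231} &= s_{32}s_{12}s_{13}, & A_{213} &= -s_{12}s_{32}s_{31}
\end{aligned} \qquad [30]$$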


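The boundary condition, a(y, z, x + L) = a(x, y, z),
gives

$$\begin{aligned}
&(z_1^L A_{321} - A_{132})\, z_1^x z_3^y z_2^z + (z_2^L A_{312} - A_{231})\, z_2^x z_3^y z_1^z \\
&\quad + (z_1^L A_{231} - A_{123})\, z_1^x z_2^y z_3^z + (z_3^L A_{213} - A_{321})\, z_3^x z_2^y z_1^z \\
&\quad + (z_2^L A_{132} - A_{213})\, z_2^x z_1^y z_3^z + (z_3^L A_{123} - A_{312})\, z_3^x z_1^y z_2^z = 0
\end{aligned} \qquad [31]$$
This leads to the equations

$$\begin{aligned}
z_1^L &= \frac{A_{123}}{A_{231}} = \frac{A_{132}}{A_{321}} = \frac{s_{21}s_{31}}{s_{12}s_{13}} \\
z_2^L &= \frac{A_{213}}{A_{132}} = \frac{A_{231}}{A_{312}} = \frac{s_{12}s_{32}}{s_{21}s_{23}} \\
z_3^L &= \frac{A_{321}}{A_{213}} = \frac{A_{312}}{A_{123}} = \frac{s_{13}s_{23}}{s_{31}s_{32}}
\end{aligned} \qquad [32]$$

which can be solved for the Bethe roots $k_j$.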
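
The algebra for three particles is easy to get wrong by hand, so a numerical spot check is useful. The sketch below assumes the two-body factor $s_{ij} = 1 - 2z_j + z_i z_j$ and builds the six amplitudes as $A_P = \epsilon_P \prod_{i<j} s_{p_j p_i}$, which for n = 3 reproduces the list [30]. It then checks two of the pair equations [29] and the agreement of the three expressions for $z_1^L$ in [32]. The momenta are arbitrary test values, not Bethe roots, and the function names are hypothetical.

```python
import cmath
import itertools


def s(zi, zj):
    """Two-body factor [26]: s_ij = 1 - 2 z_j + z_i z_j."""
    return 1 - 2 * zj + zi * zj


def signature(perm):
    """Sign of a permutation given as a tuple of distinct integers."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def amplitudes(z):
    """A_P = sign(P) * prod_{i<j} s_{p_j p_i}; for n = 3 this is the list [30]."""
    n = len(z)
    result = {}
    for perm in itertools.permutations(range(1, n + 1)):
        value = signature(perm)
        for i in range(n):
            for j in range(i + 1, n):
                value *= s(z[perm[j] - 1], z[perm[i] - 1])
        result[perm] = value
    return result


k = [0.3, 1.1, -0.7]                  # arbitrary momenta, for testing only
z = [cmath.exp(1j * kj) for kj in k]
A = amplitudes(z)


def s_idx(i, j):
    """s_ij for the 1-based particle labels i, j."""
    return s(z[i - 1], z[j - 1])


# Two representative pair equations from [29].
pair_xy = abs(s_idx(1, 2) * A[(1, 2, 3)] + s_idx(2, 1) * A[(2, 1, 3)])
pair_yz = abs(s_idx(2, 3) * A[(1, 2, 3)] + s_idx(3, 2) * A[(1, 3, 2)])

# The three expressions for z_1^L in [32] should coincide.
ratio_a = A[(1, 2, 3)] / A[(2, 3, 1)]
ratio_b = A[(1, 3, 2)] / A[(3, 2, 1)]
ratio_c = s_idx(2, 1) * s_idx(3, 1) / (s_idx(1, 2) * s_idx(1, 3))
print(pair_xy, pair_yz, abs(ratio_a - ratio_b), abs(ratio_a - ratio_c))
```

Because the pair equations hold identically in the $k_j$, the check passes for any momenta; only the boundary condition restricts the $k_j$ to the Bethe roots.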
General n


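The general Bethe ansatz is

$$a(x_1, \ldots, x_n) = \sum_P A_{p_1 \ldots p_n}\, z_{p_1}^{x_1} \cdots z_{p_n}^{x_n} \qquad [33]$$

where the sum is over all n! permutations
$P = \{p_1, \ldots, p_n\}$ of the integers 1,...,n. The boundary
condition is

$$a(x_2, x_3, \ldots, x_n, x_1 + L) = a(x_1, x_2, \ldots, x_n) \qquad [34]$$
leading to the Bethe equations
$$z_{p_1}^L = \frac{A_{p_1 p_2 \ldots p_n}}{A_{p_2 \ldots p_n p_1}} \qquad [35]$$
for all permutations, with
$$A_{p_1 \ldots p_n} = \epsilon_P \prod_{1 \le i < j \le n} s_{p_j, p_i} \qquad [36]$$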


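where $\epsilon_P$ is the signature of the permutation. Finally,

$$z_{p_1}^L = (-1)^{n-1} \prod_{j=2}^{n} \frac{s_{p_j, p_1}}{s_{p_1, p_j}}$$
or
$$e^{ik_j L} = (-1)^{n-1} \prod_{l=1,\, l \ne j}^{n} \frac{1 - 2e^{ik_j} + e^{i(k_j + k_l)}}{1 - 2e^{ik_l} + e^{i(k_j + k_l)}} \qquad [37]$$
for j = 1,...,n. The eigenvalues are given by

$$E = L + \sum_{j=1}^{n} (2\cos k_j - 2) \qquad [38]$$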


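Another form of the Bethe equations is obtained
by defining

$$e^{ik_j} = \frac{u_j - \tfrac{1}{2}i}{u_j + \tfrac{1}{2}i} \qquad [39]$$
which gives
$$E = L - \sum_{j=1}^{n} \frac{1}{u_j^2 + \tfrac{1}{4}} \qquad [40]$$

with $u_j$ satisfying

$$\left( \frac{u_j - \tfrac{1}{2}i}{u_j + \tfrac{1}{2}i} \right)^{L} = -\prod_{l=1}^{n} \frac{u_j - u_l - i}{u_j - u_l + i} \qquad [41]$$
for j = 1,...,n.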
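
The passage from the momentum form to the rapidity form is a pure change of variables, which can be confirmed numerically. The sketch below assumes the map $e^{ik_j} = (u_j - i/2)/(u_j + i/2)$ and compares, for arbitrary real test values $u_j$ (not Bethe roots), the right-hand side $(-1)^{n-1}\prod_{l \ne j} s_{lj}/s_{jl}$ with $-\prod_{l=1}^{n} (u_j - u_l - i)/(u_j - u_l + i)$, and the two energy expressions [38] and [40]. All function names are hypothetical.

```python
def z_of_u(u):
    """Change of variables [39]: exp(i k) = (u - i/2) / (u + i/2)."""
    return (u - 0.5j) / (u + 0.5j)


def rhs_momentum_form(z, j):
    """(-1)^(n-1) * prod_{l != j} s_lj / s_jl, with s_ij = 1 - 2 z_j + z_i z_j."""
    value = (-1) ** (len(z) - 1)
    for l in range(len(z)):
        if l != j:
            s_lj = 1 - 2 * z[j] + z[l] * z[j]
            s_jl = 1 - 2 * z[l] + z[j] * z[l]
            value *= s_lj / s_jl
    return value


def rhs_rapidity_form(u, j):
    """-prod_{l=1..n} (u_j - u_l - i)/(u_j - u_l + i); the l = j factor is -1."""
    value = -1
    for l in range(len(u)):
        value *= (u[j] - u[l] - 1j) / (u[j] - u[l] + 1j)
    return value


def energy_momentum_form(z, L):
    """E = L + sum_j (2 cos k_j - 2), using 2 cos k = z + 1/z."""
    return L + sum((zj + 1 / zj - 2).real for zj in z)


def energy_rapidity_form(u, L):
    """E = L - sum_j 1 / (u_j^2 + 1/4)."""
    return L - sum(1 / (uj * uj + 0.25) for uj in u)


u = [-0.8, 0.1, 0.45, 1.3]            # arbitrary real test values
z = [z_of_u(uj) for uj in u]
gap = max(abs(rhs_momentum_form(z, j) - rhs_rapidity_form(u, j))
          for j in range(len(u)))
energy_gap = abs(energy_momentum_form(z, 10) - energy_rapidity_form(u, 10))
print(gap, energy_gap)
```

Both gaps are at rounding level, which confirms that the overall sign in [41] comes from the l = j factor of the product.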

All eigenvalues of the Heisenberg spin chain may 
be obtained in terms of the Bethe ansatz solution. 
For example, the roots $u_j$ for the ground state are
real and distributed symmetrically about the
origin. Excitations may involve complex roots.
Although obtained exactly in terms of the Bethe
roots, the Bethe ansatz wave function is
cumbersome.

We have thus seen how the Bethe ansatz works
for the Heisenberg spin chain. The underlying
mechanism is the way in which the collision or
meeting conditions can be handled in terms of two-
body interactions. To see this more clearly, the six
permutation pair equations [29] can be written in
the general form $A_{abc} = Y_{ab} A_{bac}$ and $A_{abc} = Y_{bc} A_{acb}$,
where $Y_{ab} = -s_{ba}/s_{ab}$. Now there are two possible
paths to get from $A_{abc}$ to $A_{cba}$, namely

$$A_{abc} = Y_{ab} Y_{ac} Y_{bc}\, A_{cba}, \qquad A_{abc} = Y_{bc} Y_{ac} Y_{ab}\, A_{cba} \qquad [42]$$

Both paths must be equivalent, with
$$Y_{ab} Y_{ba} = 1 \quad \text{and} \quad Y_{ab} Y_{ac} Y_{bc} = Y_{bc} Y_{ac} Y_{ab} \qquad [43]$$


The latter is a condition of nondiffraction or
equivalently a manifestation of the Yang-Baxter
equation.

Historically, the next model to be exactly solved in 
terms of the Bethe ansatz was the one-dimensional 
model of N interacting bosons on a line of length L 
defined by the Hamiltonian 


$$H = -\sum_{i=1}^{N} \frac{\partial^2}{\partial x_i^2} + 2c \sum_{1 \le i < j \le N} \delta(x_i - x_j) \qquad [44]$$

where c is a measure of the interaction strength. For
this model the Bethe ansatz wave function is of the
same form as [33] with the two-body interaction
term given by

$$s_{ab} = k_a - k_b + ic \qquad [45]$$
The Bethe equations are given by
$$\exp(ik_j L) = -\prod_{l=1}^{N} \frac{k_j - k_l + ic}{k_j - k_l - ic} \quad \text{for } j = 1, \ldots, N \qquad [46]$$


The energy eigenvalue is 


$$E = \sum_{j=1}^{N} k_j^2 \qquad [47]$$


For repulsive (c > 0) interactions, one can prove that 
all Bethe roots are real. 
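
For repulsive coupling the Bethe equations of the Bose gas can be solved numerically without much machinery. The sketch below is one possible approach, not the method of the original article: it assumes the equations in the form $\exp(ik_j L) = \prod_{l \ne j} (k_j - k_l + ic)/(k_j - k_l - ic)$ (the l = j factor of the full product equals -1), takes logarithms to obtain $k_j L = 2\pi I_j - 2\sum_l \arctan((k_j - k_l)/c)$ with (half-)integer quantum numbers $I_j$, and iterates this map with damping. The function names, the damping factor and the parameter values are hypothetical choices.

```python
import cmath
import math


def solve_bethe_roots(N, L, c, iterations=500, damping=0.5):
    """Damped fixed-point iteration for
    k_j L = 2 pi I_j - 2 sum_l atan((k_j - k_l) / c),  with c > 0."""
    # Ground-state quantum numbers: symmetric integers or half-odd integers.
    quantum_numbers = [j - (N - 1) / 2 for j in range(N)]
    k = [2 * math.pi * I / L for I in quantum_numbers]
    for _ in range(iterations):
        new_k = []
        for j in range(N):
            phase = sum(2 * math.atan((k[j] - k[l]) / c) for l in range(N))
            new_k.append((2 * math.pi * quantum_numbers[j] - phase) / L)
        k = [(1 - damping) * old + damping * new for old, new in zip(k, new_k)]
    return k


def bethe_residual(k, L, c):
    """Largest violation of exp(i k_j L) = prod_{l != j} (k_j-k_l+ic)/(k_j-k_l-ic)."""
    worst = 0.0
    for j, kj in enumerate(k):
        product = 1
        for l, kl in enumerate(k):
            if l != j:
                product *= (kj - kl + 1j * c) / (kj - kl - 1j * c)
        worst = max(worst, abs(cmath.exp(1j * kj * L) - product))
    return worst


roots = solve_bethe_roots(N=3, L=5.0, c=2.0)   # illustrative parameters
energy = sum(kj ** 2 for kj in roots)          # eq. [47]
print(roots, energy, bethe_residual(roots, 5.0, 2.0))
```

The roots returned are real and symmetric about zero, in line with the statement above for c > 0. The plain iteration is only a sketch; for large N or weak coupling a Newton solver would be a more robust choice.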

The Bethe ansatz has been applied to a number of 
other and more general models, both for discrete 
spins and in the continuum. These include the 
anisotropic Heisenberg (XXZ) spin chain, for 
which the above working readily generalizes to 
trigonometric functions. The underlying ansatz [33] 
remains the same. One key generalization is the 
nested Bethe ansatz, which arises, for example, in 
the solution of the general N-state permutator 
model, the Hubbard model, and the Gaudin-Yang 
model of interacting fermions. For such models the 
nested Bethe ansatz involves an additional level of 
work to determine the amplitudes appearing in the 


wave function [33] due to higher symmetries. This 
results in Bethe equations involving different types 
or colors of roots. 

The exactly solved one-dimensional quantum spin 
chains may also be obtained from their two-dimen- 
sional classical counterparts — the vertex models. For 
example, the six-vertex model shares the same Bethe 
ansatz wave function and Bethe equations as the 
XXZ spin chain. The more general permutator 
Hamiltonians are related to multistate vertex models. 
One may also consider other spin-S models.

The discussion in this article has centered on what is 
known as the coordinate Bethe ansatz. Another 
formulation is the algebraic Bethe ansatz, which was 
developed for the systematic treatment of the higher- 
spin models. In this formulation, operators create the 
Bethe states by acting on a vacuum. The algebraic 
Bethe ansatz goes hand-in-hand with the quantum 
inverse-scattering method. In all of the exactly solved 
Bethe ansatz models, it is possible to derive quantities 
like the ground-state energy per site via the root density 
method, which assumes that the Bethe roots form a 
uniform distribution in the infinite-size limit. The 
thermodynamics of the Bethe ansatz solvable models 
may also be calculated in a systematic fashion. 

Despite Bethe's early optimism, the Bethe ansatz 
has not been extended to  higher-dimensional 
systems. 


See also: Affine Quantum Groups; Eight Vertex and Hard 
Hexagon Models; Integrability and Quantum Field 

Introduction


BF theories are a class of gauge theories with a 
nontrivial metric-independent classical action. As 
such these theories are candidate topological field 
theories akin to the Chern-Simons theory in three 
dimensions, but in contrast to the Chern-Simons 
theory these exist and are well defined in arbitrary 
dimensions. 

The name “BF theories” derives from the fact 
that, roughly (see [1] below and the subsequent 
discussion for a more precise description), the action 
of the BF theory takes the form $\int B \wedge F_A$ with $F_A$ the
curvature of a connection A and B a Lagrange
multiplier. The classical equations of motion imply
that A is flat, $F_A = 0$, and thus BF theories are
topological gauge theories of flat connections. 

Abelian BF theories and their relation to topolo- 
gical invariants (the Ray-Singer torsion) were 
originally discussed by Schwarz (1978, 1979). In 
the context of the topological field theory, non- 
abelian BF theories were introduced in Horowitz 
(1989) and Blau and Thompson (1989, 1991). 

Since then, BF theories have attracted a lot of 
attention as simple toy-models of (topological) 
gauge theories, and also because of their relation- 
ships with the Chern-Simons theory, the Yang-Mills 
theory, and gauge-theory formulations of gravity, as 
well as because of the rather rich and intricate 
structure of their quantum theories. 

The purpose of this article is to provide an 
overview of these various features of BF theories. 
The standard reference for the basic classical and 
quantum properties of BF theories is Birmingham 
et al. (1991). 
Basic Classical Properties of BF Theories
Nonabelian BF Theories 


The classical action and equations of motion  Typically,
the classical action of the BF theory takes
the form

$$S_{BF}(A, B) = \int_M \mathrm{tr}_G\, B \wedge F_A \qquad [1]$$

where $F_A$ is the curvature of a connection A on a
principal G-bundle $P \to M$ over an n-dimensional
manifold M, B is an ad-equivariant horizontal
(n - 2)-form on P, and $\mathrm{tr}_G$ (a trace) denotes an
ad-invariant nondegenerate scalar product on the
Lie algebra $\mathfrak{g}$ of the Lie group G. Generalizations of
this are possible, in particular, for G abelian or for
n = 3 and are mentioned below.

We consider $F_A$ and B as forms on M taking
values in the bundle of Lie algebras $\mathrm{ad}\,P = P \times_{\mathrm{ad}} \mathfrak{g}$
and refer to such objects as elements of $\Omega^*(M, \mathfrak{g})$.
Then $\mathrm{tr}_G\, B \wedge F_A \in \Omega^n(M, \mathbb{R})$ is a volume form on M.
In order to simplify the exposition, in the following 
we will mostly assume that G is compact semisimple 
and that M is compact without a boundary (even 
though relaxing any one of these conditions is 
possible and also of interest in its own right). 

Varying the action [1] with respect to A and B, 
one obtains the classical equations of motion 


$$F_A = 0, \qquad d_A B = 0 \qquad [2]$$
where
$$d_A B = dB + [A, B] \qquad [3]$$


is the covariant exterior derivative. In particular, 
therefore, the equations of motion imply that the 
connection A is flat. 


Gauge invariance  For any n, the action [1] is
invariant under G gauge transformations (vertical
automorphisms of P) acting on A and B as

$$A \to g^{-1} A g + g^{-1} dg, \qquad B \to g^{-1} B g \qquad [4]$$

(the latter is what is meant by the fact that B takes
values in ad P), because $F_A$ is also ad-equivariant,
$F_A \to g^{-1} F_A g$, and $\mathrm{tr}_G$ is ad-invariant. The infinitesimal
version of this statement is that the action is
invariant under the variations

$$\delta A = d_A \lambda, \qquad \delta B = [B, \lambda] \qquad [5]$$

where $\lambda \in \Omega^0(M, \mathfrak{g})$ can (formally) be thought of as
an element of the Lie algebra of the group of gauge
transformations.

Gauge-fixing this symmetry can proceed in the
usual way (via the Faddeev-Popov or
Becchi-Rouet-Stora-Tyutin procedure), a typical gauge choice
being $d_{A_0} * (A - A_0) = 0$ where $A_0$ is a reference
connection, and $*$ is the Hodge duality operator
corresponding to a choice of metric on M.


Local p-form symmetries  For n = 2, the only local
symmetries of the BF action are the above G gauge
transformations. For n > 2, however, there are other
local symmetries associated with shifts of
$B_p \in \Omega^p(M, \mathfrak{g})$ with $p = n - 2 > 0$. Indeed, integration by
parts using Stokes' theorem and $\partial M = 0$, together with
the Bianchi identity $d_A F_A = 0$, shows that [1]
is invariant under

$$A \to A, \qquad B_p \to B_p + d_A \lambda_{p-1}, \qquad \lambda_{p-1} \in \Omega^{p-1}(M, \mathfrak{g}) \qquad [6]$$

For p = 1, $\lambda$ is a 0-form and the invariance follows.
For p > 1, however, the gauge parameter has, in
some sense, its own gauge invariance. Namely,
under the shift

$$\lambda_{p-1} \to \lambda_{p-1} + d_A \lambda_{p-2} \qquad [7]$$
one has
$$d_A \lambda_{p-1} \to d_A \lambda_{p-1} + [F_A, \lambda_{p-2}] \qquad [8]$$

Thus for $F_A = 0$, the shift [7] has no effect on the
local symmetry [6]. Likewise, for p > 2 the parameter
$\lambda_{p-2}$ itself has a similar invariance, etc. Since $F_A = 0$
is one of the classical equations of motion, the shift
symmetry [6] is what is called an "on-shell reducible
symmetry." Gauge-fixing such symmetries is not
straightforward, and one generally appeals to the
Batalin-Vilkovisky formalism to accomplish this.
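
The invariance of the action under the shift of $B_p$ by $d_A \lambda_{p-1}$ can be written out in one line. The following is a sketch under stated assumptions: the graded Leibniz rule for $d_A$ inside the ad-invariant trace, the Bianchi identity $d_A F_A = 0$, and a closed manifold M. The overall sign of the second term depends on conventions and does not affect the conclusion.

```latex
\delta S_{BF}
  = \int_M \mathrm{tr}_G\, d_A\lambda_{p-1} \wedge F_A
  = \int_M d\, \mathrm{tr}_G \bigl( \lambda_{p-1} \wedge F_A \bigr)
    - (-1)^{p-1} \int_M \mathrm{tr}_G\, \lambda_{p-1} \wedge d_A F_A
  = 0
```

The first term vanishes by Stokes' theorem because M has no boundary, and the second by the Bianchi identity. Applying the same computation to the shifted parameter explains why only the combination $[F_A, \lambda_{p-2}]$ survives in [8].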


Diffeomorphisms and local symmetries  One manifestation
of the general covariance of the BF action
[1] is the on-shell equivalence of (infinitesimal)
diffeomorphisms and (infinitesimal) local symmetries.
Diffeomorphisms are generated by the Lie
derivative $L_X$ along a vector field X. The action of
$L_X$ on differential forms is given by the Cartan
formula $L_X = d\, i_X + i_X\, d$, where $i_X$ is the operation
of contraction. The action of the Lie derivative on
A and B can be written in gauge covariant form as

$$L_X A = i_X F_A + d_A \lambda(X), \qquad L_X B = i_X d_A B + [B, \lambda(X)] + d_A \lambda'(X) \qquad [9]$$
where $\lambda(X) = i_X A$ and $\lambda'(X) = i_X B$. This shows that
on-shell diffeomorphisms are equivalent to field-dependent
gauge and p-form symmetries of the
BF action.


The classical moduli space  The classical moduli
space $\mathcal{C} = \mathcal{C}(P, M, G)$ is the space of solutions to the
classical equations of motion modulo the local
symmetries of the action. Since the field content
and the nature of the local symmetries of the BF
theory depend strongly on the dimension n of M, the
structure and interpretation of the classical moduli
space also depend on n.

For n = 2, by [5] the equation of motion [2] for
$B \in \Omega^0(M, \mathfrak{g})$ says that A is invariant under the
infinitesimal gauge transformation generated by B.
Thus if A is "irreducible," there are no nontrivial
solutions for B and, away from reducible flat
connections, the classical moduli space is just the
moduli space of flat connections on $P \to M$ over the
surface M:

$$\mathcal{C}_2 = \mathcal{M}_{\mathrm{flat}}(P, G) \qquad [10]$$


This space may or may not be empty, depending on 
whether P admits flat connections or not. 

For n = 3, the equation of motion [2] for
$B \in \Omega^1(M, \mathfrak{g})$ says that B is a tangent vector to the
space of flat connections at the flat connection A, in
the sense that under the variation $\delta A = B$, one has

$$\delta F_A = d_A B = 0 \qquad [11]$$

The local G gauge symmetry and the 1-form symmetry
[6] now imply that the moduli space of classical
solutions can be identified with the (co-)tangent bundle
of the moduli space of flat connections on $P \to M$
over the 3-manifold M:

$$\mathcal{C}_3 = T\mathcal{M}_{\mathrm{flat}}(P, G) \qquad [12]$$


In higher dimensions there appears to be less
geometrical structure associated with BF theories,
and all that can be said in general is that the tangent
space to $\mathcal{C}_n$ at a solution (A, B) of the equations of
motion [2] is the vector space:

$$T_{(A,B)}\mathcal{C}_n = H^1_A(M, \mathfrak{g}) \oplus H^{n-2}_A(M, \mathfrak{g}) \qquad [13]$$

where $H^*_A(M, \mathfrak{g})$ are the cohomology groups of the
deformation complex

$$d_A : \Omega^*(M, \mathfrak{g}) \to \Omega^{*+1}(M, \mathfrak{g}) \qquad [14]$$

associated with the flat connection A, $F_A = (d_A)^2 = 0$.
When M is topologically of the form $M = \Sigma \times \mathbb{R}$
(where one can think of $\mathbb{R}$ as time), one has

$$T_{(A,B)}\mathcal{C}_n = H^1_A(\Sigma, \mathfrak{g}) \oplus H^{n-2}_A(\Sigma, \mathfrak{g}) \qquad [15]$$

This is naturally a symplectic vector space (necessary
for a phase space), the nondegenerate antisymmetric
pairing being given by Poincaré duality:

$$\omega([a_1], [b_1]; [a_2], [b_2]) = \int_\Sigma \mathrm{tr}_G\, (a_1 \wedge b_2 - a_2 \wedge b_1) \qquad [16]$$


Metric independence  Perhaps the most important
property of the action [1] is that, in contrast to,
for example, the usual Yang-Mills action for
nonabelian gauge fields

$$S_{YM} = \frac{1}{4g^2} \int_M \mathrm{tr}_G\, F_A \wedge * F_A \qquad [17]$$

it does not require a metric (or the corresponding
Hodge duality operator $*$) for its formulation. This
makes it a candidate action for a “topological field 
theory," this term loosely referring to field theories 
which, in a suitable sense, do not depend on 
additional structures imposed on the underlying 
space(-time) manifold M, in this case a Riemannian 
structure. 

To establish that BF theories are “topological 
quantum field theories," one needs to show that 
the partition function (and correlation functions) 
of the quantized BF theory are also metric 
independent. This is not completely automatic as 
typically the metric enters in the gauge fixing of 
the local symmetries of the action which is 
required to make the quantum theory well defined. 
The usual lore is that since the metric only enters 
through the gauge fixing and since the quantum 
theory should be independent of the choice of 
gauge, it should also be metric independent. In the 
case of nonabelian BF theories, the complexity of 
their local symmetries complicates the analysis 
somewhat, but it can nevertheless be shown that 
BF theories indeed define topological field theories 
also at the quantum level. 


Special Features of Abelian BF Theories 


All the features of nonabelian BF theories discussed
above are, of course, also valid when G is abelian
(with some obvious modifications and simplifications).
However, when G is abelian, a more general
action than [1] is possible. Indeed, although there is
no obvious higher p-form analog of nonabelian
gauge fields, in the abelian case, G = U(1) or $G = \mathbb{R}$,
the condition $F_A \in \Omega^2(M, \mathbb{R})$ can be relaxed. In
particular, one can consider the actions

$$S(n, p) = S(B_p, C_{n-p-1}) = \int_M B_p \wedge dC_{n-p-1} \qquad [18]$$

with $B_p \in \Omega^p(M, \mathbb{R})$, $C_{n-p-1} \in \Omega^{n-p-1}(M, \mathbb{R})$, and
$F_C = dC$, its (n - p)-form field strength. More generally,
one can also consider the hybrid action

$$S_A(n, p) = \int_M B_p \wedge d_A C_{n-p-1} \qquad [19]$$

where A is a fixed (nondynamical) flat G-connection,
$d_A^2 = 0$, and B and C take values in the corresponding
adjoint bundle. This action can be considered as the
linearization of the nonabelian BF action [1] around
the flat connection A, and it reduces to the abelian BF
action [18] for $\mathfrak{g} = \mathbb{R}$.

The action is invariant under the (reducible) local
symmetries

$$B_p \to B_p + d_A \lambda_{p-1}, \qquad C_{n-p-1} \to C_{n-p-1} + d_A \lambda'_{n-p-2} \qquad [20]$$

The space of solutions to the equations of motion
$d_A C = d_A B = 0$ modulo gauge symmetries is (cf. [13])
the finite-dimensional vector space

$$\mathcal{C}_{n,p} = H^p_A(M, \mathfrak{g}) \oplus H^{n-p-1}_A(M, \mathfrak{g}) \qquad [21]$$

which is naturally symplectic for $M = \Sigma \times \mathbb{R}$.


Uses and Applications of Quantum 
Abelian BF Theories 


Quantization of Abelian BF Theories and the 
Ray-Singer Torsion 


We will now show that the partition function of 
the abelian BF theory (actually more generally that 
of the linearized nonabelian BF action [19]) is 
related to the Ray-Singer torsion of M. This 
requires some preparatory material on Gaussian 
path integrals, determinants, and gauge fixing that 
we present first. 

In order to simplify the exposition, we assume
that there are no harmonic modes, either because
they have been gauged away or because the
cohomology groups of $d_A$ are trivial, $H^*_A(M, \mathfrak{g}) = 0$,
that is, the deformation complex [14] is "acyclic."


Laplacians, determinants, and the Ray-Singer
torsion  Choosing a Riemannian metric g (and
Hodge duality operator $*$) on M, the twisted
Laplacian on p-forms is

$$\Delta_A^{(p)} = (d_A + d_A^*)^2 = d_A d_A^* + d_A^* d_A \qquad [22]$$

where $d_A^* = \pm * d_A *$ is the adjoint of $d_A$ with respect to
the scalar product on p-forms defined by $*$. This is an
elliptic operator whose determinant can be defined, for
example, by a $\zeta$-function regularization. Denoting the
(nonzero) eigenvalues of $\Delta_A^{(p)}$ by $\lambda_k^{(p)}$, its $\zeta$-function is

$$\zeta^{(p)}(s) = \sum_k \bigl( \lambda_k^{(p)} \bigr)^{-s} \qquad [23]$$

This converges for Re(s) sufficiently large and can be
analytically continued to a meromorphic function of
s analytic at s = 0, so that

$$\det \Delta_A^{(p)} := e^{-\zeta^{(p)\prime}(0)} \qquad [24]$$


is well defined. The Ray-Singer torsion of $(M, \mathfrak{g})$
(with respect to the flat connection A) is then
defined by

$$T_A(M) = \prod_{p=0}^{n} \bigl( \det \Delta_A^{(p)} \bigr)^{(-1)^p p/2} \qquad [25]$$

Even though this definition depends strongly on the
metric g on M, the Ray-Singer torsion has the
remarkable property of being independent of g. The
Ray-Singer torsion can be shown to be trivial
(essentially = 1 modulo zero-mode contributions)
in even dimensions, but is a nontrivial topological
invariant in odd dimensions. Henceforth, we will
suppress the dependence on M and denote the
n-dimensional Ray-Singer torsion by $T_A(n)$.


Gaussian path integrals and determinants  The path
integral for abelian BF theories is modeled on the
usual formula for a $\delta$-function

$$\delta^n(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} d^n\lambda\; e^{i\lambda \cdot x} \qquad [26]$$

from which one deduces the Gaussian integral
formula

$$\frac{1}{(2\pi)^n} \int_{\mathbb{R}^n \times \mathbb{R}^n} d^n x\, d^n\lambda\; e^{i(\lambda \cdot Dx + \lambda \cdot J + K \cdot x)} = \int_{\mathbb{R}^n} d^n x\; \delta^n(Dx + J)\, e^{iK \cdot x} = \frac{1}{|\det D|}\, e^{-iK \cdot D^{-1} J} \qquad [27]$$

Here, we have assumed that the operator (matrix) D
is invertible. The model that one uses in the path
integral is that

$$\int D[\phi]\, D[\chi]\; e^{i\chi D\phi} = (\det D)^{\mp 1} \qquad [28]$$

where $\phi$ is a set of fields and the $\chi$ are a set of dual
fields with D again a nondegenerate operator. The
inverse determinant arises for Grassmann even fields
(as in [27]), while it is the determinant that appears
for Grassmann odd fields.


Gauge fixing: the Faddeev-Popov trick  If the
action [19], $S_A(n, p) = \int B_p\, d_A C_{n-p-1}$, were nondegenerate,
its partition function could be defined
directly by [28]. However, because of gauge invariance
of the action, the kinetic term is degenerate and one
needs to eliminate the gauge freedom to obtain an (at
least formally) well-defined expression for the partition
function. Concretely, this degeneracy can be seen by
recalling that, when there are no harmonic forms (as we
have assumed), there is a unique orthogonal Hodge
decomposition of a p-form $B_p \in \Omega^p(M, \mathfrak{g})$ into a sum of
a $d_A$-exact and a $d_A$-coexact form:

$$B_p = d_A \lambda_{p-1} + d_A^* \tau_{p+1} \qquad [29]$$

(and likewise for C). Evidently, the exact (longitudinal)
parts $d_A \lambda$ of B and C do not appear in the action, and
these are precisely the gauge-dependent parts of B and
C under the gauge transformation [20]. Gauge fixing
amounts to imposing a condition $\mathcal{F}(B_p) = 0$ on $B_p$ that
determines the longitudinal part uniquely in terms of
the transversal part $d_A^* \tau$. A natural condition is

$$d_A * B_p = 0 \qquad [30]$$

A partition function independent of the gauge-fixing
condition results from inserting "1" in the form of

$$1 = \int_{\mathcal{G}} D[g]\; \delta(\mathcal{F}(B^g))\, \Delta_{\mathcal{F}}(B) \qquad [31]$$

into the functional integral (the Faddeev-Popov
trick), where $\mathcal{G}$ is the gauge group. This defines the
Faddeev-Popov determinant $\Delta_{\mathcal{F}}$, and the functional
properties of the delta functional imply that $\Delta_{\mathcal{F}}$ is
the determinant of the operator that one obtains
upon gauge variation of $\mathcal{F}(B)$.

In the general case of reducible gauge symmetries,
the nature of the gauge group is complicated and
requires some more thought. In the irreducible case,
however, that is, for p = 1, the Lie algebra of the
gauge group can be identified with $\Omega^0(M, \mathfrak{g})$, and
$\Delta_{\mathcal{F}}$ is the determinant of the operator:

$$\frac{\delta \mathcal{F}}{\delta B}\, d_A : \Omega^0(M, \mathfrak{g}) \to \Omega^0(M, \mathfrak{g}) \qquad [32]$$

For [30], this is simply the Laplacian on 0-forms,
and thus

$$\Delta_{\mathcal{F}} = \det \Delta_A^{(0)} \qquad [33]$$


The partition function  Following the finite-dimensional
model, both the $\delta$-function implementing the
gauge-fixing condition and the Faddeev-Popov
determinant can be lifted into the exponential, the
former by a Lagrange multiplier $\pi$ [26], a Grassmann
even 0-form, and the latter by a pair of Grassmann
odd 0-forms c and $\bar{c}$ [28], the ghost and antighost
fields, respectively. The sum of the classical action
and these gauge-fixing and ghost terms defines the
(BRST-invariant) "quantum action" $S_A^q(n, p)$, and the
partition function is

$$Z_A(n, p) = \int D[\Phi]\; e^{i S_A^q(n, p)} \qquad [34]$$
where $\Phi$ denotes collectively all the fields. Concretely,
when n = 2 and p = 0 (or, equivalently, p = 1),
the quantum action is

$$S_A^q(2, 0) = \int_M B_0\, d_A C_1 + \pi\, d_A * C_1 + \bar{c} * \Delta_A^{(0)} c \qquad [35]$$

Likewise, for n = 3 and p = 1 (the only other case
when the gauge symmetry is indeed irreducible),
both $B_1$ and $C_1$ require separate gauge fixing, and
the quantum action is

$$S_A^q(3, 1) = \int_M B_1\, d_A C_1 + \pi\, d_A * C_1 + \bar{c} * \Delta_A^{(0)} c + \bar{\pi}\, d_A * B_1 + \bar{c}' * \Delta_A^{(0)} c' \qquad [36]$$


Formally, therefore, the two-dimensional partition
function is

$$Z_A(2, 0) = \frac{\det \Delta_A^{(0)}}{\det D_A} \qquad [37]$$
where $D_A$ is the operator:
$$D_A = \begin{pmatrix} * d_A \\ * d_A * \end{pmatrix} : \Omega^1(M, \mathfrak{g}) \to \Omega^0(M, \mathfrak{g}) \oplus \Omega^0(M, \mathfrak{g}) \qquad [38]$$

One can define the determinant of this operator as
the square root of the determinant of the operator
$D_A^* D_A = \Delta_A^{(1)}$, and therefore the partition function

$$Z_A(2, 0) = \det \Delta_A^{(0)}\, \bigl( \det \Delta_A^{(1)} \bigr)^{-1/2} = T_A(2) \qquad [39]$$

is equal to the two-dimensional Ray-Singer torsion
[25]. In this case, it is easy to see directly that the
even-dimensional Ray-Singer torsion is trivial, as
one could have equally well defined the determinant
of $D_A$ as the square root of the determinant of the operator
$D_A D_A^* = \Delta_A^{(0)} \oplus \Delta_A^{(0)}$, which implies $Z_A(2, 0) = 1$.

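In three dimensions, the two pairs of ghosts each
contribute a $\det \Delta_A^{(0)}$, and thus

$$Z_A(3, 1) = \frac{\bigl( \det \Delta_A^{(0)} \bigr)^2}{\det D_A} \qquad [40]$$
where
$$D_A = \begin{pmatrix} * d_A & d_A \\ d_A * & 0 \end{pmatrix} : \Omega^1(M, \mathfrak{g}) \oplus \Omega^0(M, \mathfrak{g}) \to \Omega^1(M, \mathfrak{g}) \oplus \Omega^0(M, \mathfrak{g}) \qquad [41]$$

is the operator acting on the fields $(B_1, C_1, \pi, \bar{\pi})$. As
before, this operator can be diagonalized by squaring
it, $D_A^* D_A = \Delta_A^{(0)} \oplus \Delta_A^{(1)}$, and thus

$$Z_A(3, 1) = \bigl( \det \Delta_A^{(0)} \bigr)^{3/2} \bigl( \det \Delta_A^{(1)} \bigr)^{-1/2} = T_A(3)^{-1} \qquad [42]$$
is again related to the (this time genuinely nontrivial)
Ray-Singer torsion.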

In spite of the complications caused by reducible
gauge symmetries, it can be shown that all of the
above generalizes to arbitrary n and p, with the
result that (for n odd)

$$Z_A(n, p) = T_A(n)^{(-1)^p} \qquad [43]$$

confirming the topological nature of BF theories.

In the nonabelian case, the situation is significantly
more complicated because of the complexity of the
classical moduli space, the (higher cohomology) zero
modes, and the on-shell reducibility of the gauge
symmetries. Nevertheless, ignoring all the zero modes
except those of A, that is, except the moduli m of flat
connections A(m), the result is similar to that in the
abelian case, in that the partition function reduces to an
integral over the moduli space of flat connections, with
measure determined by the Ray-Singer torsion $T_{A(m)}$.


Linking Numbers as Observables of Abelian 
BF Theories 


With the exception of p = 0, there are no interesting
"local" observables (gauge-invariant functionals of the
fields C and B) in the abelian BF theory, since the
gauge-invariant field strengths dC and dB vanish by the
equations of motion. (For p = 0, B is a gauge-invariant
0-form and hence B(x) is a good local observable.)
However, as in the Chern-Simons and Yang-Mills
theories, certain (weakly) nonlocal observables such as
Wilson loops are also of interest. In the case at hand (eqn
[18]), we have abelian Wilson surface operators

$$W_S[B] = \int_S B, \qquad W_{S'}[C] = \int_{S'} C \qquad [44]$$

associated with p- and (n - p - 1)-dimensional
submanifolds S and S′ of M, respectively. These operators
are gauge invariant, that is, invariant under the local
symmetries [20] provided that $\partial S = \partial S' = 0$, so that S
and S′ represent homology cycles of M.

For $M = \mathbb{R}^n$, correlation functions of these operators
are related to the topological linking number of
S and S′. We choose $S = \partial\Sigma$ and $S' = \partial\Sigma'$ to be
disjoint compact oriented boundaries of oriented
submanifolds $\Sigma$ and $\Sigma'$ of $\mathbb{R}^n$. We also introduce
de Rham currents $\Delta_\Sigma$ and $\Delta_S$ (essentially distributional
differential forms with $\delta$-function support on
$\Sigma$ or S, respectively), characterized by the properties

$$\int_S \omega_p = \int_M \Delta_S \wedge \omega_p \quad \text{and} \quad \int_\Sigma \omega_{p+1} = \int_M \Delta_\Sigma \wedge \omega_{p+1} \qquad [45]$$

for all $\omega_* \in \Omega^*(M, \mathbb{R})$ (and likewise for S′ and $\Sigma'$).

Since the dimension of $\Sigma$ is equal to the codimension
of $S' = \partial\Sigma'$, $\Sigma$ and S′ will generically intersect
transversally at isolated points, and we define the
"linking number" of S and S′ to be the intersection
number of $\Sigma$ and S′, expressed in terms of de Rham
currents as

$$L(S, S') = \int_{S'} \Delta_\Sigma = \int_M \Delta_{S'} \wedge \Delta_\Sigma \qquad [46]$$

In terms of de Rham currents, the Wilson surface 
operators can be written as Ws[B] — fy As ^ B, etc. 
Thus, the generating functional for correlation 
functions of Wilson surface operators 


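$$\left\langle e^{i\alpha W_S[B]}\, e^{i\beta W_{S'}[C]} \right\rangle = \int D[C]\, D[B]\; e^{\, i S_{BF}(C, B) + i\alpha W_S[B] + i\beta W_{S'}[C]} \qquad [47]$$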


is simply a Gaussian path integral. Using the 
defining properties of de Rham currents, this can 
be formally evaluated (using [27]) to give 


$$\left\langle e^{i\alpha W_S[B]}\, e^{i\beta W_{S'}[C]} \right\rangle = e^{\pm i\alpha\beta L(S, S')} \qquad [48]$$


As expected, correlation functions of these topolog- 
ical field theories encode topological information. 
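
For $n = 3$ and $p = 1$ the linking number above is the classical Gauss linking number of two closed curves. As a rough numerical illustration (not part of the original derivation), the sketch below evaluates the Gauss double integral for a Hopf link and for two unlinked circles. The curve parametrizations, function names, and sample count are assumptions chosen here for illustration, and the overall sign of the result depends on the chosen orientations.

```python
import math

def circle_xy(t):
    """Unit circle in the xy-plane: returns (position, tangent)."""
    return ((math.cos(t), math.sin(t), 0.0),
            (-math.sin(t), math.cos(t), 0.0))

def circle_xz(center_x):
    """Unit circle in the xz-plane centred at (center_x, 0, 0)."""
    def curve(s):
        return ((center_x + math.cos(s), 0.0, math.sin(s)),
                (-math.sin(s), 0.0, math.cos(s)))
    return curve

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def gauss_linking(curve1, curve2, n=120):
    """Periodic trapezoid rule for (1/4pi) * integral of r.(dr1 x dr2)/|r|^3,
    where r = r1 - r2 and both curves are parametrized over [0, 2pi)."""
    h = 2.0 * math.pi / n
    samples1 = [curve1(i * h) for i in range(n)]
    samples2 = [curve2(j * h) for j in range(n)]
    total = 0.0
    for r1, d1 in samples1:
        for r2, d2 in samples2:
            r = (r1[0] - r2[0], r1[1] - r2[1], r1[2] - r2[2])
            total += dot(r, cross(d1, d2)) / dot(r, r) ** 1.5
    return total * h * h / (4.0 * math.pi)

hopf = gauss_linking(circle_xy, circle_xz(1.0))      # circles linked once
unlinked = gauss_linking(circle_xy, circle_xz(5.0))  # circles far apart
print(round(hopf), round(unlinked, 6))
```

The second circle passes through the origin, which lies on the disk spanned by the first circle, so the intersection-number definition gives one transversal crossing; the integral reproduces this up to sign.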


Uses and Applications of Classical 
Nonabelian BF Theories 


Low-dimensional BF theories are closely related to 
other theories of interest, for example, the Yang- 
Mills theory, the Chern-Simons theory, and gravity. 
Here, we briefly review some of these relationships. 
In order to avoid the complexities of quantum 
nonabelian BF theories, we focus on their classical 
features. Brief suggestions for further reading are 
provided at the end of each subsection. 


Relation with Yang-Mills Theory 


In any dimension, the nonabelian BF action can be regarded as the zero-coupling limit $g^2 \to 0$ of the Yang-Mills theory, since the Yang-Mills action [17] can be written in first-order form as 


$$\int_M \mathrm{tr}_G\left[\, i B_{n-2} \wedge F_A + g^2\, B_{n-2} \wedge * B_{n-2} \right] \qquad [49]$$


However, whereas for $n \geq 3$ the $B^2$-term breaks the 
p-form gauge invariance of the BF action (and thus 
liberates the physical Yang-Mills degrees of freedom), 
this limit is nonsingular in two dimensions, 
where this p-form symmetry is absent and, indeed, 
both theories have zero physical degrees of freedom. 


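A nonsingular BF-like zero-coupling limit of the Yang-Mills theory for $n \geq 3$ can be obtained by introducing an auxiliary (Stückelberg) field $\eta \in \Omega^{n-3}(M, \mathfrak{g})$, which restores the p-form gauge invariance. The resulting BF Yang-Mills action is 


$$S_{BFYM} = \int_M \mathrm{tr}_G\Big[\, i B_{n-2} \wedge F_A + g^2 \Big(B_{n-2} - \tfrac{1}{\sqrt{2}\,g}\, d_A\eta\Big) \wedge * \Big(B_{n-2} - \tfrac{1}{\sqrt{2}\,g}\, d_A\eta\Big) \Big] \qquad [50]$$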


This action is not only invariant under ordinary $G$ gauge transformations, but also under the p-form gauge symmetry $B \to B + d_A\lambda'$ [6], provided that $\eta$ transforms as $\eta \to \eta + \sqrt{2}\,g\,\lambda'$. Thus, this shift can be used to set $\eta$ to zero, upon which one recovers the first-order form of the Yang-Mills action. Moreover, in the zero-coupling limit all that survives is a standard (and nontopological) minimal coupling of $\eta$ to the BF action: 


$$\lim_{g \to 0} S_{BFYM} = \int_M \mathrm{tr}_G\left[\, i B_{n-2} \wedge F_A + \tfrac{1}{2}\, d_A\eta \wedge * d_A\eta \right] \qquad [51]$$


accounting for the correct number of degrees of freedom of the Yang-Mills theory (the $(n - 3)$-form $\eta$ being absent for $n = 2$). 
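
The zero-coupling limit can be checked by expanding the quadratic term. The following is a sketch only, assuming the BFYM action has the form $iB \wedge F_A + g^2 (B - \tfrac{1}{\sqrt{2}g} d_A\eta) \wedge *(B - \tfrac{1}{\sqrt{2}g} d_A\eta)$ under the trace (the reading of [50] adopted here):

```latex
\begin{align*}
g^2\Big(B - \tfrac{1}{\sqrt{2}\,g}\, d_A\eta\Big) \wedge *\Big(B - \tfrac{1}{\sqrt{2}\,g}\, d_A\eta\Big)
 &= g^2\, B \wedge *B
  - \tfrac{g}{\sqrt{2}}\big(B \wedge * d_A\eta + d_A\eta \wedge * B\big)
  + \tfrac{1}{2}\, d_A\eta \wedge * d_A\eta \\
 &\xrightarrow{\; g \to 0 \;} \tfrac{1}{2}\, d_A\eta \wedge * d_A\eta .
\end{align*}
```

Under this assumption, setting $\eta = 0$ at finite $g$ leaves only $g^2 B \wedge *B$, which is the first-order Yang-Mills form, while the combination $B - \tfrac{1}{\sqrt{2}g} d_A\eta$ is unchanged by the simultaneous shifts of $B$ and $\eta$.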

Two-dimensional quantum BF and Yang-Mills 
theories have a variety of interesting topological 
properties. An account of some of them can be found 
in Blau and Thompson (1994) and Witten (1991). For 
a detailed discussion of the gauge symmetries and gauge 
fixing of the BFYM action, see Cattaneo et al. (1998). 


Chern-Simons Theory, Gravity, and (Deformed) 
BF Theory 


The Chern-Simons theory is a three-dimensional 
gauge theory. The Chern-Simons action for an 
$H$-connection $C$, $H$ the gauge group, is 


$$S_{CS}(C) = \int_M \mathrm{tr}_H\left( C \wedge dC + \tfrac{2}{3}\, C \wedge C \wedge C \right) \qquad [52]$$


It is invariant under the infinitesimal gauge transformations $\delta C = d_C\lambda$, $\lambda \in \Omega^0(M, \mathfrak{h})$, and the gauge-invariant equation of motion is the flatness condition $F_C = 0$. Now let $H = TG$ be the tangent bundle group $TG \simeq G \ltimes \mathfrak{g}$. This is a semidirect product group with $G$ acting on $\mathfrak{g}$ via the adjoint and $\mathfrak{g}$ regarded as an abelian Lie algebra of translations. Thus, in terms of generators $(J_a, P_a)$, where the $J_a$ are generators of $G$, the commutation relations are $[J_a, J_b] = f^c_{ab} J_c$, $[J_a, P_b] = f^c_{ab} P_c$ and $[P_a, P_b] = 0$, and the curvature of the $TG$-connection $C = J_a A^a + P_a B^a$ is 


$$F_C = J_a F_A^a + P_a (d_A B)^a \qquad [53]$$


Thus, the equations of motion of the TG Chern- 
Simons theory are equivalent to the equations of 
motion [2] of the BF theory with gauge group G. 
This equivalence also holds at the level of the action: 


$$2 S_{CS}(C) = S_{BF}(A, B) \qquad [54]$$


provided that one chooses the nondegenerate invariant scalar product to be 


$$\mathrm{tr}_{TG}(J_a P_b) = \mathrm{tr}_G(J_a J_b), \qquad \mathrm{tr}_{TG}(J_a J_b) = \mathrm{tr}_{TG}(P_a P_b) = 0 \qquad [55]$$


For $G = SO(3)$, $TG$ is the Euclidean group of isometries of $\mathbb{R}^3$ and for $G = SO(2,1)$, $TG$ is the Poincaré group of isometries of the three-dimensional Minkowski space $\mathbb{R}^{2,1}$. For these gauge groups, the BF action takes the form of the three-dimensional (Euclidean or Lorentzian) Einstein-Hilbert action, with the interpretation of $B = e$ as the dreibein and $A = \omega$ as the spin connection. The equations of motion for $e$ and $\omega$ express the vanishing of the torsion and the Riemann tensor (equivalent to the vanishing of the Ricci tensor for $n = 3$), respectively. This Chern-Simons interpretation of three-dimensional gravity extends to gravity with a cosmological constant, with $H$ the appropriate de Sitter or anti-de Sitter isometry group ($SO(4)$, $SO(3,1)$, or $SO(2,2)$, depending on the signature and the sign of the cosmological constant). In terms of the BF interpretation, this corresponds to the simple topological deformation 


$$S_{\mu BF}(A, B) = \int_M \mathrm{tr}_G\left( B \wedge F_A + \tfrac{\mu}{3}\, B \wedge B \wedge B \right) \qquad [56]$$


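of the BF action, which has the deformed local symmetries (cf. [5] and [6]) 


$$\delta A = d_A\lambda + \mu[B, \lambda'], \qquad \delta B = [B, \lambda] + d_A\lambda' \qquad [57]$$


A simple way to understand these symmetries is to note that the action can be written as the difference of two Chern-Simons actions: 


$$S_{CS}(A + \sqrt{\mu}\,B) - S_{CS}(A - \sqrt{\mu}\,B) = 4\sqrt{\mu}\; S_{\mu BF}(A, B) \qquad [58]$$


whose evident standard local gauge symmetries $\delta(A \pm \sqrt{\mu}\,B) = d_{A \pm \sqrt{\mu}B}\,\lambda^{\pm}$ are equivalent to [57] for $\lambda^{\pm} = \lambda \pm \sqrt{\mu}\,\lambda'$. 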

A detailed account of three-dimensional classical 
and quantum gravity can be found in Carlip 
(1998). 


Relation with Gravity 


Theories of two-dimensional gravity and topological 
gravity also have a BF formulation (Blau and 
Thompson 1991, Birmingham et al. 1991) which 
resembles the Chern-Simons BF formulation of 
three-dimensional gravity described above, the nat- 
ural gauge group now being SO(2, 1) or SO(3) or 
one of its contractions. 

In the first-order (Palatini) formulation, the 
Einstein-Hilbert action for four-dimensional gravity 
can be written as 


$$S_{EH} = \int_M \mathrm{tr}(e \wedge e \wedge F_\omega) \qquad [59]$$


where $e$ is the vierbein and $\omega$ is the spin 
connection. This action has the general form of a 
BF action with a constraint that $B = e \wedge e$ be a 
simple bi(co-)vector. Thus, four-dimensional 
general relativity can be regarded as a constrained 
BF theory. Although this constraint drastically 
changes the number of physical degrees of freedom 
(BF theory has zero degrees of freedom, while 
four-dimensional gravity has two), this is never- 
theless a fruitful analogy which also lies at the 
heart of the spin-foam quantization approach to 
quantum gravity. This constrained BF description 
of gravity is also available for higher-dimensional 
gravity theories. 

For further details, and references, see Freidel et al. 
(1999) and the review article (Baez 2000). 


Knot and Generalized Knot Invariants 


The known relationship between Wilson loop 
observables of the Chern-Simons theory with 
a compact gauge group and knot invariants 
(Witten 1989), and the interpretation of the three- 
dimensional BF theory as a Chern-Simons theory 
with a noncompact gauge group raise the question of 
the relation of observables of an $n = 3$ BF theory to 
knot invariants, and suggest the possibility of using 
an $n \geq 4$ BF theory to define higher-dimensional 
analogs of knot invariants. It turns out that an 
appropriate observable of $n = 3$ BF theory for 
$G = SU(2)$ is related to the Alexander-Conway 
polynomial. The analysis of higher-dimensional BF 
theories requires the full power of the Batalin- 
Vilkovisky (BV) formalism. BV observables general- 
izing Wilson loops have been shown to give rise to 
cohomology classes on the space of imbedded curves. 

For a detailed discussion of these issues, see 
Cattaneo and Rossi (2001) and references therein. 
A relation between the algebra of generalized Wilson loops and string topology has been investigated in Cattaneo et al. (2003). 


See also: Batalin-Vilkovisky Quantization; BRST 
Quantization; Chern-Simons Models: Rigorous Results; 
Gauge Theories From Strings; Knot Invariants and 
Quantum Gravity; Loop Quantum Gravity; Moduli 
Spaces: An Introduction; Nonperturbative and 
Topological Aspects of Gauge Theory; Schwarz-Type 
Topological Quantum Field Theory; Spin Foams; 
Topological Quantum Field Theory: Overview. 


Introduction 


One of the sources of quantum groups is a 
bicrossproduct construction coming in the case of 
Lie groups from considerations of Planck-scale 
physics in the 1980s. This article describes these 
objects and their currently known applications. See 
also the overview of Hopf algebras which provides 
the algebraic context (see Hopf Algebras and 
q-Deformation Quantum Groups). 

The construction of quantum groups here is 
viewed as a microcosm of the problem of quantiza- 
tion in a manner compatible with geometry. Here 
quantization enters in the noncommutativity of the 
algebra of observables and “curvature” enters as a 
quantum nonabelian group structure on phase 
space. Among the main features of the resulting 
bicrossproduct models (Majid 1988) are 


1. Compatibility takes the form of nonlinear “matched 
pair equations” generically leading to singular 
accumulation regions (event horizons or a maximum 
value of momentum depending on context). 

2. The equations are solved in an “equal and 
opposite” form from local factorization of a 
larger object. 

3. Different classical limits are related by observer- 
observed symmetry and Hopf algebra duality. 

4. Nonabelian Born reciprocity re-emerges and is 
linked to T-duality. 


It has also been argued that noncommutative 
geometry should emerge as an effective theory of the 
first corrections to geometry coming from any 
unknown theory of quantum gravity. Concrete 
models of noncommutative spacetime currently 
provide the first framework for the experimental 
verification of such effects. The most basic of these 
possible effects is curvature in momentum space or 
“cogravity.” We start with this. 


Cogravity 


We recall that curvature in space or spacetime means by definition noncommutativity among the covariant derivatives $D_i$. Here the natural momenta are $p_i = -i\hbar D_i$ and the situation is typified by the top line in Figure 1. There are also mixed relations between the $D_i$ and position functions, as indicated for flat space in the bottom line, which is quantum mechanics (there is a similar story for quantum mechanics on a curved space). We see, however, a third and dual possibility: noncommutativity in position space, which should be interpreted as curvature in momentum space, that is, the dual of gravity. This is an independent physical effect and comes therefore with its own length scale, which we denote $\lambda$. These ideas were made precise in the mid 1990s using the quantum group Fourier transform; see Majid (2000). Here we show what is involved in three illustrative examples. 


                     Position                                                       Momentum
Gravity              Curved                                                         Noncommutative: $[p_i, p_j] = i\hbar\gamma\,\epsilon_{ijk}\,p_k$
Cogravity            Noncommutative: $[x_i, x_j] = 2i\lambda\,\epsilon_{ijk}\,x_k$   Curved
Quantum mechanics    $[x_i, p_j] = i\hbar\,\delta_{ij}$

Figure 1 Noncommutative spacetime means curvature in 
momentum space. The equations are for illustration. 


1. We consider the “spin space” algebra 

$$\mathbb{R}^3_\lambda : \quad [x_i, x_j] = 2i\lambda\,\epsilon_{ij}{}^{k}\, x_k$$

where $\epsilon_{12}{}^{3} = 1$ and where it is convenient to insert a factor 2. This is the enveloping algebra $U(su_2)$, that is, just angular momentum space but now regarded “upside down” as a coordinate algebra (see Hopf Algebras and q-Deformation Quantum Groups). Then a plane wave is of the form 


$$\psi_p = e^{i p \cdot x}$$


where we set $\hbar = 1$ for this discussion. The momenta $p_i$ are nothing but local coordinates for the corresponding point $e^{i\lambda p \cdot \sigma} \in SU_2$, where $\lambda\sigma$ is the representation by Pauli matrices. It is really elements of this curved space $SU_2$ where momenta live. Here $\mathbb{R}^3_\lambda = U(su_2)$ has dual $C[SU_2]$ and Hopf algebra Fourier transform (after suitable completion) takes one between these spaces. Thus, in one direction 


$$\mathcal{F}(f) = \int_{SU_2} du\, f(u)\, u \approx \int_{\mathbb{R}^3} d^3p\; J(p)\, f(p)\, e^{i p \cdot x}$$


for $f$ a function on $SU_2$. We use the Haar measure on $SU_2$. The local result on the right has $J$ the Jacobian for the change to the local $p$ coordinates and $f$ is written in terms of these. Note that the coproduct in $C[SU_2]$ in terms of the $p^i$ generators is an infinite series given by the Campbell-Baker-Hausdorff series, and not the usual linear one (this is why the measure is not the Lebesgue one). The physical content here is in the plane waves themselves; one can use any other momentum coordinates to parametrize them with the corresponding measure and coproduct. Differential operators on $\mathbb{R}^3_\lambda$ are given by the action of elements of $C[SU_2]$ and are diagonal on these plane waves, 


$$f.\psi_p = f(p)\,\psi_p$$


which corresponds under Fourier transform simply to pointwise multiplication in $C[SU_2]$. For example, the function $\lambda^{-2}(\mathrm{tr} - 2)$ as a function on $SU_2$ will give a rotationally invariant wave operator which is also invariant under inversion in the group. Its value on plane waves is 


$$\frac{2}{\lambda^2}\left(\cos(\lambda|p|) - 1\right)$$


In the limit $\lambda \to 0$ this gives the usual wave operator 
on $\mathbb{R}^3$. 
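
The small-$\lambda$ limit can be made explicit. A short check, using only the standard identity $\mathrm{tr}\, e^{i\lambda p\cdot\sigma} = 2\cos(\lambda|p|)$ for the Pauli matrices:

```latex
\begin{align*}
\lambda^{-2}\big(\mathrm{tr}\, e^{i\lambda p\cdot\sigma} - 2\big)
 &= \frac{2}{\lambda^{2}}\big(\cos(\lambda|p|) - 1\big) \\
 &= -|p|^{2} + \frac{\lambda^{2}}{12}\,|p|^{4} + O(\lambda^{4}).
\end{align*}
```

The leading term is the value $-|p|^2$ that the ordinary Laplacian takes on $e^{ip\cdot x}$, with corrections suppressed by powers of $\lambda^2|p|^2$.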

It is also possible to put a differential graded algebra (DGA) structure of differential forms on this algebra, the natural one being 


$$dx_i = \lambda\sigma_i, \qquad x_i\theta - \theta x_i = i\frac{\lambda^2}{\mu}\, dx_i$$

$$(dx_i)x_j - x_j\, dx_i = i\lambda\,\epsilon_{ij}{}^{k}\, dx_k + i\mu\,\delta_{ij}\,\theta$$


where $\theta$ is the 2 × 2 identity matrix which, together with the Pauli matrices $\sigma_i$, completes the basis of left-invariant 1-forms. The 1-form $\theta$ provides a natural time direction, even though there is no time coordinate, and the new parameter $\mu \neq 0$ appears as the freedom to change its normalization. The partial derivatives $\partial^i$ are defined by 


$$d\psi(x) = (\partial^i\psi)\,dx_i + (\partial^0\psi)\,\theta$$


and act diagonally on plane waves as 


$$\partial^i = \frac{1}{2\lambda}\,\mathrm{tr}\left(\sigma_i\, e^{i\lambda p\cdot\sigma}\right) = i\,\frac{p^i}{\lambda|p|}\,\sin(\lambda|p|)$$

while $\partial^0 = i\mu(\mathrm{tr} - 2)/2\lambda^2$ is computed as above. Note that $\mu$ cannot be taken to be zero, due to an anomaly for translation invariance of the DGA. It is in fact a typical feature of noncommutative differential geometry that there is a 1-form $\theta$ generating $d$ by commutator, which can be required as an extra cotangent direction with its associated partial derivative an induced Hamiltonian. In the present model we have 


$$\partial^0\psi = i\frac{\mu}{\lambda^2}\left(\sqrt{1 + \lambda^2\,\partial\cdot\partial} - 1\right)\psi = \frac{i\mu}{2}\,\partial\cdot\partial\,\psi + O(\lambda^2)$$


which is of the form of Schrödinger's equation with 
respect to an auxiliary time variable and for a 
particle with mass $1/\mu$. 

The reader may ask what happens to the Euclidean group of translations and rotations in this context. From the above we find that $U_\lambda(\mathrm{poinc}_3) = C[SU_2]\,{>\!\!\!\triangleleft}\,U(su_2)$, the semidirect product generated by translations $\partial^i$ and usual rotations. This in turn is the quantum double $D(U(su_2))$ of the classical enveloping algebra, and as such a quantum group with braiding etc. (see Hopf Algebras and q-Deformation Quantum Groups). This quantum double has been identified as part of an effective theory in 2 + 1 quantum gravity in a Euclidean version based on Chern-Simons theory with Lie algebra $\mathrm{poinc}_3$, and the spin space algebra proposed as an effective theory for this. The quotient of $\mathbb{R}^3_\lambda$ by an allowed value of the quadratic Casimir $x^2$ (which then makes it a matrix algebra) is called a “fuzzy sphere” and appears as a “world-volume algebra” in certain string theories and reduced matrix models. The noncommutative differential geometry that we have described is due to Batista and the author. 

2. We take the same type of construction to obtain the “bicrossproduct model” spacetime algebra 


$$\mathbb{R}^{1,3}_\lambda : \quad [t, x_i] = i\lambda\, x_i, \qquad [x_i, x_j] = 0$$


These are the relations of a Lie algebra $b_{1,3}$ (say) but again regarded as coordinates on a noncommutative spacetime. Here $\lambda$ is a timescale which can be written as a mass scale $\kappa = 1/\lambda$ instead. We parametrize the plane waves as 


$$\psi_{\vec p, p^0} = e^{i \vec p \cdot x}\, e^{i p^0 t}, \qquad \psi_{\vec p, p^0}\, \psi_{\vec p', p^{0\prime}} = \psi_{\vec p + e^{-\lambda p^0} \vec p',\; p^0 + p^{0\prime}}$$


which identifies the $p^\mu$ as the coordinates of the nonabelian group $B^{1,3} = \mathbb{R}\,{\triangleright\!\!\!<}\,\mathbb{R}^3$ with Lie algebra $b_{1,3}$. The group law in these coordinates is read off as usual from the product of plane waves, which also gives the coproduct of $C[B^{1,3}]$ on the $p^\mu$. We have parametrized plane waves in this way (rather than the canonical way by the Lie algebra as before) in order to have a more manageable form for this. We do pay a price that in these coordinates group inversion is not simply $-p^\mu$, but 


$$(\vec p, p^0)^{-1} = \left(-e^{\lambda p^0}\,\vec p,\; -p^0\right)$$


which is also the action of the antipode $S$ on the 
abstract $p^\mu$ generators. 
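
The group structure on momentum space can be checked numerically. The sketch below assumes the group law $(\vec p, p^0)(\vec q, q^0) = (\vec p + e^{-\lambda p^0}\vec q,\; p^0 + q^0)$ and the deformed wave operator $-\tfrac{2}{\lambda^2}(\cosh(\lambda p^0) - 1) + \vec p^{\,2} e^{\lambda p^0}$; the signs of the exponents, the function names, and the sample values are assumptions made for illustration, not a statement of the published conventions.

```python
import math

LAM = 0.3  # deformation parameter lambda (arbitrary illustrative value)

def multiply(a, b, lam=LAM):
    """Assumed group law: (p, p0)(q, q0) = (p + exp(-lam*p0) q, p0 + q0)."""
    (p, p0), (q, q0) = a, b
    scale = math.exp(-lam * p0)
    return (tuple(pi + scale * qi for pi, qi in zip(p, q)), p0 + q0)

def inverse(a, lam=LAM):
    """Assumed inverse: (p, p0)^(-1) = (-exp(lam*p0) p, -p0)."""
    p, p0 = a
    scale = math.exp(lam * p0)
    return (tuple(-scale * pi for pi in p), -p0)

def casimir(a, lam=LAM):
    """Assumed value of the deformed wave operator on a plane wave."""
    p, p0 = a
    p_sq = sum(pi * pi for pi in p)
    return (-2.0 / lam ** 2 * (math.cosh(lam * p0) - 1.0)
            + p_sq * math.exp(lam * p0))

def close(a, b, tol=1e-12):
    """Compare two group elements componentwise."""
    return (all(abs(x - y) < tol for x, y in zip(a[0], b[0]))
            and abs(a[1] - b[1]) < tol)

a = ((0.4, -1.1, 0.7), 0.9)
b = ((-0.2, 0.5, 1.3), -0.6)
c = ((1.0, 0.1, -0.8), 0.25)
identity = ((0.0, 0.0, 0.0), 0.0)

print(close(multiply(a, inverse(a)), identity))
print(abs(casimir(a) - casimir(inverse(a))) < 1e-9)
```

With these assumed forms the inverse is two-sided, the law is associative but not commutative, and the wave operator takes the same value on a momentum and on its group inverse.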

In particular, the right-invariant Haar measure on $B^{1,3}$ in these coordinates is the usual $d^4p$, so the quantum group Fourier transform reduces to the usual one but normal ordered, 


$$\mathcal{F}(f) = \int_{\mathbb{R}^4} d^4p\; f(p)\, e^{i \vec p \cdot x}\, e^{i p^0 t}$$


(one can also Fourier transform with respect to the left-invariant measure $d^4p\, e^{3\lambda p^0}$ on $B^{1,3}$). The inverse is again given in terms of the usual inverse transform if we specify general fields $\psi$ in $\mathbb{R}^{1,3}_\lambda$ by normal ordering of usual functions, which we shall do. As before, the action of elements of $C[B^{1,3}]$ defines differential operators on $\mathbb{R}^{1,3}_\lambda$ and these act diagonally on plane waves. 


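We also have a natural DGA with 


$$(dx_j)\,x_\mu = x_\mu\, dx_j, \qquad (dt)\,x_\mu - x_\mu\, dt = i\lambda\, dx_\mu$$


which leads to the partial derivatives 


$$\partial^i\psi = {:}\frac{\partial}{\partial x_i}\psi(x, t){:} = i p^i.\psi$$

$$\partial^0\psi = {:}\frac{\psi(x, t + i\lambda) - \psi(x, t)}{i\lambda}{:} = \frac{i}{\lambda}\left(1 - e^{-\lambda p^0}\right).\psi$$

for normal-ordered polynomial functions $\psi$ or in terms of the action of the coordinates $p^\mu$ in $C[B^{1,3}]$. These $\partial^\mu$ do respect our implicit $*$-structure (unitarity) on $\mathbb{R}^{1,3}_\lambda$ but in a Hopf algebra sense which is not the usual sense, since the action of the antipode $S$ is not just $-p^\mu$. This can be remedied by using adjusted derivatives $L^{-1/2}\partial^\mu$ where 

$$L\psi = {:}\psi(x, t + i\lambda){:} = e^{-\lambda p^0}.\psi$$


In this case the natural 4D Laplacian is $L^{-1}\left((\partial^0)^2 - \sum_i (\partial^i)^2\right)$, which acts on plane waves as 


$$-\frac{2}{\lambda^2}\left(\cosh(\lambda p^0) - 1\right) + \vec p^{\,2}\, e^{\lambda p^0}$$


where $\vec p^{\,2} = \sum_{i=1}^{3} p_i^2$. This deforms the usual Laplacian in such a way as to remain invariant under the Lorentz group (which now acts nonlinearly on $B^{1,3}$ in this model) and under group inversion. 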

This model may provide the first experimental test 
for noncommutative spacetime and cogravity. For the 
analysis of an experiment, we assume the identification 
of noncommutative waves in the above normal-ordered 
form with classical ones that a detector might register. 
In that case one may argue (Amelino-Camelia and 
Majid 2000) that the dispersion relation for such waves 
has the classical derivation as $\partial p^0/\partial p^i$, which now 
computes as propagation speed for a massless particle: 


Op? 
dp 


e" 


in units where 1 is the usual speed of light. So the prediction is that the speed of light depends on energy. What is remarkable is that even if $\lambda \sim 10^{-44}$ s (the Planck timescale), this prediction could in principle be tested, for example using γ-ray bursts. These are known in some cases to travel cosmological distances before arriving on Earth, and have a spread of energies from 0.1-100 MeV. According to the above, the relative time delay $\Delta_T$ on traveling distance $L$ for frequencies corresponding to $p^0$, $p^0 + \Delta p^0$ is 


$$\Delta_T \sim \lambda\,\Delta p^0\,\frac{L}{c} \sim 10^{-44}\,\mathrm{s} \times 100\,\mathrm{MeV} \times 10^{10}\,\mathrm{y} \sim 1\,\mathrm{ms}$$


which is in principle observable by statistical 
analysis of a large number of bursts correlated 
with distance (determined, e.g., by using the Hubble 
telescope to lock in on the host galaxy of each 
burst). Although the above is only one of a class of 
predictions, it is striking that even Planck-scale 
effects are now in principle within experimental 
reach. 
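
The order of magnitude of this delay can be reproduced with a few lines of arithmetic. The sketch below assumes that the energy spread is converted to a frequency by dividing by the reduced Planck constant, and that the travel time is simply $10^{10}$ years; the function and constant names are illustrative only.

```python
HBAR_EV_S = 6.582119569e-16             # reduced Planck constant in eV*s
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year in seconds

def time_delay(lam_s, delta_energy_ev, travel_time_s):
    """Delta_T ~ lambda * Delta p0 * (L / c), with Delta p0 converted
    from an energy (eV) to an angular frequency (1/s) via E / hbar."""
    delta_p0 = delta_energy_ev / HBAR_EV_S
    return lam_s * delta_p0 * travel_time_s

delay = time_delay(lam_s=1e-44,
                   delta_energy_ev=100e6,
                   travel_time_s=1e10 * SECONDS_PER_YEAR)
print(f"{delay * 1e3:.2f} ms")
```

The result is a fraction of a millisecond, consistent with the estimate of roughly 1 ms quoted above.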

We now explain what happens to the full Poincaré symmetry here. The nonlinear action of the Lorentz group on $B^{1,3}$ Fourier transforms to an action on the generators of $\mathbb{R}^{1,3}_\lambda$, which combines with the above action of the $p^\mu$ to generate an entire Poincaré quantum group $U(so_{1,3})\,{\triangleright\!\!\!\blacktriangleleft}\,C[B^{1,3}]$. We will say more about its “bicrossproduct” structure in a later section. The above wave operator in momentum space is the natural Casimir in these momentum 
coordinates. A common mistake in the literature for 
this model is to suppose that the Casimir relation 
alone amounts to a physical prediction, whereas in 
fact the momentum coordinates are arbitrary and 
have meaning only in conjunction with the plane 
waves that they parametrize. The deformed Poincaré 
as an algebra alone is actually isomorphic to the 
undeformed one by a different choice of generators, 
so by itself has no physical content; one needs rather 
the noncommutative spacetime as well. Prior work 
on the relevant deformed Poincaré algebra either did 
not consider it acting on spacetime or took it acting 
on classical (commutative) Minkowski spacetime 
with inconsistent results (there is no such action as a 
quantum group). 

The above model was introduced by Majid and Ruegg (1994) and later tied up with a dual approach of Woronowicz. There is also a previous “κ-Poincaré” version of the Hopf algebra alone obtained (Lukierski et al. 1991) in another context (by contraction of $U_q(so_{2,3})$) but with fundamentally different generators and relations and hence different physical content (e.g., the Lorentz generators there do not close among themselves but mix with momentum). 

3. The usual Heisenberg algebra of quantum mechanics is another possible noncommutative (phase) space; one may also take the same algebra and view it as a noncommutative spacetime, so: 


$$\mathbb{R}^{1,3}_\theta : \quad [x_\mu, x_\nu] = i\theta_{\mu\nu}$$


for any antisymmetric tensor $\theta_{\mu\nu}$. This is not a Hopf algebra but it turns out that this model can also be completely solved by Hopf algebra methods, namely the theory of covariant twists. Twist models also include versions of the noncommutative torus studied by Connes, and related θ-spaces, which are nontrivial at the level of C*-algebras. However, at an algebraic level, all covariant structures are automatically provided by applying the twisting functor $\mathcal{T}$ to the desired classical construction (see Hopf Algebras and q-Deformation Quantum Groups). This is not usually appreciated in the physics literature on such models, but see Oeckl (2000). 

Thus, consider $H = U(\mathbb{R}^{1,3})$ with generators $p^\mu = -i\partial^\mu$ acting as usual on functions on Minkowski space. It has a cocycle 


$$F = e^{\frac{i}{2}\,\theta^{\mu\nu}\, p_\mu \otimes p_\nu}$$


which induces a new product $\bullet$ on functions by $\phi \bullet \psi = \cdot\left(F^{-1}(\phi \otimes \psi)\right)$. This is just the standard Moyal product, in the present case on $\mathbb{R}^{1,3}$, viewed as a covariant twist using Hopf algebra methods. The Hopf algebra $U(\mathbb{R}^{1,3})$ in principle has a twisted coproduct given by $\Delta_F = F(\Delta(\ ))F^{-1}$, but this does 
not change as the algebra is commutative. 
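
To see the twisted product at work, the following sketch implements the Moyal product on polynomials in two variables, where the series terminates. It assumes a single nonzero component $\theta^{12} = -\theta^{21} = \theta$ and the form $\phi \bullet \psi = \cdot\, e^{\frac{i}{2}\theta(\partial_x \otimes \partial_y - \partial_y \otimes \partial_x)}(\phi \otimes \psi)$; the dictionary representation of polynomials and all function names are choices made here for illustration.

```python
from math import comb, factorial

THETA = 0.5  # theta^{12} = -theta^{21}; arbitrary illustrative value

def derivative(poly, dx, dy):
    """Apply (d/dx)^dx (d/dy)^dy to a polynomial {(a, b): coeff} in x, y."""
    out = {}
    for (a, b), c in poly.items():
        if a >= dx and b >= dy:
            factor = ((factorial(a) // factorial(a - dx))
                      * (factorial(b) // factorial(b - dy)))
            key = (a - dx, b - dy)
            out[key] = out.get(key, 0) + c * factor
    return out

def multiply(f, g):
    """Ordinary commutative product of two polynomials."""
    out = {}
    for (a, b), c in f.items():
        for (a2, b2), c2 in g.items():
            key = (a + a2, b + b2)
            out[key] = out.get(key, 0) + c * c2
    return out

def add(f, g, scale=1):
    """Return f + scale * g, dropping numerically vanishing coefficients."""
    out = dict(f)
    for key, c in g.items():
        out[key] = out.get(key, 0) + scale * c
    return {key: c for key, c in out.items() if abs(c) > 1e-12}

def degree(f):
    return max((a + b for (a, b) in f), default=0)

def star(f, g, theta=THETA):
    """Moyal product: apply sum_n (i*theta/2)^n / n! * P^n to f (x) g,
    with P = dx (x) dy - dy (x) dx, then multiply the two factors."""
    result = {}
    for n in range(min(degree(f), degree(g)) + 1):
        coeff = (1j * theta / 2) ** n / factorial(n)
        for k in range(n + 1):
            left = derivative(f, n - k, k)
            right = derivative(g, k, n - k)
            weight = coeff * comb(n, k) * (-1) ** k
            result = add(result, multiply(left, right), weight)
    return result

x = {(1, 0): 1}
y = {(0, 1): 1}
commutator = add(star(x, y), star(y, x), -1)
print(commutator)
```

The commutator of the two coordinates is the constant $i\theta$, while powers of a single linear combination of coordinates are unchanged, which is the polynomial counterpart of the statement that plane waves are unmodified.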

Next, $H$ also acts covariantly on $\Omega(\mathbb{R}^{1,3})$, the usual algebra of differential forms, and twisting this in the same way gives 


$$\psi(x) \bullet dx_\mu = \psi\, dx_\mu, \qquad (dx_\mu) \bullet \psi(x) = (dx_\mu)\,\psi$$


unchanged. This is because no terms higher than $p^\mu \otimes p^\nu\theta_{\mu\nu}$ contribute and then $d(1) = 0$. The associated partial derivatives defined by $d$ are likewise unchanged and act in the usual way as derivations with respect to both the $\bullet$ product and the undeformed product. The result may look different when the same $\psi(x)$ is expressed as a function of the variables with the $\bullet$ product. In other words, the only deformation comes from the Moyal product itself, with the rest being automatic. Moreover, the plane waves themselves are unchanged because $(x \cdot k)^{\bullet n} = (x \cdot k)^n$ due to $\theta$ being antisymmetric. Hence, 


$$p^\mu \psi_k(x) = k^\mu \psi_k(x)$$


where $p^\mu = -i\partial^\mu$. The wave operator $-\partial_\mu\partial^\mu$ is 
therefore given by the action of $p_\mu p^\mu$ and has value 
$k_\mu k^\mu$ as usual on plane waves. On the other hand, 


$$\psi_k \bullet \psi_{k'} = e^{-\frac{i}{2}\, k^\mu \theta_{\mu\nu} k'^\nu}\, \psi_{k + k'}$$


or in algebraic terms the twist functor $\mathcal{T}$ applied 
to the Fourier transform implies also a twisted 
coproduct or coaddition law for the abstract $k^\mu$ 
generators, now different from the linear one for the 
covariance momentum operators $p^\mu$. This leads to 
some of the more interesting features of the model. 

One immediately also has a Poincaré quantum 
group here, $U_\theta(\mathrm{poinc}_{1,3})$, obtained by similarly 
twisting the classical $U(\mathrm{poinc}_{1,3})$. We just view 
$F$ as living here rather than in the original $H$. The 
translation sector is unchanged as before but if $M^{\mu\nu}$ 
are the usual Lorentz generators, then 


ApM?? - Me & 1 +18 Ma6 
J j (p^ & BF up" m 0? p" & p^) 
i (p^ &) 0^ p" = 0^ ap” e p?) 


using the metric $\eta_{\mu\nu}$ to raise or lower indices. The antipode is also modified according to the theory in Majid (1995). The relations in the Poincaré algebra are not modified (so, e.g., $p_\mu p^\mu$ will remain central). Any construction originally Poincaré covariant becomes covariant under this twisted one after application of the twisting functor. As with the differentials above, the action on $\mathbb{R}^{1,3}_\theta$ is not actually modified but may appear so when functions are expressed in terms of the $\bullet$ product. 

The above model is popular at the time of writing in connection with string theory. Here, an effective description of the endpoints of open strings landing on a fixed 4-brane has been modeled conveniently in terms of the $\bullet$ product above (Seiberg and Witten 1999). It should be borne in mind, however, that this fixed 4-brane lives in some of the higher dimensions of the string spacetime, so this is not necessarily a prediction of noncommutative spacetime $\mathbb{R}^{1,3}$. 

In fact, a proposal superficially similar to $\mathbb{R}^{1,3}_\theta$ above was already made in Snyder (1947). Here 


$$[x^\mu, x^\nu] = i\lambda^2 M^{\mu\nu}$$


where $\lambda$ is our length scale and the $M^{\mu\nu}$ are now operators with the usual commutation rules for the Lorentz algebra with themselves and with $x^\mu$ and the momenta $p^\mu$. The latter obey 


$$[x^\mu, p^\nu] = i\left(\eta^{\mu\nu} - \lambda^2 p^\mu p^\nu\right), \qquad [p^\mu, p^\nu] = 0$$

so the entire Poincaré algebra is undeformed but the phase-space relations are deformed. Snyder also constructed the orbital angular momentum realization $M^{\mu\nu} = x^\mu p^\nu - x^\nu p^\mu$. This model is not a proposal for a noncommutative spacetime because the algebra does not even close among the $x^\mu$. Rather it is a proposal for “mixing” of position and Lorentz generators. On the other hand (which was the point of view in Snyder (1947)), in any representation of the Poincaré algebra, the $M^{\mu\nu}$ become operators and in some sense numerical. The rotational sector has discrete eigenvalues as usual, so to this extent the spacetime has been discretized. Although not fitting into the methods in this article, it is also of interest that the relations above were motivated by considering $p^\mu$ as coordinates projected from a 5D flat space to de Sitter space and $x^\mu$ as the 5-component of orbital angular momentum in the flat space. 

To conclude this section, let us note that there are 
further models that we have not included for lack of 
space. One of them is a much-studied $\mathbb{R}^{1,3}_q$ in which 
$t$ is central but the $x_i$ enjoy complicated q-relations 
best understood as q-deformed Hermitian matrices. 
One of the motivations in the theory was the result 
in Majid (1990) that q-deformation could be used to 
regularize infinities in quantum field theory as poles 
at $q = 1$. Another entire class is to use noncommutative 
geometry and quantum group methods on 
finite or discrete spaces. Unlike lattice theory where 
a finite lattice is viewed as approximation, these 
models are not approximations but exact noncom- 
mutative geometries valid even on a few points. The 
noncommutativity enters into the fact that finite 
differences are bilocal and hence naturally have 
different left and right multiplications by functions. 
Both aspects are mentioned briefly in the overview 
article (see Hopf Algebras and q-Deformation 
Quantum Groups). Also, on the experimental 
front, another large area that we have not had 
room to cover is the prediction of modified 
uncertainty relations both in spacetime and phase 
space (Kempf et al. 1995). 

Moreover, for all of the models above, once one 
has a noncommutative differential calculus one may 
proceed to gauge theory etc., on noncommutative 
spacetimes, at least at the level where a connection 
is a noncommutative (anti-Hermitian) 1-form $\alpha$. 
Gauge transformations are invertible (unitary) 
elements $u$ of the noncommutative “coordinate 
algebra” and the connection and curvature transform 
as 


$$\alpha \to u^{-1}\alpha u + u^{-1}du$$


$$F(\alpha) = d\alpha + \alpha \wedge \alpha \to u^{-1}F(\alpha)u$$


The full extent of quantum bundles and gravity 
(see Quantum Group Differentials, Bundles and 
Gauge Theory) and quantum field theory is not 
always possible, although both have been done for 
covariant twist examples (for functorial reasons) 
and for small finite sets. For the first two models 
above, for example, it is not clear at the time of 
writing how to interpret scattering when the addi- 
tion of momenta is nonabelian. 




Matched Pair Equations 


Although we have presented noncommutative space- 
time first, the first actual application of quantum 
group methods to Planck-scale physics was the 
Planck-scale Hopf algebra obtained by a theory of 
bicrossproducts. Like the Snyder model, the inten- 
tion here was to deform phase space itself, but since 
then bicrossproducts have had many further appli- 
cations. The main ingredient here is the notion of a 
pair of groups (G, M), say, acting on each other as 
we explain now. The mathematics here goes back to 
the early 1910s in group theory, but also arose in 
mathematical physics as a toy version of Einstein’s 
equation in the sense of compatibility between 
quantization and curvature (see the next section). 

By definition, $(G, M)$ are a matched pair of groups if there are left and right actions 


$$M \;\overset{\triangleleft}{\longleftarrow}\; M \times G \;\overset{\triangleright}{\longrightarrow}\; G$$

of each group on the set of the other, such that 


$$s \triangleleft e = s, \quad e \triangleright u = u, \quad s \triangleright e = e, \quad e \triangleleft u = e$$
$$(s \triangleleft u) \triangleleft v = s \triangleleft (uv), \quad s \triangleright (t \triangleright u) = (st) \triangleright u$$
$$s \triangleright (uv) = (s \triangleright u)\left((s \triangleleft u) \triangleright v\right)$$
$$(st) \triangleleft u = \left(s \triangleleft (t \triangleright u)\right)(t \triangleleft u)$$


for all $u, v \in G$, $s, t \in M$. Here $e$ denotes the relevant group unit element. As a first application of such data, one may make a “double cross product group” $G \bowtie M$ with product 


$$(u, s).(v, t) = \left(u(s \triangleright v),\; (s \triangleleft v)t\right)$$


and with $G$, $M$ as subgroups. Since it is built on the direct product space, the bigger group factorizes into these subgroups. Conversely, if $X$ is a group factorization such that the product $G \times M \to X$ is bijective, each group acts on the other by actions $\triangleright, \triangleleft$ defined by $su = (s \triangleright u)(s \triangleleft u)$ for $u \in G$ and $s \in M$, where $s$, $u$ are multiplied in $X$ and the product is factorized as something in $G$ and something in $M$. So finite group matched pairs are equivalent to group factorizations. In the Lie group context, the corresponding system of differential equations is 
equivalent to a local factorization. 
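
The equivalence with group factorizations can be tested on a small example. The sketch below takes $X = S_4$, factorized as $G = S_3$ (the permutations fixing one point) times $M = \mathbb{Z}_4$ (generated by a 4-cycle), reads off the two actions from $su = (s \triangleright u)(s \triangleleft u)$, and checks the matched pair conditions by brute force. The choice of groups and all names are assumptions made for illustration.

```python
from itertools import permutations

N = 4
E = tuple(range(N))  # identity permutation

def compose(p, q):
    """Return the composite p∘q, i.e. apply q first and then p."""
    return tuple(p[q[i]] for i in range(N))

# G = S3, realised as the permutations of {0, 1, 2, 3} that fix the point 3.
G = [p + (3,) for p in permutations(range(3))]

# M = Z4, the cyclic group generated by the 4-cycle i -> i + 1 (mod 4).
M = [tuple((i + k) % N for i in range(N)) for k in range(N)]

# Every x in X = S4 factorizes uniquely as x = g m with g in G and m in M.
FACTOR = {compose(g, m): (g, m) for g in G for m in M}
assert len(FACTOR) == 24

def left(s, u):
    """s ▷ u: the G-part of the product s u computed in X."""
    return FACTOR[compose(s, u)][0]

def right(s, u):
    """s ◁ u: the M-part of the product s u computed in X."""
    return FACTOR[compose(s, u)][1]

def is_matched_pair():
    """Brute-force check of the unit, action, and compatibility conditions."""
    unit_ok = (all(right(s, E) == s and left(s, E) == E for s in M)
               and all(left(E, u) == u and right(E, u) == E for u in G))
    in_g_ok = all(
        right(right(s, u), v) == right(s, compose(u, v))
        and left(s, compose(u, v)) == compose(left(s, u), left(right(s, u), v))
        for s in M for u in G for v in G)
    in_m_ok = all(
        left(s, left(t, u)) == left(compose(s, t), u)
        and right(compose(s, t), u) == compose(right(s, left(t, u)), right(t, u))
        for s in M for t in M for u in G)
    return unit_ok and in_g_ok and in_m_ok

def double_cross_product(a, b):
    """(u, s).(v, t) = (u (s ▷ v), (s ◁ v) t)."""
    (u, s), (v, t) = a, b
    return (compose(u, left(s, v)), compose(right(s, v), t))

def to_x(a):
    """Map a pair (u, s) to the product u s in X."""
    u, s = a
    return compose(u, s)

print(is_matched_pair())
```

Neither subgroup is normal in $S_4$ here, so both actions are nontrivial, and the map $(u, s) \mapsto us$ identifies the double cross product with $X$ itself.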

There is a nice graphical representation of the matched pair conditions which relates to “surface integration.” Thus, consider squares 


          s▷u
        +-----+
      s |     | s◁u
        +-----+
           u


labeled by elements of $M$ on the left edge and elements of $G$ on the bottom edge. We can fill in the other two edges by thinking of an edge transformed by the other edge as it goes through the square either horizontally or vertically; the two together constitute the surface transport across the square. The matched pair equations have the meaning that a square can be subdivided either vertically or horizontally as shown in Figure 2, where the labeling on vertical edges is to be read from top down. The transport operation here is nothing other than normal ordering in the factorizing group. In the Lie setting, it means that the equations can be solved from infinitesimal solutions (a matched pair of Lie algebras) by a simultaneous double integration over the group (i.e., building up a large box from many small ones). If one considers solving the quantum Yang-Baxter equations on groups, they appear in this notation as an equality of surface transport going two ways around a cube, and the classical Yang-Baxter equations as curvature of the underlying higher-order connection. 

Also in this notation there is a bicrossproduct quantum group defined in Figure 3, at least when $M$ is finite. The expressions are considered zero unless the juxtaposed edges have the same group labels. In that case, the product is a semidirect product algebra $C(M)\,{>\!\!\!\triangleleft}\,CG$ of functions on $M$ by the group algebra of $G$. The coproduct is the adjoint of this, so is a semidirect coalgebra $C(M)\,{\blacktriangleright\!\!\!<}\,CG$. Hence the two together are denoted $C(M)\,{\blacktriangleright\!\!\!\triangleleft}\,CG$. The dual needs $G$ finite and has the same form but with vertical and horizontal compositions interchanged, that is, a bicrossproduct $CM\,{\triangleright\!\!\!\blacktriangleleft}\,C(G)$. Both Hopf algebras have the above labeled squares as basis. 


Figure 2 Matched pair condition as a subdivision property. 


Figure 3 Bicrossproduct Hopf algebra showing horizontal 
product and vertical coproduct as an “unproduct.” 

It is possible to generalize both bicrossproducts
and double cross products associated to matched
pairs to general Hopf algebras $H_1 \blacktriangleright\!\!\triangleleft H_2$ and
$H_1 \bowtie H_2$, respectively, where $H_1, H_2$ are Hopf
algebras (see Majid 1990) and to relate the two in
general by dualization of one factor. Another
general result (Majid 1995) is that $H_1 \blacktriangleright\!\!\triangleleft H_2$ acts
covariantly on the algebra $H_1^*$ from the right, or
$H_1 \triangleright\!\!\blacktriangleleft H_2$ acts covariantly on $H_2^*$ from the left. A
third general result is that bicrossproducts solve the
extension problem


$H_1 \to H \to H_2$


meaning that such a Hopf algebra H subject to some
technical requirements (such as an algebra splitting
map $H_2 \to H$) is of the form $H \cong H_1 \blacktriangleright\!\!\triangleleft H_2$. The
theory was also extended to include cocycle bicross-
products at the end of the 1980s (by the author).
The finite group case, however, was first found by 
Kac and Paljutkin (1966) in the Russian literature 
and later rediscovered independently in Takeuchi 
(1981) and in the course of Majid (1988). 


The Planck-Scale Hopf Algebra 


We consider a quantum algebra of observables H 
and ask when it is a Hopf algebra extending some 
classical position coordinate algebra C[M] and some 
possibly noncommutative momentum coordinate 
algebra U(g) in the form of a strict extension 


$C[M] \to H \to U(\mathfrak{g})$


From the theory above this problem is governed by local
solutions of the matched pair equations on (G, M). It
requires that $H = C[M] \rtimes U(\mathfrak{g})$ as an algebra, that is,
the quantization of a particle moving on orbits in M
under some action of G (in an algebraic setting, or
one can use von Neumann or C*-algebras etc.). And
it requires the classical phase space to be a
nonabelian or “curved” group $M \rtimes \mathfrak{g}^*$. This extends
to a coproduct on H which becomes the bicross-
product Hopf algebra $C[M] \blacktriangleright\!\!\triangleleft U(\mathfrak{g})$. In this way, the
problem which was open at the start of the 1980s of 
finding true examples of Hopf algebras was given a 
physical interpretation as being equivalent to finding 
quantum-mechanical systems reconciled with curva- 
ture, and the equations that governed this were the 
matched pair ones (Majid 1988). 

We still have to solve these equations. In the 
Lie case, they mean a pair of cross-coupled first- 
order equations on G x M. These can be solved 
locally as a double-holonomy construction in line 
with the surface transport point of view, but are 
nonlinear typically with singularities in the non- 
compact case. The equations are also symmetric 
under interchange of G, M so Born reciprocity 
between position and momentum is extended to 
the quantum system with generally “curved” 
position and momentum spaces. Moreover, in so 
far as Einstein’s equation $G_{\mu\nu} = 8\pi T_{\mu\nu}$ is also a
compatibility between a quantity in position 
space and a quantity originating (ultimately) in 
momentum space, the matched pair equations can 
be viewed as a toy version of these. 

Let us note that the reason to look for H a Hopf 
algebra in the first place, aside from the reasons 
already given, is for observer-observed symmetry 
(this was put forward as a postulate for Planck-scale 
physics). Thus, H* is also an algebra of observables 
of some dual system, in our case $U(\mathfrak{m}) \triangleright\!\!\blacktriangleleft C[G]$ or
particles in G moving on orbits under M. Thus, 
Born reciprocity is truly implemented in the 
quantum/curved system by Hopf algebra duality. 
Put another way, Hopf algebras are the simplest 
objects after abelian groups that admit Fourier 
transform (see Hopf Algebras and q-Deformation 
Quantum Groups) and we require this on phase 
space if Born reciprocity is to be extended to the 
quantum/curved system. 

The Planck-scale Hopf algebra is the simplest 
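example of these ideas (Majid 1988). Here $G = M = \mathbb{R}$
and the matched pair equations can be solved
completely. The general solution is


$\hat p = i\hbar\,(1 - e^{-\gamma x})\,\frac{\partial}{\partial x}, \qquad \hat x = \frac{i}{\hbar}\,(1 - e^{-\hbar\gamma p})\,\frac{\partial}{\partial p}$


for the action of one group with generator p on
functions of x in the other group and vice-versa. It
has two parameters which we have denoted as $\hbar$ and
a background curvature scale $\gamma$, and the correspond-
ing bicrossproduct $C[x] \blacktriangleright\!\!\triangleleft C[p]$ is

$[p, x] = i\hbar\,(1 - e^{-\gamma x}), \qquad \Delta x = x \otimes 1 + 1 \otimes x,$
$\Delta p = p \otimes e^{-\gamma x} + 1 \otimes p, \qquad \epsilon x = \epsilon p = 0,$
$S p = -p\,e^{\gamma x}, \qquad S x = -x,$


where we should allow power series or take $e^{\gamma x}$ as
an invertible generator.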
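
As a quick consistency check (a sketch, assuming the relations read $[p, x] = i\hbar(1 - e^{-\gamma x})$, $\Delta p = p \otimes e^{-\gamma x} + 1 \otimes p$ and $\Delta x = x \otimes 1 + 1 \otimes x$), one can verify that the coproduct respects the commutation relation:

```latex
\begin{aligned}
[\Delta p, \Delta x]
 &= [\,p \otimes e^{-\gamma x} + 1 \otimes p,\; x \otimes 1 + 1 \otimes x\,] \\
 &= [p, x] \otimes e^{-\gamma x} + 1 \otimes [p, x] \\
 &= i\hbar\,(1 - e^{-\gamma x}) \otimes e^{-\gamma x}
    + i\hbar\, 1 \otimes (1 - e^{-\gamma x}) \\
 &= i\hbar\,\bigl(1 \otimes 1 - e^{-\gamma x} \otimes e^{-\gamma x}\bigr)
  = \Delta\bigl(i\hbar\,(1 - e^{-\gamma x})\bigr).
\end{aligned}
```

The last step uses that $e^{-\gamma x}$ is grouplike, $\Delta e^{-\gamma x} = e^{-\gamma x} \otimes e^{-\gamma x}$, because x is primitive. The two cross terms $[p \otimes e^{-\gamma x}, 1 \otimes x]$ and $[1 \otimes p, x \otimes 1]$ vanish since functions of x commute with each other and the two tensor factors commute.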

It is important to note that the matched pair 
equations here have only this solution and it is 
necessarily singular at p=0 or x=0. The inter- 
pretation in position space is as follows. Consider an 
infalling particle of mass m with fixed momentum
$p = m v_\infty$ (in terms of the velocity at infinity). By
definition, p is the free-particle momentum and acts
on $\mathbb{R}$ as above. This corresponds to a free-particle
Hamiltonian $p^2/2m$ and induces


$\dot p = 0, \qquad \dot x = \frac{p}{m}\,(1 - e^{-\gamma x}) = v_\infty \left(1 - \frac{1}{1 + \gamma x + \cdots}\right)$


at the classical level. We see that the particle takes 
an infinite time to reach the origin, which is an 
accumulation point. This can be compared with the 
formula in standard radial infalling coordinates 


$\dot x = v_\infty \left(1 - \frac{1}{1 + \frac{c^2}{2GM}\,x}\right)$


for distance x from the event horizon of a black hole 
of mass M (here G is Newton's constant and c the 
speed of light). So $\gamma \sim c^2/GM$ and for the sake of
further discussion we will use this value. With a 
little more work, one can then see that 


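$C[x] \blacktriangleright\!\!\triangleleft C[p] \to C[x] \rtimes C[p]$ (usual quantum mechanics) for $mM \ll m_P^2$,
$C[x] \blacktriangleright\!\!\triangleleft C[p] \to C(X)$ (usual curved geometry) for $mM \gg m_P^2$,


where $m_P$ is the Planck mass of the order of $10^{-5}$ g
and $X = \mathbb{R} \triangleright\!\!< \mathbb{R}$ is a nonabelian group. In the first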
limit, the particle motion is not detectably different 
from usual flat space quantum mechanics outside 
the Compton wavelength from the origin. In the 
second limit, the estimate is such that noncommu- 
tativity would not show up for length scales much 
larger than the background curvature scale. 

This Hopf algebra is also the simplest way to
extend classical position C[x] and momentum C[p]
in the sense above. In other words, the requirement to
maintain observer-observed symmetry or Born
reciprocity throws up both quantum mechanics (in
the form of $\hbar$) and something with the flavor of
gravity (in the form of $\gamma$) and both are required for a
nontrivial Hopf algebra. Moreover, the construction
necessarily has a self-dual form and indeed the
dually paired Hopf algebra is $C[p] \triangleright\!\!\blacktriangleleft C[x]$ with new
parameters $\hbar' = 1/\hbar$ and $\gamma' = \hbar\gamma$ if we take the
standard pairing of x, p across the two algebras. Hopf
algebra duality realized by the quantum group
Fourier transform F takes one between the two
models.


Bicrossproduct Poincaré 
Quantum Groups 


Another example from the 1980s in the same family
as the Planck-scale Hopf algebra is $G = SU_2$ and
$M = B_+$, a nonabelian version of $\mathbb{R}^3$ with Lie algebra
$b_+$ of the form


$[x_3, x_i] = i\lambda x_i, \qquad [x_1, x_2] = 0$


for i = 1, 2. The required solution of the matched
pair equations was found in Majid (1990) and has a
nonlinear action of rotations on $B_+$. The interpreta-
tion of $C[B_+] \blacktriangleright\!\!\triangleleft U(su_2)$ is of particles moving along
orbits which are deformed spheres in $B_+$, and there
is a dual model where particles move instead on
orbits in $SU_2$ under the action of $b_+$. Moreover,
from the general theory of bicrossproducts, we
automatically have a covariant action of
$C[B_+] \blacktriangleright\!\!\triangleleft U(su_2)$ on the auxiliary noncommutative space
$\mathbb{R}^3_\lambda = U(b_+)$ with relations as above.

The quantum group here was actually obtained as a
Hopf-von Neumann algebra but we limit ourselves to
the underlying algebraic version. Also, there is of
course nothing stopping one considering this Hopf
algebra equally well as $U_\lambda(\mathrm{poinc}_3)$, that is, a deforma-
tion of the group of motions on $\mathbb{R}^3$, rather than as an
algebra of observables. The only difference is to denote
the generators of $C[B_+]$ by the symbols $p_i$, reserving $x_i$
instead for the auxiliary noncommutative space. We
lower i, j, k indices using the Euclidean metric. Then
the bicrossproduct has the form


$[p_i, p_j] = 0, \qquad [M_i, M_j] = i\epsilon_{ij}{}^k M_k,$
$[M_3, p_j] = i\epsilon_{3j}{}^k p_k, \qquad [M_i, p_3] = i\epsilon_{i3}{}^k p_k,$


as usual, for i, j = 1, 2, 3, and the modified relations


$[M_i, p_j] = \frac{i}{2}\,\epsilon_{ij}{}^3 \left(\frac{1 - e^{-2\lambda p_3}}{\lambda} - \lambda p^2\right) + i\lambda\,\epsilon_i{}^{k3}\, p_j p_k,$

for i, j = 1, 2 and $p^2 = p_1^2 + p_2^2$. The coproducts are

$\Delta M_i = M_i \otimes e^{-\lambda p_3} + \lambda M_3 \otimes p_i + 1 \otimes M_i,$
$\Delta p_i = p_i \otimes e^{-\lambda p_3} + 1 \otimes p_i$


for i = 1, 2 and the usual additive ones for $p_3, M_3$.
There is also an appropriate counit and antipode.
The deformed spheres under the nonlinear rotation
in Majid (1990) are constant values of the Casimir
for the above algebra. This is


$\frac{2}{\lambda^2}\,(\cosh(\lambda p_3) - 1) + p^2 e^{\lambda p_3}$

which from the group of motions point of view
generates the noncommutative Laplacian when
acting on $\mathbb{R}^3_\lambda$. The model here is a Euclidean
inhomogeneous one.

The four-dimensional (4D) version $U(so_{1,3}) \triangleright\!\!\blacktriangleleft C[B_+]$
of this construction (Majid and Ruegg
1994) is again linked to Planck-scale predictions,
this time as a generalized symmetry. In terms of
translation generators $p^\mu$, rotations $M_i$ and boosts
$N_i$ we have


$[p^\mu, p^\nu] = 0, \qquad [M_i, M_j] = i\epsilon_{ij}{}^k M_k,$
$[N_i, N_j] = -i\epsilon_{ij}{}^k M_k, \qquad [M_i, N_j] = i\epsilon_{ij}{}^k N_k,$
$[p^0, M_i] = 0, \qquad [p^i, M_j] = i\epsilon^i{}_j{}^k p_k, \qquad [p^0, N_j] = -i p_j$


as usual, and the modified relations and coproduct

$[p^i, N_j] = -\frac{i}{2}\,\delta^i_j \left(\frac{1 - e^{-2\lambda p^0}}{\lambda} + \lambda\, \vec p^{\,2}\right) + i\lambda\, p^i p_j,$
$\Delta N_i = N_i \otimes 1 + e^{-\lambda p^0} \otimes N_i + \lambda\,\epsilon_{ij}{}^k\, p^j \otimes M_k,$
$\Delta p^i = p^i \otimes 1 + e^{-\lambda p^0} \otimes p^i$


and the usual additive coproducts on $p^0, M_i$. This
time the Lorentz group orbits in $B_+$ are deformed
hyperboloids rather than deformed spheres, and the
Casimir that controls this has the same form as
above but with a minus sign in the cosh term, that is, the
model is a Lorentzian one. We know from the
general theory of bicrossproducts that this Hopf
algebra acts on $U(b_+) = \mathbb{R}^{1,3}_\lambda$, the spacetime in the
section “Cogravity,” and the Casimir induces the
wave operator as we have seen there.

Let us look a bit more closely at the deformed 
hyperboloids. Because neither group here is com- 
pact, one expects from the general theory of 
bicrossproducts to have limiting accumulation 
regions. This is visible in the contour plot of $p^0$
against $|\vec p|$ in Figure 4, where the $p^0 > 0$ mass shells
are now cups with almost vertical walls, compressed
into the vertical tube


$|\vec p| < \lambda^{-1}$


In other words, the 3-momentum is bounded above
by the Planck momentum scale (if $\lambda$ is the Planck
time). Indeed, the light-cone equation (setting the
Casimir to zero) reads $\lambda|\vec p| = 1 - e^{-\lambda p^0}$ so this is
immediate. Nevertheless, this observation is so
striking that the bicrossproduct model has been
dubbed “doubly special” and spawned the search for
other such models. Such accumulation regions are a
main discovery of the noncompact bicrossproduct
theory visible already in the Planck-scale Hopf
algebra. The model further confirms the role of
the matched pair equations as a toy version of
Einstein’s.

Figure 4 Deformed mass-shell orbits in the bicrossproduct
curved momentum space for $\lambda = 1$.
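
The bound can be made concrete numerically. The sketch below assumes the light-cone relation in the form $\lambda|\vec p| = 1 - e^{-\lambda p^0}$ and simply tabulates it; the function name `lightcone_momentum` and the sample energies are hypothetical choices made for illustration.

```python
import math

def lightcone_momentum(p0, lam=1.0):
    """|p| on the deformed light cone, assuming lam*|p| = 1 - exp(-lam*p0)."""
    return (1.0 - math.exp(-lam * p0)) / lam

lam = 1.0
for p0 in (0.01, 0.1, 1.0, 10.0):
    print(f"p0 = {p0:6.2f}   |p| = {lightcone_momentum(p0, lam):.6f}")
```

For small $p^0$ the relation reduces to the undeformed $|\vec p| \approx p^0$, while for large $p^0$ the momentum saturates toward $1/\lambda$, which is the vertical tube described above.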


Poisson-Lie T-Duality 


We have explained in Section 3 that the matched
pair equations are equivalent to a local factorization
of Lie groups, with the action and back-reaction
created “equally and oppositely” from this. For the
two models in the last section, these are $SL_2(\mathbb{C})$
factorizing as $SU_2$ and a 3D $B_+$, and $SO_{2,3}$ locally as
$SO_{1,3}$ and a 4D $B_+$. The first of these examples is in
fact one of a general family based on the Iwasawa
decomposition $G_\mathbb{C} = G \bowtie G^*$ where G is a compact
Lie group with complexification $G_\mathbb{C}$ and $G^*$ a
certain solvable group. From this, one may construct
a solution $(G, G^*)$ of the matched pair equations and
bicrossproduct quantum group


$C[G^*] \blacktriangleright\!\!\triangleleft U(\mathfrak{g})$


associated to all complex simple Lie algebras. This is
again part of the bicrossproduct theory from the
1980s. On the other hand, the Lie algebra $\mathfrak{g}^*$ here
can be identified with the dual of $\mathfrak{g}$ in which case its
Lie algebra corresponds to a Lie coproduct
$\delta: \mathfrak{g} \to \mathfrak{g} \otimes \mathfrak{g}$ and makes $(\mathfrak{g}, \delta)$ into a Lie bialgebra in
the sense of Drinfeld. This $\delta$ exponentiates to a
Poisson bracket on G making it a “Poisson-Lie
group” and the quantization of this is provided
by the quantum group coordinate algebras $C_q[G]$
(see Hopf Algebras and q-Deformation Quantum
Groups and Classical r-matrices, Lie Bialgebras, and
Poisson Lie Groups). The bicrossproduct quantum
groups are nevertheless unrelated to the latter even
though they spring from related classical data.

As already discussed, one interpretation here is
of quantized particles in $G^*$ moving on orbits
under G and vice versa in the dual model. The
dual model is equivalent in the sense that the
states of one (in the sense of positive-linear
functionals) lie in the algebra of observables of
the other and we also saw in the Planck-scale
example inversion of structure constants reminis-
cent of T-duality in string theory. Motivated in
part by this duality Klimcik (1996) along with
Severa in the mid 1990s showed that indeed a
$\sigma$-model on G could be constructed in such a way
that there was a matching dual $\sigma$-model on $G^*$ in
some sense equivalent in terms of solutions to the
equations of motion. The Lagrangians here have
the usual form


$L = E_u(u^{-1}\partial_+ u,\; u^{-1}\partial_- u),$
$\hat L = \hat E_s(s^{-1}\partial_+ s,\; s^{-1}\partial_- s)$


where $u: \mathbb{R}^{1,1} \to G$ and $s: \mathbb{R}^{1,1} \to G^*$ are the dyna-
mical fields, except that the inner products $E_u, \hat E_s$
are not constant. Rather they are obtained by
solving nonlinear differential equations on the
groups defined through the structure constants
of $\mathfrak{g}, \mathfrak{g}^*$ and the Drinfeld double $D(\mathfrak{g})$. At the time,
T-duality here was well understood in the case of
abelian groups while these Poisson-Lie T-duality
models provided the first convincing nonabelian
models.

This construction was extended by Beggs and 
Majid (2001) to a general matched pair (G, M), that 
is, a o-model on G dual to one on M. The Poisson- 
Lie case is the special case where the actions are 
coadjoint actions and the Lie algebra of $G \bowtie M$ is
D(g). The solutions of the equations of motion for 
the two systems are created “equally and oppo- 
sitely" from one on the factorizing group. It could 
be expected that T-duality ideas again play a role in 
Planck-scale physics. 


Other Bicrossproducts 


There are also infinite-dimensional factorizations 
such as the Riemann-Hilbert problem (see 
Riemann-Hilbert Problem) in the theory of 
integrable systems and hence infinite-dimensional 
matched pairs and bicrossproducts linked to
them. Here we mention just one partly infinite
example of current interest. 

Thus, the diffeomorphisms on the line R may be 
factorized into transformations of the form ax + b 
and diffeomorphisms that fix the origin and have 
unit differential there. After a (logarithmic) change 
of generators to arrive at an algebraic picture, one 
has a bicrossproduct 


$H(1) = U(b_+) \triangleright\!\!\blacktriangleleft H_\infty$


where $b_+$ is now the two-dimensional (2D) Lie
algebra with relations [x, y] = x and $H_\infty$ is the algebra
of polynomials in generators $\delta_n$ and a certain
coalgebra as a model of the coordinate algebra of
the group of diffeomorphisms that fix the origin with
unit differential. The Hopf algebra H(1) was intro-
duced by Connes and Moscovici (1998) although not
actually as a bicrossproduct (but motivated by the
bicrossproduct theory) as part of a family H(n) useful
in cyclic cohomology computations. It has cross
relations and coproduct determined by


$[\delta_n, x] = \delta_{n+1}, \qquad [\delta_n, y] = n\delta_n,$
$\Delta\delta_1 = \delta_1 \otimes 1 + 1 \otimes \delta_1,$
$\Delta x = x \otimes 1 + 1 \otimes x + \delta_1 \otimes y,$
$\Delta y = y \otimes 1 + 1 \otimes y$


which we see has a semidirect product form where
$\delta_n \triangleleft x = \delta_{n+1}$, $\delta_n \triangleleft y = n\delta_n$. The coalgebra is also a
semidirect coproduct by means of a back-reaction of
$H_\infty$ in $B_+$ (expressed as a coaction). From the
bicrossproduct theory, we also have a dual model


$C[B_+] \blacktriangleright\!\!\triangleleft U(\mathrm{diff}_0)$


where $\mathrm{diff}_0$ is the Lie algebra of the group of
diffeomorphisms fixing the origin. As such it could be
viewed as in the family of examples in the section
“Bicrossproduct Poincaré quantum groups” but
now with a 2D $B_+$. We also conclude from
the bicrossproduct theory that this acts covariantly on
$\mathbb{R}^2_\lambda = U(b_+)$ after introducing the scaling parameter $\lambda$.

Finally, the Hopf algebra H(1) is also part of a 
family of bicrossproduct Hopf algebras built on rooted 
trees and related to bookkeeping of overlapping 
divergences in renormalizable quantum field theories 
(see Hopf Algebra Structure of Renormalizable Quan- 
tum Field Theory). While we have not had room to 
cover all bicrossproduct quantum groups of interest, it 
would appear that bicrossproducts are indeed inti- 
mately tied up with actual quantum physics. 


See also: Classical r-Matrices, Lie Bialgebras, and 
Poisson Lie Groups; Hopf Algebra Structure of 
Renormalizable Quantum Field Theory; Hopf Algebras 
and q-Deformation Quantum Groups; Quantum Group
Differentials, Bundles and Gauge Theory; 
Riemann-Hilbert Problem; von Neumann Algebras: 
Introduction, Modular Theory, and Classification Theory.

Introduction

Consider the following equation:

$F(X, \mu) = 0$  [1]


where X is the variable, $\mu$ is a parameter, and X, $\mu$, F
belong to appropriate (finite- or infinite-dimensional)
spaces. The problem of bifurcation theory is to
describe the singularities of the set of solutions


$S_\mu = \{X;\ (X, \mu) \text{ satisfies } F(X, \mu) = 0\}$


The word “bifurcation” was introduced by H
Poincaré (1885) in his study of equilibria of rotating
liquid masses.

The simplest example is the study of the real roots 
x of a quadratic polynomial 


$x^2 + bx + c = 0$  [2]


where $\mu$ is represented by the pair of parameters
$(b, c) \in \mathbb{R}^2$. As is well known, real roots are
determined by the sign of


$\Delta = b^2 - 4c$


For $\Delta < 0$, there is no real solution of [2], while
there are two solutions $x_\pm$ in the region $\Delta > 0$,
which merge when the distance between the point
(b, c) and the parabola $\Delta = 0$ tends towards 0. It is
then clear that a singularity occurs in the structure
of the set of solutions of [2] at the crossing of the
parabola $\Delta = 0$ or, in other words, a bifurcation
occurs in the parameter space (b, c) on the parabola
$\Delta = 0$. A point $(\mu_0, x_0) \in \mathbb{R}^3$ is then called a
bifurcation point if $\mu_0 = (b, c)$ satisfies $\Delta = 0$, and
$x_0 = -b/2$.
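
The quadratic example is easy to explore directly. The short sketch below assumes the equation in the form $x^2 + bx + c = 0$ with $\Delta = b^2 - 4c$; the helper name `real_roots` and the sample values of c are illustrative choices.

```python
import math

def real_roots(b, c):
    """Real roots of x**2 + b*x + c = 0 in increasing order ([] if Delta < 0)."""
    delta = b * b - 4.0 * c
    if delta < 0:
        return []
    if delta == 0:
        return [-b / 2.0]          # the two roots have merged at x0 = -b/2
    r = math.sqrt(delta)
    return [(-b - r) / 2.0, (-b + r) / 2.0]

b = 2.0
for c in (1.5, 1.0, 0.99, 0.0):    # crossing the parabola Delta = 0 at c = 1
    print(f"c = {c:4.2f}  Delta = {b * b - 4 * c:5.2f}  roots = {real_roots(b, c)}")
```

Moving c through the value $b^2/4$ changes the number of real solutions from zero to two, with a single double root $x_0 = -b/2$ exactly on the parabola.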

In the theory of differential equations, $F(X, \mu)$
often represents a vector field. This study is then
concerned with the existence of equilibrium solu-
tions to the differential equation

$\frac{dX}{dt} = F(X, \mu)$  [3]

and is therefore referred to as static bifurcation
theory. In addition, dynamic bifurcation theory is
concerned here with “changes” in the dynamic
properties of the solutions of the differential
equation as $\mu$ varies. A widely used way to
characterize these “changes” is to say that the vector
field $F(\cdot, \mu_0)$ is structurally stable if the sets of orbits
of the differential equation are homeomorphic for $\mu$
close to $\mu_0$, with homeomorphisms which preserve
the orientation of the orbits in time t. Then a
bifurcation occurs at $\mu = \mu_0$ if $F(\cdot, \mu_0)$ is not
structurally stable. It turns out that there is a close
link between the stability properties of equilibrium
solutions of the differential equation and the type of
the bifurcation in static theory.

The tools developed in bifurcation theory are 
extensively used to solve concrete problems arising 
in physics and natural sciences. These problems may 
be modeled by ordinary or partial differential 
equations, integral equations, but also delay equa- 
tions or iteration maps, and in all these cases the 
presence of parameters naturally leads to bifurcation 
phenomena. They can be regarded as problems of 
the form [1] or [3], in suitable function spaces, and 
bifurcation theory allows one to detect solutions and to
describe their qualitative properties. During the last 
decades, a class of problems in which the use of 
bifurcation theory led to significant progress is 
concerned with nonlinear waves in partial differen- 
tial equations, including hydrodynamic problems, 
nonlinear water waves, elasticity, but also pattern 
formation, front propagation, or spiral waves in 
reaction-diffusion type systems. 


Examples in One and Two Dimensions 


The most complete results in bifurcation theory are 
available in one and two dimensions. The study of 
static bifurcations in one dimension is concerned 
with scalar equations 


$f(x, \mu) = 0$  [4]


where $x \in \mathbb{R}$, $\mu \in \mathbb{R}$, and the function f is supposed to
be regular enough with respect to $(x, \mu)$. When
$f(x_0, \mu_0) = 0$ and the derivative of f with respect to x
satisfies $\partial_x f(x_0, \mu_0) \neq 0$, the implicit function theorem
gives a unique branch of solutions $x(\mu)$ for $\mu$ close to
$\mu_0$, and shows the absence of bifurcation points near
$(\mu_0, x_0)$. Bifurcation theory intervenes when


$\partial_x f(x_0, \mu_0) = 0$  [5]


and one cannot apply the implicit function theorem
for solving with respect to x near $x_0$. A complete
description of the set of solutions near $(x_0, \mu_0)$ can
be obtained by looking at the partial derivatives of f
with respect to x and $\mu$.

For example, if


$\partial_\mu f(x_0, \mu_0) \neq 0,$


it is possible to solve with respect to $\mu$ and obtain a
regular solution $\mu(x)$ such that $\mu(x_0) = \mu_0$ and
$f(x, \mu(x)) = 0$. In addition, if the second order
derivative


$\partial_x^2 f(x_0, \mu_0) \neq 0$


the picture of the solution set in the plane $(\mu, x)$, also
called bifurcation diagram, shows a turning point
with a fold opened to the left or to the right
depending upon the sign of the product
$\partial_\mu f(x_0, \mu_0)\,\partial_x^2 f(x_0, \mu_0)$; see Figure 1. Notice that here the
bifurcation point $(\mu_0, x_0) \in \mathbb{R}^2$ corresponds to the
appearance of a pair of solutions of [4] "from
nowhere". This is the simplest example of a one-
sided bifurcation in which the bifurcating solutions
exist for either $\mu > \mu_0$ or $\mu < \mu_0$.

A particularly interesting situation arises when the
equation possesses a symmetry. For example, assume
that in [4] the function f is odd with respect to x. This
implies that we always have the solution x = 0, for any
value of the parameter $\mu$. Assume now that f satisfies


$\partial_x f(0, \mu_0) = 0$  [6]

and that


$\partial^2_{x\mu} f(0, \mu_0) \neq 0, \qquad \partial_x^3 f(0, \mu_0) \neq 0$  [7]


Then the point $(\mu_0, 0)$ is a pitchfork bifurcation
point, this denomination being related to the
bifurcation diagram in the plane $(\mu, x)$; see Figure 2.
Notice that here, the bifurcation point $(\mu_0, x_0) \in \mathbb{R}^2$
corresponds to the bifurcation from the origin of a pair
of solutions exchanged by the symmetry $x \to -x$, in
addition to the persistent "trivial" solution x = 0
which is invariant under the above symmetry. Such a
bifurcation is also referred to as a symmetry-breaking
bifurcation. Similar bifurcation diagrams are found
when the equation [4] has a "known" branch of
solutions $x(\mu)$ for $\mu$ close to $\mu_0$. This situation arises
often in applications where usually this branch consists
of trivial solutions $x(\mu) = 0$. Then at a bifurcation
point $(\mu_0, x_0)$ a second branch of solutions appears
forming either a one-sided bifurcation, or a two-sided
bifurcation; see Figure 3.

Figure 1 Turning point bifurcation in the case $\partial_\mu f(x_0, \mu_0) > 0$
and $\partial_x^2 f(x_0, \mu_0) < 0$. The solid (dashed) line indicates the branch
of stable (unstable) solutions in the differential equation.

Figure 2 Supercritical pitchfork bifurcation in the case
$\partial^2_{x\mu} f(0, \mu_0) > 0$ and $\partial_x^3 f(0, \mu_0) < 0$. The solid (dashed) lines
indicate the branch of stable (unstable) solutions in the
differential equation.

We can now view f as a vector field in the 
ordinary differential equation 


$\frac{dx}{dt} = f(x, \mu)$  [8]


and the study above corresponds to looking for
equilibrium solutions of [8]. The stability of such a
solution is determined by the sign of the derivative
$\partial_x f(x, \mu)$ of f at this equilibrium, and it is closely
related to the type of the static bifurcation.

In the case of a turning point bifurcation, when
$\partial_x^2 f(x_0, \mu_0) \neq 0$, the sign of $\partial_x f(x, \mu)$ is different for
the two bifurcating solutions. This means that one
solution is attracting (i.e., stable), the other one
being repelling (i.e., unstable); see Figure 1. In the
case of a pitchfork bifurcation as above, the stability
of the trivial solution x = 0 changes when $\mu$ crosses
$\mu_0$, and the stability of both bifurcating nonzero
solutions is the opposite from the stability of the
origin on the side of the bifurcation. The bifurcation
is called supercritical if the bifurcating solutions lie
on the side of the bifurcation point where the basic
solution x = 0 is unstable and subcritical otherwise;
see Figure 2. The situation is the same in the case of
one-sided bifurcations for an equation which has a
"known" branch of solutions. In the case of a two-
sided bifurcation, there is an exchange of stability at
the bifurcation point $(\mu_0, x_0)$, solutions on the two
branches having opposite stability for $\mu > \mu_0$ and
$\mu < \mu_0$, which changes at $(\mu_0, x_0)$. Such a bifurcation
is also referred to as transcritical; see Figure 3.

Figure 3 Typical bifurcation diagrams in the case of a branch
of trivial solutions. One-sided bifurcations: (a) supercritical,
(b) subcritical; two-sided bifurcation: (c) transcritical. The solid
(dashed) lines indicate the branch of stable (unstable) solutions
in the differential equation.
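
The stability statements above can be illustrated on three model equations. The sketch below assumes the standard polynomial normal forms $f = \mu - x^2$ (turning point), $f = \mu x - x^3$ (supercritical pitchfork) and $f = \mu x - x^2$ (transcritical); these particular functions, and the names `NORMAL_FORMS` and `equilibria_with_stability`, are illustrative assumptions and not taken from the text. An equilibrium is labeled stable when $\partial_x f < 0$ there and unstable when $\partial_x f > 0$.

```python
import math

def equilibria_with_stability(roots, dfdx, mu):
    """List (x, label) for the equilibria of dx/dt = f(x, mu) at the given mu."""
    result = []
    for x in roots(mu):
        slope = dfdx(x, mu)
        if slope < 0:
            label = "stable"
        elif slope > 0:
            label = "unstable"
        else:
            label = "critical"      # linearization does not decide
        result.append((x, label))
    return result

# name -> (equilibria of f(., mu) in increasing order, derivative of f in x)
NORMAL_FORMS = {
    "turning point": (
        lambda mu: [] if mu < 0 else sorted({-math.sqrt(mu), math.sqrt(mu)}),
        lambda x, mu: -2.0 * x,
    ),
    "pitchfork": (
        lambda mu: [0.0] if mu <= 0 else [-math.sqrt(mu), 0.0, math.sqrt(mu)],
        lambda x, mu: mu - 3.0 * x * x,
    ),
    "transcritical": (
        lambda mu: sorted({0.0, mu}),
        lambda x, mu: mu - 2.0 * x,
    ),
}

for name, (roots, dfdx) in NORMAL_FORMS.items():
    for mu in (-1.0, 1.0):
        print(name, mu, equilibria_with_stability(roots, dfdx, mu))
```

In the pitchfork case the trivial solution loses stability as $\mu$ crosses 0 and the two new branches are stable, which is the supercritical situation; in the transcritical case the two branches exchange stability at the bifurcation point.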

Notice that the study of fixed points or periodic
points for maps fits into the above framework. Specifi-
cally, the period-doubling process occurring in
successive bifurcations of one-dimensional maps is 
a common phenomenon in physics. 

The analysis of bifurcations in two dimensions 
leads to more complicated scenarios. Consider the 
differential equation [8] in which now $x \in \mathbb{R}^2$ and
$f(x, \mu) \in \mathbb{R}^2$, and assume that $f(x_0, \mu_0) = 0$. The
behavior of solutions near $(x_0, \mu_0)$ is determined by
the differential $D_x f(x_0, \mu_0) =: L$ of f with respect to
x, which can be identified with a 2 x 2 matrix. For
steady solutions, the implicit function theorem
ensures the existence of a unique branch of solutions
$x(\mu)$ provided L is invertible or, in other words, zero
does not belong to the spectrum of L. Consequently,
the study of bifurcations of steady solutions is
concerned with the case when zero belongs to the
spectrum of L, and can be performed following
the strategy described for one dimension, provided
that the zero eigenvalue of L is simple. For example,
assuming that the second eigenvalue is negative
leads in general to a saddle-node bifurcation, where
an additional dimension is added to the previous
picture of a turning point bifurcation, in which one
of the two bifurcating steady solutions is a stable
node, while the other one is a saddle. If, in addition,
there is a symmetry S commuting with f, that is,
such that $f(Sx, \mu) = Sf(x, \mu)$, and if, for example, $x_0$
is invariant under S, $Sx_0 = x_0$, and the eigenvector $\zeta_0$
associated to the zero eigenvalue of L is antisym-
metric, $S\zeta_0 = -\zeta_0$, then there is again a pitchfork
bifurcation. The equation possesses a branch of
symmetric steady solutions the stability of which
changes when crossing the value $\mu_0$ of the para-
meter, node on one side and saddle on the other,
and a pair of solutions is created in a one-sided
bifurcation which are exchanged by the symmetry S
and have stability opposite to the one of the
symmetric solution, just as in the one-dimensional
pitchfork bifurcation above.

A new type of bifurcation that arises for vector
fields in two dimensions is the so-called Hopf
bifurcation. This bifurcation was first understood
by Poincaré, and then proved in two dimensions by
Andronov (1937) using a Poincaré map, and later in
n dimensions by Hopf (1948) by means of a
Liapunov-Schmidt-type method. For the differential
equation, the absence of the zero eigenvalue in the
spectrum of L is not enough to ensure that the
vector field $f(\cdot, \mu_0)$ is structurally stable in a
neighborhood of $x_0$. This only holds when the
spectrum of L does not contain purely imaginary
eigenvalues, as asserted by the Hartman-Grobman
theorem. We are then left with the case when L has
a pair of purely imaginary eigenvalues $\pm i\omega$, $\omega \in \mathbb{R}^*$.
Static bifurcation theory gives that the system has a
unique branch of equilibria $(x(\mu), \mu)$ for $\mu$ close to
$\mu_0$, and typically their stability changes as $\mu$ crosses
$\mu_0$. For the differential equation a Hopf bifurcation
occurs in which a branch of periodic orbits
bifurcates on one side of $\mu_0$, and their stability is
opposite to that of the steady solution on this side;
see Figure 4. A convenient way to study this
bifurcation is through “normal form theory,"
which is briefly described below.

Figure 4 Supercritical Hopf bifurcation.
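
A supercritical Hopf bifurcation can be observed numerically on a model system. The sketch below assumes the planar vector field $\dot x = \mu x - \omega y - (x^2 + y^2)x$, $\dot y = \omega x + \mu y - (x^2 + y^2)y$, a commonly used normal form that is not given in the text, and integrates it with a hand-written fourth-order Runge-Kutta step. The names `hopf_field` and `final_radius`, the step size and the integration time are illustrative choices.

```python
import math

def hopf_field(x, y, mu, omega=1.0):
    """Model vector field with eigenvalues mu +/- i*omega at the origin."""
    r2 = x * x + y * y
    return (mu * x - omega * y - r2 * x,
            omega * x + mu * y - r2 * y)

def final_radius(mu, x0=0.1, y0=0.0, dt=0.01, t_end=100.0):
    """Distance from the origin after integrating up to t_end with RK4."""
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        k1 = hopf_field(x, y, mu)
        k2 = hopf_field(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], mu)
        k3 = hopf_field(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], mu)
        k4 = hopf_field(x + dt * k3[0], y + dt * k3[1], mu)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return math.hypot(x, y)

for mu in (-0.25, 0.04, 0.25):
    print(f"mu = {mu:5.2f}   radius after transient = {final_radius(mu):.4f}")
```

In polar coordinates this model reads $\dot r = \mu r - r^3$, so for $\mu < 0$ trajectories spiral into the stable equilibrium, while for $\mu > 0$ the equilibrium is unstable and trajectories approach a stable periodic orbit of radius $\sqrt{\mu}$.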


Local Bifurcation Theory 


There are two aspects of bifurcation theory, local 
and global theory. As this designation suggests, local 
theory is concerned with (local) properties of the set 
of solutions in a neighborhood of a “known” 
solution, while global theory investigates solutions 
in the entire space. 

An important class of tools in local bifurcation
theory consists of reduction methods, among which
the Liapunov-Schmidt reduction and the center
manifold reduction are often used to investigate
static and dynamic bifurcations, respectively. The
basic idea is to replace the bifurcation problem by 
an equivalent problem in lower dimensions, for 
example, a one- or a two-dimensional problem as 
the ones above. 

Consider again the equation [1] in which
$F: \mathcal{X} \times \mathcal{M} \to \mathcal{Y}$ is sufficiently regular, and
$\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{M}$ are
Banach spaces. Assume, without loss of generality,
that F(0, 0) = 0, or, in other words, that one solution
is known. The equation can be then written as


$LX + G(X, \mu) = 0$


in which $L = D_X F(0, 0)$ represents the differential of
F with respect to X at (0, 0), and is assumed to have
a closed range. The implicit function theorem shows
absence of bifurcation if L has a bounded inverse, so
that bifurcations are related to the existence of a
nontrivial kernel of L. The Liapunov-Schmidt
reduction then goes as follows.

Let N(L) and R(L) denote the kernel and the range of
L, respectively, and consider continuous projections
$P: \mathcal{X} \to N(L)$ and $Q: \mathcal{Y} \to R(L)$. Then there exists a
bounded linear operator $B: R(L) \to (\mathrm{id} - P)\mathcal{X}$, the right
inverse of L, satisfying $LB = \mathrm{id}$ on R(L) and $BL = \mathrm{id} - P$
on $\mathcal{X}$. For $X \in \mathcal{X}$ one may write


$X = X_0 + X_1, \qquad X_0 = PX, \qquad X_1 = (\mathrm{id} - P)X$

and then by projecting with $\mathrm{id} - Q$ and $Q$ the
equation becomes


$(\mathrm{id} - Q)\,G(X_0 + X_1, \mu) = 0$
$X_1 + BQ\,G(X_0 + X_1, \mu) = 0$


The implicit function theorem allows one to solve the
second equation for $X_1 = \psi(X_0, \mu)$ in a neighborhood
of the origin. Substitution into the first equation leads
to the equation in $(\mathrm{id} - Q)\mathcal{Y}$ for $X_0$ in $P\mathcal{X}$,


$(\mathrm{id} - Q)\,G(X_0 + \psi(X_0, \mu), \mu) = 0$


also called bifurcation equation. This equation
completely describes the set of solutions to [1] in a
neighborhood of (0, 0), and this problem is then
posed in a space of dimension much smaller than the
dimension of $\mathcal{X}$.
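
As a small worked illustration of these steps (a hypothetical example chosen here for simplicity, not taken from the text), take $X = (x, y) \in \mathbb{R}^2$, $\mu \in \mathbb{R}$ and $F(X, \mu) = (\mu x - xy,\; -y + x^2)$, so that $F(0, 0) = 0$:

```latex
\begin{aligned}
L &= D_X F(0,0) = \begin{pmatrix} 0 & 0 \\ 0 & -1 \end{pmatrix},
 \qquad G(X,\mu) = F(X,\mu) - LX = (\mu x - xy,\; x^2), \\
N(L) &= \mathrm{span}\{(1,0)\}, \quad R(L) = \mathrm{span}\{(0,1)\},
 \quad P(x,y) = (x,0), \quad Q(a,b) = (0,b), \quad B(0,b) = (0,-b), \\
X_1 + BQ\,G &= (0,\; y - x^2) = 0
 \;\;\Longrightarrow\;\; X_1 = \psi(X_0,\mu) = (0,\; x^2), \\
(\mathrm{id} - Q)\,G(X_0 + \psi(X_0,\mu), \mu) &= (\mu x - x^3,\; 0) = 0 .
\end{aligned}
```

One checks directly that $LB = \mathrm{id}$ on $R(L)$ and $BL = \mathrm{id} - P$. The bifurcation equation $\mu x - x^3 = 0$ is scalar, and it describes a pitchfork bifurcation at $(\mu, x) = (0, 0)$ with branches $x = 0$ and $x = \pm\sqrt{\mu}$.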

The basic principle of the Liapunov-Schmidt method
was discovered and used independently by different
authors. E Schmidt (1908) used this method for integral
equations, while Liapunov used it to study the stability
of the zero solution of nonlinear partial differential
equations when the linear part has zero eigenvalues
(1947), and later in 1960 for the bifurcation problem
studied by Poincaré (1885). Working in a Banach
space of t-periodic functions, the Liapunov-Schmidt
method may be used to solve the Hopf bifurcation
problem, as did Hopf himself in 1942.

The analog of this reduction procedure for the
differential equation [3] is the center manifold
reduction. Assuming that $F(0,0) = 0$, we obtain the
differential equation

\[ \frac{dX}{dt} = LX + G(X, \mu) \]

Since dynamic bifurcations are related to the existence
of purely imaginary spectral values of $L$, the kernel of $L$
alone is not enough to describe this situation. One has to
consider the spectral space $\mathcal{X}_c$ of $L$ associated to the
purely imaginary spectrum of $L$. A spectral gap is
needed between this part of the spectrum and the rest
(always true in finite dimensions), so that the spectral
projection $P$ onto $\mathcal{X}_c$ is well defined. One writes

\[ X = X_c + X_h, \quad X_c = PX, \quad X_h = (\mathrm{id} - P)X \]

and obtains the decomposed system

\[ \frac{dX_c}{dt} = LX_c + P\,G(X_c + X_h, \mu) \]
\[ \frac{dX_h}{dt} = LX_h + (\mathrm{id} - P)\,G(X_c + X_h, \mu) \]

The reduction procedure works provided the nonhomogeneous
linear equation

\[ \frac{dX_h}{dt} = LX_h + f(t) \]

possesses a unique solution in suitably chosen
function spaces with weak exponential growth,
such that one can then solve the second equation
for $X_h = \Psi(X_c)$ in a neighborhood of the origin in
these function spaces. This property is always true in
finite dimensions, but it has to be checked in infinite
dimensions. Different results showing the solvability
of this equation are available in both Banach and
Hilbert spaces, relying upon additional conditions
on the spectrum of $L$, decay properties of the
resolvent of $L$ on the imaginary axis, and regularity
properties of the nonlinearity $G$. The map $\Psi$ is then
used to construct a map
$\psi : P\mathcal{X} \times \mathcal{M} \to (\mathrm{id} - P)\mathcal{X}$,
defined in a neighborhood of the origin, which
parametrizes a local center manifold invariant under
the flow of the equation. The flow on this center
manifold is governed by the reduced equation in $\mathcal{X}_c$,

\[ \frac{dX_c}{dt} = LX_c + P\,G(X_c + \psi(X_c, \mu), \mu) \]

which completely describes the bifurcation problem.
The first proofs of this result were given in finite 
dimensions by Pliss (1964) and Kelley (1967). Center 
manifolds in infinite dimensions have been studied in 
different settings determined by assumptions on the 
linear part L and the nonlinear part G. One typical 
assumption in infinite dimensions is that the spectrum 
of L contains only a finite number of purely imaginary 
eigenvalues, so that the reduced equation above is a 
differential equation in a finite-dimensional space. 
These reduction methods work for a large class of 
problems and the advantage of such an approach is 
that one is left with a bifurcation problem in a 
lower-dimensional space. The methods involved in
solving this reduced bifurcation problem can be very
different from one problem to another, and often 
make use of some additional structure in the problem, 
such as a gradient-like structure, Hamiltonian 
structure, or the presence of symmetries, which 
are preserved by the reduction procedure. 
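
As an illustration of the center manifold reduction described above,
the following sketch compares a full system with its reduced equation
numerically. The planar system dx/dt = mu*x - x*y, dy/dt = -y + x^2 is
a hypothetical example chosen for this sketch (it does not come from
the article); its linear part has the spectrum {0, -1} at mu = 0, the
center manifold is y = x^2 to leading order, and the reduced equation
is taken, as an approximation, to be dx/dt = mu*x - x^3. The helper
names rk4_step and integrate are also assumptions.

```python
def rk4_step(f, state, dt):
    """One classical Runge-Kutta step for dstate/dt = f(state), state given as a list."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def integrate(f, state, t_end, dt=0.01):
    """Integrate from t = 0 to t_end with a fixed step and return the final state."""
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(f, state, dt)
    return state


MU = 0.04  # small positive value of the bifurcation parameter (assumed)


def full_system(state):
    x, y = state
    return [MU * x - x * y, -y + x * x]


def reduced_system(state):
    (x,) = state
    # leading-order reduced equation obtained with y = x^2 on the center manifold
    return [MU * x - x ** 3]


if __name__ == "__main__":
    print("full system:   ", integrate(full_system, [0.05, 0.0], 400.0))
    print("reduced system:", integrate(reduced_system, [0.05], 400.0))
```

Both computations should approach the bifurcated equilibrium
x = sqrt(mu); in the full system the second component approaches
y = x^2 = mu, which is the statement that the long-time dynamics is
captured by the one-dimensional reduced equation.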

A powerful tool for the analysis of these reduced 
differential equations is provided by the normal 
form theory, which goes back to works of Poincaré 
(1885) and Birkhoff (1927). The idea is to use 
coordinate transformations to make the expression 
of the vector field as simple as possible. The 
transformed vector field is called normal form. 
There is an extensive literature on normal forms 
for vector fields in many different contexts, in both 
finite- and infinite-dimensional cases. Typically the 
classes of normal forms are characterized in terms of 
the linear part of the differential equation. 

For differential equations of the form

\[ \frac{dx}{dt} = Lx + g(x, \mu) \qquad [9] \]

in which $L$ is a matrix and $g$ a sufficiently regular
map such that $g(0,0) = 0$, $D_x g(0,0) = 0$, as encountered
in bifurcation theory, one possible characterization
of normal forms makes use of the adjoint
matrix $L^*$. Fixing any order $k \geq 2$, there exist
polynomials $\Phi$ and $N$ of degree $k$ in $x$ with
coefficients which are regular functions of $\mu$,
and $\Phi(0,0) = N(0,0) = 0$, $D_x\Phi(0,0) = D_x N(0,0) = 0$,
such that by the change of variables

\[ x = y + \Phi(y, \mu) \]

the equation [9] is transformed into the normal form

\[ \frac{dy}{dt} = Ly + N(y, \mu) + o(\|y\|^k) \qquad [10] \]

in which the polynomial $N$ is characterized through

\[ N(e^{tL^*} y, \mu) = e^{tL^*} N(y, \mu) \]

for all $y$, $\mu$, and $t$, or, equivalently,

\[ D_y N(y, \mu)\, L^* y = L^* N(y, \mu) \]

for all $y$ and $\mu$. This characterization allows one to determine
the classes of possible normal forms for a given matrix $L$,
and also provides an efficient way to compute the
normal form for a given vector field $g$. As for the
reduction methods, normal form transformations can be
made to preserve the additional structure of the
problem, such as Hamiltonian structure or symmetries.

As an example, consider a differential equation of
the form [9] with $x \in \mathbb{R}^n$ and $\mu \in \mathbb{R}$, which
supports a Hopf bifurcation so that $L$ has simple eigenvalues
$\pm i\omega$, $\omega > 0$, and no other eigenvalues with zero real
part. The center manifold reduction provides a
two-dimensional reduced system with linear part
having the simple eigenvalues $\pm i\omega$, for which it is
convenient to write the normal form in complex
variables

\[ \frac{dA}{dt} = i\omega A + A\,Q(|A|^2, \mu) + o(|A|^{2k+2}) \]

for $A(t) \in \mathbb{C}$, where $Q$ is a complex polynomial of
degree $k$ in $|A|^2$ with $Q(0,0) = 0$, or, equivalently, in
polar coordinates $A = r e^{i\theta}$,

\[ \frac{dr}{dt} = r\,Q_r(r^2, \mu) + o(r^{2k+2}) \]
\[ \frac{d\theta}{dt} = \omega + Q_i(r^2, \mu) + o(r^{2k+1}) \]

$Q_r$ and $Q_i$ being the real and imaginary part of $Q$,
respectively. The radial equation for $r$ truncated at
order $2k + 1$ decouples and admits a pitchfork bifurcation.
The bifurcating steady solutions of this equation
then lead first to periodic solutions for the truncated
system, which are then shown to persist for the full
equation by a standard perturbation analysis.
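
The behavior of the truncated Hopf normal form can be checked
numerically. The sketch below assumes the simplest case k = 1 with the
hypothetical choice Q(|A|^2, mu) = mu - |A|^2 (real coefficients,
supercritical sign); these coefficients and the function names are
assumptions made for illustration, not values from the article. For
mu > 0 the radial equation dr/dt = r*(mu - r^2) has the steady state
r = sqrt(mu), which corresponds to a periodic orbit of the truncated
system.

```python
def hopf_rhs(a, mu, omega):
    """Truncated normal form dA/dt = i*omega*A + A*(mu - |A|^2) (assumed coefficients)."""
    return 1j * omega * a + a * (mu - abs(a) ** 2)


def integrate_hopf(a0, mu, omega, t_end, dt=0.01):
    """Integrate the truncated normal form with the classical Runge-Kutta scheme."""
    a = complex(a0)
    for _ in range(int(round(t_end / dt))):
        k1 = hopf_rhs(a, mu, omega)
        k2 = hopf_rhs(a + 0.5 * dt * k1, mu, omega)
        k3 = hopf_rhs(a + 0.5 * dt * k2, mu, omega)
        k4 = hopf_rhs(a + dt * k3, mu, omega)
        a += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a


if __name__ == "__main__":
    # Above threshold: a small perturbation grows towards the circle |A| = sqrt(mu).
    print(abs(integrate_hopf(0.01, mu=0.25, omega=1.0, t_end=100.0)))
    # Below threshold: perturbations decay to the trivial solution A = 0.
    print(abs(integrate_hopf(0.3, mu=-0.25, omega=1.0, t_end=100.0)))
```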

A situation that occurs in a large class of problems
is when the problem possesses a reversibility
symmetry, which often comes from some reflection
invariance in the physical space, that is, when the
vector field $F(\cdot, \mu)$ anticommutes with a symmetry
operator $S$. One of the simplest examples is the case
of a differential equation [9] when the matrix $L$ has
a double eigenvalue at 0, no other eigenvalues with
zero real part, and a one-dimensional kernel which
is invariant under $S$. In this case, the center manifold
reduction provides a two-dimensional reduced reversible
system, which can be put in the normal form

\[ \frac{da}{dt} = b \]
\[ \frac{db}{dt} = \mu - a^2 + o((|a| + |b|)^2) \]

which anticommutes with the symmetry
$(a, b) \mapsto (a, -b)$. The above system undergoes a
reversible Takens-Bogdanov bifurcation and has
for $\mu > 0$ a phase portrait as in Figure 5. There are
two equilibria, one a saddle, the other a center, and
a family of periodic orbits with the zero-amplitude
limit at the center equilibrium, and the infinite-period
limit a homoclinic orbit, originating at the
saddle point. In concrete problems the bounded
orbits of such a reduced system determine the shape
of physically interesting solutions of the full system
of equations, such as, for example, in water-wave
theory, where homoclinic and periodic orbits
correspond to solitary and periodic waves, respectively.
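
The structure of this phase portrait can be checked on the truncated
system da/dt = b, db/dt = mu - a^2, that is, with the higher-order
terms dropped (an assumption of this sketch). The truncated system
conserves H(a, b) = b^2/2 - mu*a + a^3/3, which is not stated in the
text but follows by direct differentiation; the function names below
are illustrative.

```python
import math


def vector_field(a, b, mu):
    """Truncated reversible normal form (higher-order terms dropped)."""
    return (b, mu - a * a)


def energy(a, b, mu):
    """First integral of the truncated system: dH/dt = 0 along solutions."""
    return 0.5 * b * b - mu * a + a ** 3 / 3.0


def classify_equilibria(mu):
    """Classify the equilibria (a, 0) with a = -sqrt(mu), +sqrt(mu), for mu > 0."""
    root = math.sqrt(mu)
    result = {}
    for a_eq in (-root, root):
        # The Jacobian [[0, 1], [-2*a_eq, 0]] has eigenvalues whose square is -2*a_eq.
        result[a_eq] = "center" if -2.0 * a_eq < 0.0 else "saddle"
    return result


def anticommutes_with_reflection(a, b, mu):
    """Check f(S x) = -S f(x) for the reflection S(a, b) = (a, -b)."""
    fa, fb = vector_field(a, -b, mu)
    ga, gb = vector_field(a, b, mu)
    return (fa, fb) == (-ga, gb)


if __name__ == "__main__":
    mu = 0.25
    print(classify_equilibria(mu))
    # The homoclinic orbit lies on the energy level of the saddle; on the a-axis
    # this level is reached again at a = 2*sqrt(mu).
    print(energy(-math.sqrt(mu), 0.0, mu), energy(2.0 * math.sqrt(mu), 0.0, mu))
```

The periodic orbits are the closed level curves of H around the
center, and the homoclinic orbit is the level curve through the
saddle.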
Figure 5 Phase portrait of the reduced system in a reversible
Takens-Bogdanov bifurcation (left) and sketch of the a-component
of solutions corresponding to homoclinic and periodic orbits (right).

Figure 6 Phase portrait of the reduced system in absence of
reversibility (left) and sketch of the a-component of the solution
corresponding to the bounded orbit (right).


Notice that in the absence of the reversibility 
symmetry, the same type of bifurcation may lead to 
a completely different phase portrait for the reduced 
system as, for example, the one in Figure 6 in which 
the homoclinic and the periodic orbits disappear. 
This situation often occurs in the presence of a small 
dissipation in nearly reversible systems. 


Global Bifurcation Theory 


Most of the existing results in global bifurcation
theory concern the static problem [1]. The analysis
of global sets of solutions often relies upon
topological methods and degree theory, but also on
variational methods or analytic function theory.
Significant progress in understanding global branches of
solutions was made in the 1970s, in particular,
for nonlinear eigenvalue problems and the Hopf
bifurcation problem (see, e.g., works by Rabinowitz,
Crandall, Dancer, and Alexander, Yorke, Ize,
respectively).

A now-classical result in the topological theory of
global bifurcations is the following theorem by
Rabinowitz (1970), which gives a characterization
of global sets of solutions for eigenvalue problems of
the form

\[ X = F(X, \mu) = \mu LX + H(X, \mu) \]

$H(X, \mu) = o(\|X\|)$, posed for
$(X, \mu) \in \mathcal{X} \times \mathbb{R}$, $\mathcal{X}$ being
a Banach space. In contrast to local theory where
the function $F$ is usually $k$-times differentiable (with
a suitable $k$), in the global theory a typical
assumption is that
$F : \mathcal{X} \times \mathbb{R} \to \mathcal{X}$ is compact. The
equation above possesses a "trivial" branch of
solutions $(0, \mu)$ for any $\mu$. The bifurcation result
asserts that if for some real parameter value $\mu_0$ zero
is an eigenvalue of odd multiplicity of the operator
$\mathrm{id} - \mu_0 L$, then the set $S$ of nontrivial solutions
$(X, \mu)$ possesses a maximal subcontinuum which contains
$(0, \mu_0)$ and meets either infinity in
$\mathcal{X} \times \mathbb{R}$ or another
trivial solution $(0, \mu_1)$, $\mu_1 \neq \mu_0$. In particular,
$(0, \mu_0)$ is a bifurcation point. A local version of this result is
often referred to as Krasnoselski's theorem.

Different versions and extensions of these theorems
can be found in the literature, as, for example,
in the case of a simple eigenvalue, or if the field $F$ is
real-analytic, when the set of solutions is path-connected.
More recent works address the question
of lack of compactness, and a number of results are
now available for problems with additional structure
(gradient-like or Hamiltonian structure), but
also for concrete problems, such as the water-wave
problem.
Introduction


Almost all classical hydrodynamical stability problems
are experiments or gedankenexperiments which have
been designed to understand and to extract special
phenomena in more complicated situations. Examples
are the Taylor-Couette problem, Bénard's problem,
Poiseuille flow, or Kolmogorov flow.

The Taylor-Couette problem consists in finding the
flow of a viscous incompressible fluid contained
between two coaxial co- or counterrotating cylinders,
cf. Figure 1. If the rotational velocity of the inner
cylinder is below a certain threshold, the trivial
solution, called the Couette flow, is asymptotically
stable. At the threshold, this spatially homogeneous
solution becomes unstable and bifurcates via a pitchfork
bifurcation or a Hopf bifurcation into different
spatially periodic patterns, that is, depending on the
rotational velocity of the outer cylinder the basic
patterns are stationary (called the Taylor vortices) or
time-periodic. If the rotational velocity of the inner
cylinder is increased further, more complicated patterns
occur. The bifurcation scenario is well understood
from experiments and analytic investigations.

Figure 1 The Taylor-Couette problem with the Taylor vortices.

Bénard's problem consists in finding the flow of a
viscous incompressible fluid contained between two
plates, where the lower plate is heated and the upper
plate is kept at a constant temperature, cf. Figure 2. If
the temperature difference between the two plates is
below a certain threshold, the transport of energy from
below to above is made by pure conduction. At this
threshold, this spatially homogeneous solution becomes
unstable, convection sets in, and spatially periodic
patterns such as rolls or hexagons occur. Convection
problems play a big role in geophysical applications,
that is, in spherical domains such as the Earth. The paradigm
for an anisotropic pattern-forming system is
electroconvection in nematic liquid crystals.

Poiseuille flow consists in finding the flow of a
viscous incompressible fluid flowing through a pipe
driven by some pressure gradient, cf. Figure 3. In
noncircular pipes, the trivial laminar flow becomes
unstable at a critical pressure gradient. Experimentally,
a direct transition to turbulent flow with large
amplitudes is observed, in accordance with the fact that in
general at the instability point of the trivial solution
a subcritical bifurcation occurs.

Figure 2 Bénard's problem with rolls.

Figure 3 Poiseuille flow with the trivial solution.

Figure 4 The inclined-plane problem. The trivial Nusselt
solution possesses a flat top surface and a parabolic flow profile.


Kolmogorov flow consists in finding the flow of a
viscous incompressible fluid under the action of an
external force parallel to the flow direction x and
varying periodically in the perpendicular y-direction.
This gedankenexperiment was designed by
Kolmogorov in 1958 as a simplified model for the
Poiseuille flow problem in order to study the nature
of turbulence. The trivial solution, which is called
Kolmogorov flow, can become unstable via a long-wave
instability along the flow direction.

The inclined-plane problem consists in finding the
flow of a viscous liquid running down an inclined
plane, cf. Figure 4. The trivial solution, the so-called
Nusselt solution, becomes sideband-unstable if the
inclination angle $\varphi$ is increased. Then the dynamics is
dominated by traveling pulse trains, although the
individual pulses are unstable due to the long-wave
instability of the flat surface. Time series taken from
the motion of the individual pulses indicate the
occurrence of chaos directly at the onset of instability.

There are other famous hydrodynamical stability
problems, with arbitrarily complicated bifurcation
scenarios.


Spectral Analysis of the Trivial Solution 


All classical hydrodynamical stability problems are
described by the Navier-Stokes equations

\[ \partial_t U = \nu \Delta U - \nabla p - (U \cdot \nabla) U + f \qquad [1] \]
\[ 0 = \nabla \cdot U \]

where $U = U(x,t) \in \mathbb{R}^d$ with $d = 2, 3$ is the velocity
field, $p = p(x,t) \in \mathbb{R}$ the pressure field, $f$ some external
forcing, and $\nu$ the dynamic viscosity. These equations
are completed with boundary conditions. In case of
Bénard's problem, the Navier-Stokes equations are
coupled to a nonlinear heat equation.

By projecting $U$ onto the space of divergence-free
vector fields and by taking the trivial solution as
new origin, all problems from the previous section
can be written as an evolutionary system

\[ \partial_t U = \Lambda U + N(U) \]

where $U = 0$ corresponds to the trivial solution, and where
$\Lambda$ is a linear operator and $N(U) = O(\|U\|^2)$ for $U \to 0$
a nonlinear operator. Most of the examples from the previous
section are semilinear, that is, from a functional
analytic point of view, the nonlinear operator $N$ can
be controlled in terms of the linear operator $\Lambda$.

Since the form of the bifurcating pattern is only
slightly influenced by far away boundaries, for
instance, the upper and lower end of the rotating
cylinders in the Taylor-Couette problem, the problems
are considered from a theoretical point of view in
unbounded domains, $\Omega = \mathbb{R}^d \times \Sigma$, with
$\Sigma \subset \mathbb{R}^m$ the bounded cross section; for instance,
the Taylor-Couette problem is considered with two cylinders
of infinite length. Then the eigenfunctions of the
linear operator $\Lambda$ are given by Fourier modes, that is,

\[ \Lambda\left(e^{ik \cdot x} \varphi_{k,n}(z)\right) = \lambda_n(k)\, e^{ik \cdot x} \varphi_{k,n}(z) \]

with $x \in \mathbb{R}^d$, $k \in \mathbb{R}^d$,
$k \cdot x = \sum_{j=1}^{d} k_j x_j$, $z \in \Sigma$, $n \in \mathbb{N}$.
If an external control parameter is changed so that the
trivial solution becomes unstable, then, independently of
the underlying physical problem, the surface
$k \mapsto \mathrm{Re}\,\lambda_1(k)$ intersects the plane
$\{\mathrm{Re}\,\lambda_1(k) = 0\}$.
Generically, this happens first at a nonzero wave
vector $k_c \neq 0$ (cf. Figure 5).

Examples of such an instability are the Taylor-Couette
problem, Bénard's problem, or Poiseuille
flow. Very often, due to some conserved quantity in
the problem, we have $\mathrm{Re}\,\lambda_1(0) = 0$ for all values of
the bifurcation parameter. Then, a so-called sideband
instability can occur, cf. Figure 6.

Examples of such an instability are the Kolmogorov
flow problem or the inclined-plane problem.

Owing to some symmetries in the problem, for
instance, reflection along the cylinders in the
Taylor-Couette problem or rotational symmetry in
Bénard's problem, the curves in Figure 5 are double
or rotationally symmetric.

In case of $\Omega$ being spherically symmetric, we have

\[ \Lambda\left(f_l(r)\, Y_{l,m}(z)\right) = \lambda_l\, f_l(r)\, Y_{l,m}(z) \]

with $r \geq 0$, $z \in S^2$, and $Y_{l,m}$, for $l \in \mathbb{N}_0$ and
$m = -l, -l+1, \ldots, l-1, l$, being a spherical harmonic, that
is, if $\lambda_{l_0}$ is the eigenvalue having first positive real
part, then by symmetry, simultaneously $2l_0 + 1$
eigenvalues cross the imaginary axis.

Figure 5 Real part of the spectrum in case of an instability at a
wave number $k_c \neq 0$. Definition of the small bifurcation
parameter $\varepsilon$.

Figure 6 Real part of the spectrum in case of a sideband
instability. Definition of the small bifurcation parameter $\varepsilon$.


Reduction of the Dimension 


In order to understand the occurrence of the spatially
periodic Taylor vortices in the Taylor-Couette problem
and of the roll solutions and hexagons in
Bénard's problem, the problems are considered with
periodic boundary conditions along the unbounded
directions. Then the instability of the trivial solution
occurs when at least one eigenvalue crosses the
imaginary axis. Generically, this happens by a simple
real eigenvalue or a pair of complex-conjugate
eigenvalues crossing the imaginary axis (Figure 7).
Center manifold theory and the Lyapunov-Schmidt
reduction allow one to reduce the a priori
infinite-dimensional bifurcation problem to a finite-dimensional one.

In case of a real eigenvalue $\lambda_1$ crossing the imaginary
axis, the solution $u$ can be written as a sum of the
weakly unstable mode and the stable modes, that is,
$u = c_1 \varphi_1 + u_s$ ($c_1 \in \mathbb{R}$), where $u_s$ lives in
the closure of the span of the stable eigenfunctions
$\{\varphi_2, \varphi_3, \ldots\}$. For
the linearized system all solutions are attracted by the
one-dimensional set $E_c = \{u \mid u_s = 0\}$, in which all
solutions diverge to infinity.

For the nonlinear system and small bifurcation
parameter this attracting structure survives, no
longer as a linear space, but as a manifold

\[ M_c = \left\{ u = c_1 \varphi_1 + h(c_1) \;\middle|\; h(c_1) \in \overline{\mathrm{span}\{\varphi_2, \varphi_3, \ldots\}} \right\} \]

the so-called center manifold, which is tangential to $E_c$,
that is, $\|h(c_1)\| \leq C \|c_1\|^2$ (Figure 8). The dynamics on
$M_c$ is no longer trivial due to the nonlinear terms.

Figure 7 Generically, a simple real eigenvalue or a pair of
complex-conjugate eigenvalues cross the imaginary axis.

Figure 8 The center manifold is invariant under the flow, is
tangential to the central subspace $E_c$, and attracts nearby
solutions with some exponential rate.

Due to the fact that real problems are considered,
$\mathrm{Re}\,\lambda_1(k_c) = 0$ implies
$\mathrm{Re}\,\lambda_1(-k_c) = 0$, that is, in case
of $2\pi/k_c$-periodic boundary conditions always two
eigenvalues cross the imaginary axis simultaneously.
For Bénard's problem in a strip or for the Taylor-Couette
problem in case of a bifurcation of fixed
points, the reduced system on the center manifold is
derived with the ansatz

\[ U = \varepsilon A(\varepsilon^2 t)\, e^{i k_c x} + \mathrm{c.c.} + O(\varepsilon^2) \]

where $0 < \varepsilon \ll 1$ is the small bifurcation parameter,
cf. Figure 5. Then due to
$e^{i k_c x} e^{i k_c x} e^{-i k_c x} = e^{i k_c x}$ the
complex-valued amplitude $A$ satisfies, in the slow time
$T = \varepsilon^2 t$, the so-called Landau equation

\[ \partial_T A = A - \gamma A |A|^2 + O(\varepsilon) \]

where the Landau coefficient $\gamma \in \mathbb{R}$ is obtained by
classical perturbation analysis (Figure 9).

Figure 9 The dynamics of the Landau equation. Except for the
origin, which corresponds to the Couette flow, all solutions
converge towards the circle of fixed points, which corresponds
to the family of Taylor vortices. The translation invariance of the
Taylor-Couette problem is reflected by the rotational symmetry of
the reduced system.

The reduced system is symmetric under the $S^1$-symmetry
$A \mapsto A e^{i\phi}$ with $\phi \in \mathbb{R}$, which corresponds to
the translation invariance of the original systems.
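
The dynamics of the Landau equation and its rotational symmetry can be
made explicit. The sketch below assumes the truncated equation
dA/dT = A - gamma*A*|A|^2 with gamma > 0 and drops the O(epsilon)
terms; the closed-form solution used here follows from the
substitution s = |A|^(-2), which satisfies ds/dT = -2*(s - gamma). The
function name landau_solution is made up for this example.

```python
import cmath
import math


def landau_solution(a0, gamma, t):
    """Exact solution of dA/dT = A - gamma*A*|A|^2 for gamma > 0 and t >= 0.

    The phase of A stays constant, and s = |A|^(-2) relaxes to gamma.
    """
    r0 = abs(a0)
    if r0 == 0.0:
        return 0j  # the origin is a fixed point (the trivial solution)
    r_squared = 1.0 / (gamma + (1.0 / r0 ** 2 - gamma) * math.exp(-2.0 * t))
    return (a0 / r0) * math.sqrt(r_squared)


if __name__ == "__main__":
    a0, gamma = 0.1 + 0.2j, 4.0
    a_end = landau_solution(a0, gamma, 30.0)
    # Convergence to the circle of fixed points |A| = 1/sqrt(gamma).
    print(abs(a_end), 1.0 / math.sqrt(gamma))
    # S^1 equivariance: rotating the initial datum rotates the solution.
    rotation = cmath.exp(0.7j)
    print(abs(landau_solution(a0 * rotation, gamma, 1.5)
              - rotation * landau_solution(a0, gamma, 1.5)))
```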

This so-called equivariant bifurcation theory has 
been applied successfully to convection problems in 
the plane and on the sphere. 

The stability of time-periodic flows can be
analyzed with Floquet multipliers. Bifurcations
from a time-periodic solution can lead to quasiperiodic
motion in time. Ruelle and Takens (1971)
showed that already the next bifurcation leads to
chaotic dynamics. Since then, many classical
hydrodynamical stability problems have been analyzed
with bifurcation theory up to turbulent flows.

It was observed that center manifold theory can 
also be applied successfully to elliptic PDE problems 
posed in spatially unbounded cylindrical domains. 
A famous example is the construction of capillary- 
gravity solitary waves for the so-called water-wave 
problem. 


Modulation Equations 


The analysis of the last section is of no use in case of
a sideband instability occurring at the wave number
$k_c = 0$, as it happens in the inclined-plane problem
or in the Kolmogorov flow problem. Moreover, in
case of an instability at a wave vector $k_c \neq 0$,
front solutions cannot be described on the basis of the
above analysis. In such situations, the method of modulation
equations generalizes the role of the finite-dimensional
amplitude equations from the last
section.

The complex cubic Ginzburg-Landau equation in
normal form is given by

\[ \partial_T A = (1 + i\alpha)\,\partial_X^2 A + A - (1 + i\beta)\,A |A|^2 \]

where the coefficients $\alpha, \beta \in \mathbb{R}$ are real, and we have
$X \in \mathbb{R}$, $T \geq 0$, and $A(X,T) \in \mathbb{C}$. The
Ginzburg-Landau equation is a universal amplitude equation
that describes slowly varying modulations, in space
and time, of the amplitude of bifurcating spatially
periodic solutions in pattern-forming systems close
to the threshold of the first instability. Whenever the
instability drawn in Figure 5 occurs, for instance, for the
Taylor-Couette problem and Bénard's problem in a
strip, that is, $d = 1$, it can be derived by a multiple
scaling ansatz

\[ u(x,t) \approx \varepsilon A\left(\varepsilon(x - c_g t), \varepsilon^2 t\right) e^{i(k_c x - \omega_c t)} + \mathrm{c.c.} \]

For instance, in case of $\alpha = \beta = 0$, the Ginzburg-Landau
equation possesses front solutions connecting
the stable fixed point $A = 1$ with the unstable
fixed point $A = 0$. Such solutions correspond in the
Taylor-Couette problem to modulating fronts
connecting the stable Taylor vortices with the
unstable Couette flow, cf. Figure 10.

Figure 10 The front solution of the Ginzburg-Landau equation
modulates the underlying pattern in the original system.
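
Such a front can be observed in a simple numerical experiment. The
sketch below restricts the Ginzburg-Landau equation with
alpha = beta = 0 to real-valued A, that is,
dA/dT = d^2A/dX^2 + A - A^3, and uses an explicit finite-difference
scheme; the grid, the time step, the step-like initial datum, and the
function names are all assumptions chosen for illustration.

```python
def simulate_front(n=400, dx=0.5, dt=0.05, t_end=20.0, x_front=20.0):
    """Explicit Euler / central differences for dA/dT = A_XX + A - A^3 (real A).

    The initial datum equals 1 to the left of x_front and 0 to the right;
    the boundary values are kept fixed at 1 (left) and 0 (right).
    """
    a = [1.0 if i * dx < x_front else 0.0 for i in range(n)]
    r = dt / dx ** 2  # must stay below 1/2 for stability of the explicit scheme
    for _ in range(int(round(t_end / dt))):
        new = a[:]
        for i in range(1, n - 1):
            diffusion = r * (a[i - 1] - 2.0 * a[i] + a[i + 1])
            reaction = dt * (a[i] - a[i] ** 3)
            new[i] = a[i] + diffusion + reaction
        a = new
    return a


def front_position(a, dx, level=0.5):
    """Position of the first grid point where the profile drops below the level."""
    for i, value in enumerate(a):
        if value < level:
            return i * dx
    return len(a) * dx


if __name__ == "__main__":
    profile = simulate_front()
    print("front position at T = 0:  20.0")
    print("front position at T = 20:", front_position(profile, 0.5))
```

The stable state A = 1 invades the unstable state A = 0, and the
profile stays between the two fixed points while the front travels to
the right.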

The diffusion operator in the Ginzburg-Landau
equation reflects the parabolic shape of $\mathrm{Re}\,\lambda_1$ close
to $k = k_c$ in Figure 5. In case of the long-wave
instability, as drawn in Figure 6, the second-order
differential operator changes into a fourth-order
differential operator.

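For Kolmogorov flow with $T = \varepsilon^4 t$ and $X = \varepsilon x$ and
the amplitude scaled with $\varepsilon$, we obtain that in lowest
order $A$ has to satisfy a Cahn-Hilliard equation

\[ \partial_T A = -\sqrt{2}\,\partial_X^2 A - 3\,\partial_X^4 A + \gamma\,\partial_X^2 (A^3) \]

where $A(X,T) \in \mathbb{R}$ and $\gamma \in \mathbb{R}$ a constant
(cf. Figure 6). The Kuramoto-Sivashinsky (KS)-perturbed KdV
equation

\[ \partial_T A = -\partial_X^3 A - \partial_X (A^2)/2 - \varepsilon\,(\partial_X^2 + \partial_X^4) A \]

with $A = A(X,T) \in \mathbb{R}$, $X \in \mathbb{R}$, $T \geq 0$, where
$0 < \varepsilon \ll 1$ is still a small parameter, can be derived for the
inclined-plane problem with $T = \varepsilon^3 t$ and
$X = \varepsilon x$ and the amplitude scaled with $\varepsilon^2$.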

The theory of modulation equations is nowadays a
well-established mathematical tool which allows us to
construct special solutions, to prove global existence results
for the solutions of pattern-forming systems, and to
characterize the attractors in such systems. The
method is based on approximation results, showing
that solutions of the original systems can be approximated
by the modulation equation, and attractivity
results, showing that every solution of the original
system develops in such a way that it can be described
by the modulation equation.

This method can also be applied to secondary 
bifurcations describing instabilities of spatially per- 
iodic wave trains. Then the so-called phase-diffusion 
equations, conservation laws, Burgers equations, 
and again the KS equations occur. 

However, this method cannot be applied successfully
in all situations. There are counterexamples
showing that not every formally derived modulation
equation describes the original system in a correct
way. Moreover, very often, owing to some
symmetries in the original problem, no consistent
multiple scaling analysis is possible, that is, the
modulation equations still depend on $\varepsilon$.

Figure 11 Spectrum for the flow around an obstacle (continuous
spectrum and discrete eigenvalues).


Discussion 


There is no satisfactory bifurcation analysis for situations
where boundary layers play a role. The
simplest problem is the flow around some obstacle. The
difficulties stem from the fact that, due to the
unbounded flow region, there is always continuous
spectrum up to the imaginary axis. From the localized
obstacle, discrete eigenvalues are created (cf. Figure 11).

In such a situation, so far there is no mathematical
bifurcation theory available.


See also: Bifurcation Theory; Dynamical Systems in
Mathematical Physics: An Illustration from Water Waves;
Leray-Schauder Theory and Mapping Degree; Multiscale
Approaches; Newtonian Fluids and Thermohydraulics;
Symmetry and Symmetry Breaking in Dynamical Systems;
Turbulence Theories; Variational Methods in Turbulence.


Introduction


Bifurcation theory of periodic orbits relates to
modeling of quite diverse subjects. It appeared
classically in the field of celestial mechanics with
the contributions of H Poincaré. Van der Pol (1926,
1927, 1928, 1931) observed the frequency-locking
phenomenon in electrical circuits. More recently,
Malkin's theory (Malkin 1952, 1956, Roseau 1966)
was used to justify synchronization of weakly
coupled oscillators modeling the electrical activity
of the cells of the sinus node in the heart. This
article provides the essential mathematical background
necessary for the existence of frequency locking.
Applications can be found, for instance, in Weakly
Coupled Oscillators.
The Asymptotic Phase of a Stable
Periodic Orbit


Let $\Gamma$ be a periodic orbit of a vector field and let
$S(\Gamma)$ denote the stable manifold of $\Gamma$ (resp. $U(\Gamma)$
denotes the unstable manifold of $\Gamma$). The following
theorem can be found, for instance, in Hartman
(1964).


Theorem There exist $\alpha$ and $K$ such that
$\mathrm{Re}(\lambda_j) < -\alpha$, $j = 1, \ldots, k$ and
$\mathrm{Re}(\lambda_j) > \alpha$, $j = k+1, \ldots$, and for all
$x \in S(\Gamma)$, there is an asymptotic phase $t_0$ such that for
all $t \geq 0$

\[ |\phi_t(x) - \gamma(t - t_0)| \leq K e^{-\alpha t/2} \]

Similarly, for any $x \in U(\Gamma)$, there is a $t_0$ such that for
$t \leq 0$,

\[ |\phi_t(x) - \gamma(t - t_0)| \leq K e^{\alpha t/2} \]


If the periodic orbit is stable, the local stable
manifold coincides with an open neighborhood of $\Gamma$.
In such a case, there is a foliation of this open set
whose leaves are the points with a given asymptotic
phase. The asymptotic phase can be considered
as a coordinate function $\phi$ defined on the
neighborhood $S(\Gamma)$.

If we consider now the particular case of a plane
system, this function can be completed with the
square of the distance function to the orbit into a
coordinate system called the "amplitude-phase"
system and denoted as $(\rho, \phi)$.


Frequency Locking and Phase Locking 


The term "oscillator" has two meanings. A conservative
"oscillator" is a plane vector field which
displays an open set of periodic orbits. It is said to
be isochronous if all orbits have the same period. A
dissipative "oscillator" is a planar vector field which
displays an attractive limit cycle (attractive periodic
orbit).

We consider $N$ dissipative oscillators:

\[ \frac{dx_i}{dt} = f_i(x_i, y_i), \quad \frac{dy_i}{dt} = g_i(x_i, y_i) \qquad [1] \]

where $i = 1, \ldots, N$.

The dynamical system obtained by considering the
space of all the variables $(x_i, y_i)$, $i = 1, \ldots, N$,
displays an invariant torus full of periodic orbits that
we denote by $T^N(0)$.

Assume now that the $N$ oscillators are weakly
coupled:

\[ \frac{dx_i}{dt} = f_i(x_i, y_i) + \varepsilon F_i(x, y, \varepsilon), \quad \frac{dy_i}{dt} = g_i(x_i, y_i) + \varepsilon G_i(x, y, \varepsilon) \qquad [2] \]

where $\varepsilon$ can be considered as small as we wish.


Definition The system [2] has a frequency locking
if it displays a family of stable periodic orbits $\Gamma_\varepsilon$ for
all values of $\varepsilon$ small enough which tends to (in the
sense of Hausdorff's topology) a periodic orbit $\Gamma_0$
contained in the periodic torus $T^N(0)$.


Assume now that [2] has a frequency locking
associated with the periodic orbit $\Gamma_\varepsilon(t)$. Consider the
projections $\Gamma^i_\varepsilon(t)$ of $\Gamma_\varepsilon(t)$ on the
coordinate planes $(x_i, y_i)$, $i = 1, \ldots, N$. Assume that
$\varepsilon$ is small enough so that the projection belongs to the
open set $S_i$ on which are defined the "amplitude-phase"
coordinates of the system [1]. We can write the system [2],
restricted to the open set $S = \prod_{i=1}^{N} S_i$, as

\[ \frac{d\rho_i}{dt} = f_i(\rho, \phi, \varepsilon), \quad \frac{d\phi_i}{dt} = \Phi_i(\rho, \phi, \varepsilon), \quad i = 1, \ldots, N \qquad [3] \]


Definition The system [2] has a phase locking if
the system induced by [3] on $\Gamma_\varepsilon(t)$

\[ \frac{d\phi_i}{dt} = \Phi_i(0, \phi, \varepsilon) \qquad [4] \]

has an attractive singular point.


As the attractive singular points are structurally
stable, it is enough to assume that the system

\[ \frac{d\phi_i}{dt} = \Phi_i(0, \phi, 0) \qquad [5] \]

displays an attractive singular point.


Periodic Orbits of Linear Systems


Consider the linear system

\[ \frac{dx}{dt} = P(t) \cdot x + q(t) \qquad [6] \]

where $P$ is a continuous $T$-periodic matrix function
and $q$ is a vector $T$-periodic continuous function,
$x = (x_1, \ldots, x_n)$. Consider also the two associated
homogeneous equations:

\[ \frac{dx}{dt} = P(t) \cdot x \qquad [7a] \]
\[ \frac{dx}{dt} = -P^*(t) \cdot x \qquad [7b] \]

where $P^*$ denotes the transpose of $P$.

The set of $T$-periodic solutions of [7b] is a vector
space; $m$ denotes its dimension. Let $U^j(t)$, $j = 1, \ldots, m$,
be a basis of this vector space. This basis is completed
by adding $n - m$ solutions $U^j(t)$, $j = m+1, \ldots, n$, to
obtain a basis of $\mathbb{R}^n$. Let $U(t)$ be the matrix whose
columns are these vectors; denote by $U_{ij}(t)$ the elements of
this matrix.

With the change of variable $x = U^*(0)^{-1} y$, system
[6] gets transformed into

\[ \frac{dy}{dt} = Q(t)\, y + r(t) \qquad [8] \]

with $Q(t) = U^*(0)\, P(t)\, U^*(0)^{-1}$ and $r(t) = U^*(0)\, q(t)$.
Matrix $V(t) = U^{-1}(0)\, U(t)$ is such that

\[ \frac{dV}{dt} + Q^*(t)\, V = 0, \quad V(0) = I \]

and the $m$ first column vectors of $V(t)$, denoted as
$V^j(t)$, $j = 1, \ldots, m$, are $T$-periodic.
Let $X(t)$ be the fundamental solution defined by

\[ \frac{dX}{dt} = Q(t) \cdot X, \quad X(0) = I \]

then,

\[ X^{-1}(t) = V^*(t) \]

The solution of [8] can be written as

\[ y(t) = X(t) \cdot y(0) + X(t) \int_0^t X^{-1}(u)\, r(u)\, du \qquad [9] \]

This yields that $T$-periodic solutions of [8] have
initial data $y(0)$ given by

\[ (V^*(T) - I) \cdot y(0) = \int_0^T V^*(s)\, r(s)\, ds \qquad [10] \]

Conversely, given a solution $y(0)$ of [10],
$T$-periodicity of $P$ and $q$ and uniqueness of solutions
of a differential equation imply that $y(0)$ represents the
initial data of a $T$-periodic solution of [8]. Hence, the
$T$-periodic solutions of [8] are in one-to-one correspondence
with the affine space defined by the
solutions of [10]. The $m$ first rows of $V^*(T) - I$ are
zero and its rank is exactly $n - m$. In the following,
assume that the determinant $\Delta$ formed by the $(n - m)$
last rows and last columns of $(V^*(T) - I)$ is not zero.

A necessary and sufficient condition so that [8]
displays a $T$-periodic solution is

\[ \int_0^T \sum_{j=1}^{n} V_{jk}(u)\, r_j(u)\, du = 0, \quad k = 1, \ldots, m \qquad [11a] \]

\[ \sum_{j=m+1}^{n} (V_{jk}(T) - \delta_{jk})\, y_j(0) = \int_0^T \sum_{j=1}^{n} V_{jk}(s)\, r_j(s)\, ds, \quad m+1 \leq k \leq n \qquad [11b] \]

This yields the Fredholm alternative: if the $m$
conditions,

\[ \sum_{j=1}^{n} \int_0^T U_{jk}(s)\, q_j(s)\, ds = 0, \quad k = 1, \ldots, m \qquad [12] \]

are satisfied, then [6] displays a family $x_\alpha(t)$ of
$T$-periodic solutions depending on $m$ parameters
$(\alpha_1, \ldots, \alpha_m)$:

\[ x_\alpha(t) = \alpha_1 \phi_1(t) + \cdots + \alpha_m \phi_m(t) + \bar{x}(t) \qquad [13] \]

where $\bar{x}(t)$ is a particular $T$-periodic solution and
$\phi_i(t)$ denote $T$-periodic independent solutions of
[7a]. To be more specific, one can choose $\bar{x}(t)$ to
be the unique solution of [6] such that
$y(0)_k = 0$, $k = m+1, \ldots, n$, and $\phi_i(t)$ solutions of
[7a] such that $y(0)_k = \delta_{ik}$. With these notations,
$x_\alpha(t)$ is such that

$$y(0)_k = \alpha_k, \qquad k = 1, \ldots, m$$

and its other initial conditions $y(0)_k = \beta_k$,
$k = m+1, \ldots, n$, are fixed:

$$\beta_k = \beta_k^0$$
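The solvability conditions [12] can be illustrated numerically. The sketch below is not taken from the article: it assumes the harmonic oscillator as an example, with $n = 2$, $P = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and $T = 2\pi$, for which $-P^* = P$ and the adjoint equation [7b] has the two independent $T$-periodic solutions $U^1 = (\cos t, -\sin t)$, $U^2 = (\sin t, \cos t)$, so that $m = 2$. The function names and the midpoint quadrature are illustrative choices.

```python
import math

T = 2.0 * math.pi  # period of the assumed example


def adjoint_basis(t):
    """Columns U^1(t), U^2(t): T-periodic solutions of dx/dt = -P^T x
    for the assumed matrix P = [[0, 1], [-1, 0]]."""
    u1 = (math.cos(t), -math.sin(t))
    u2 = (math.sin(t), math.cos(t))
    return (u1, u2)


def solvability_integrals(q, steps=2000):
    """Midpoint-rule approximation of sum_j int_0^T U_jk(s) q_j(s) ds for k = 1, 2."""
    h = T / steps
    totals = [0.0, 0.0]
    for i in range(steps):
        s = (i + 0.5) * h
        qs = q(s)
        for k, u in enumerate(adjoint_basis(s)):
            totals[k] += (u[0] * qs[0] + u[1] * qs[1]) * h
    return totals


def q_off_resonance(t):
    # forcing of x1'' + x1 = cos(2t), written as a first-order system
    return (0.0, math.cos(2.0 * t))


def q_resonant(t):
    # forcing of x1'' + x1 = cos(t): resonant with the free oscillation
    return (0.0, math.cos(t))


print(solvability_integrals(q_off_resonance))
print(solvability_integrals(q_resonant))
```

For the forcing $\cos 2t$ both integrals vanish up to rounding, so the conditions [12] hold and $2\pi$-periodic solutions exist. For the resonant forcing $\cos t$ the second integral is close to $\pi$, so [12] fails and there is no $2\pi$-periodic solution.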


Malkin's Theorem for Quasilinear Systems


Consider now nonlinear systems with the
perturbation:

$$\frac{dx}{dt} = P(t)\,x + q(t) + \epsilon f(x, t, \epsilon) \qquad [14]$$

where $f$ is $C^1$ and $T$-periodic in $t$.

Assume that the solutions $y(t, y(0), \epsilon)$ of [14] exist
for all values of $t$, $0 \le t \le T$. The solutions define a
differentiable function of their initial data $y(0)$. This is,
for instance, true for perturbations of linear systems
if $\epsilon$ is small enough.

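Assume that $q$ satisfies the condition [12] and that
there is a solution $\alpha = \alpha^0$
to the equations

$$v_k(\alpha) = \sum_{j=1}^{n} \int_0^T U_{jk}(u)\,f_j(x_\alpha(u), u, 0)\,du = 0, \qquad k = 1, \ldots, m \qquad [15a]$$

so that

$$\left.\frac{\partial v_k(\alpha)}{\partial \alpha_j}\right|_{\alpha = \alpha^0}, \qquad k, j = 1, \ldots, m \qquad [15b]$$

is invertible.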

Proceed as in the previous section with the coordinate
change $x = U^*(0)^{-1} y$. Equation [14] gets transformed into

$$\frac{dy}{dt} = Q(t)\,y + r(t) + \epsilon F(y, t, \epsilon) \qquad [16]$$

with $F = U^*(0)\,f(U^*(0)^{-1} y, t, \epsilon)$.

Solutions of [16] are uniquely determined by their
initial data. We can regard the parameters $(\alpha, \beta)$
as coordinates on the space of solutions. With this
viewpoint, for instance, the set of $T$-periodic
solutions of [6] is an affine space of dimension $m$
given by the equations $\beta = \beta^0$ and is parametrized by
the coordinates $\alpha$. In this space, we pick a point
(which corresponds to a particular $T$-periodic solution
of [6]): $\alpha = \alpha^0$. $T$-periodic solutions of [16] are
in one-to-one correspondence with the solutions of


$$C_k(\alpha, \beta, \epsilon) = \sum_{j=1}^{n} \int_0^T V_{jk}(s)\,F_j(y(s, \epsilon, \alpha, \beta), s, \epsilon)\,ds = 0 \qquad [17a]$$


Rp lus un [17b] 


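where $\alpha_k$, $k = 1, \ldots, m$, and $\beta_k = y_k(0)$,
$k = m+1, \ldots, n$, parametrize the solutions $y(t, \epsilon, \alpha, \beta)$ of
[14] in this way:

$$y(0) = U^*(0)\,x(0), \qquad x(0) = \sum_{j=1}^{m} \alpha_j \phi_j(0) + \bar{x}(0) \qquad [18]$$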


Consider the determinant of the Jacobian matrix
of the mapping

$$(\alpha, \beta) \mapsto C(\alpha, \beta, \epsilon) \qquad [19]$$

for $\alpha = \alpha^0$, $\beta_k = \beta_k^0$, $k = m+1, \ldots, n$, $\epsilon = 0$. This is
equal to the product of $\Delta$ and the determinant of

$$\left.\frac{\partial v_k(\alpha)}{\partial \alpha_j}\right|_{\alpha = \alpha^0} \qquad [20]$$

which is nonzero.

The implicit-function theorem shows that the
differential equation [14] (and thus [16] as well)
has, for $\epsilon$ small enough, a unique $T$-periodic solution
which tends to $x_{\alpha^0}$ when $\epsilon$ tends to 0.


Generalization of Malkin's Theorem 


Finally, we consider the most general situation of 
the perturbation of a general system (not necessarily 
linear): 


$$\frac{dx}{dt} = f(x, t) + \epsilon g(x, t, \epsilon) \qquad [21]$$

where we assume that

$$\frac{dx}{dt} = f(x, t) \qquad [22]$$


displays an m-parameter family x,(t) of T-periodic 
orbits. 

Assume that the solutions $y(t, y(0), \epsilon)$ exist for all
$0 \le t \le T$ and define a differentiable mapping of the
initial data $y(0)$. This is, for instance, the case if we
assume that the nonperturbed equation defines a
flow and if $\epsilon$ is small enough.

Assume also that the different solutions $x_\alpha(t)$ are
independent in the sense that the mapping

$$\alpha \mapsto x_\alpha(t)$$

is an immersion for any $t$. In other words, the $m$
vectors $\partial x_\alpha(t)/\partial \alpha_j$ are independent.

We linearize the solution along the family of 
periodic orbits: 


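$$x = x_\alpha(t) + \epsilon\,\xi \qquad [23]$$

Equation [21] gets transformed into

$$\frac{d\xi}{dt} = D_x f(x_\alpha(t), t)\,\xi + g(x_\alpha(t), t, 0) + \epsilon F(\xi, t, \epsilon) \qquad [24]$$

Set, furthermore,

$$P(t) = D_x f(x_\alpha(t), t), \qquad r(t) = g(x_\alpha(t), t, 0)$$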


and denote U(t) the fundamental solution of [7b] 
described earlier. 


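Theorem Assume that there is a solution $\alpha = \alpha^0$
of the $m$ equations:

$$v_k(\alpha) = \sum_{j=1}^{n} \int_0^T U_{jk}(u)\,g_j(x_\alpha(u), u, 0)\,du = 0, \qquad k = 1, \ldots, m \qquad [25a]$$

such that

$$\left.\frac{\partial v_k(\alpha)}{\partial \alpha_j}\right|_{\alpha = \alpha^0}, \qquad k, j = 1, \ldots, m \qquad [25b]$$

is invertible. Then, for all $\epsilon$ sufficiently small, eqn
[21] has a unique $T$-periodic solution which tends to
$x_{\alpha^0}$ when $\epsilon$ tends to 0.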


We show that under the hypothesis of the 
theorem, we can apply the results proved in the 
preceding section. Note that one can prove the 
theorem for eqn [24] because it reduces to [21] with 
the change of variables [23]. 


Note first that the $m$ conditions [25a] imply that
the equation

$$\frac{d\xi}{dt} = D_x f(x_{\alpha^0}(t), t)\,\xi + g(x_{\alpha^0}(t), t, 0)$$

has a family of $T$-periodic solutions which
depend on $m$ parameters $\gamma = (\gamma_1, \ldots, \gamma_m)$. From
[13], one can write

$$\xi(t) = \gamma_1 \phi_1(t) + \cdots + \gamma_m \phi_m(t) + \bar{\xi}(t) \qquad [26]$$

where $\bar{\xi}(t)$ is a particular $T$-periodic solution and
the $\phi_i(t)$ are independent $T$-periodic solutions
of (22a).


Lemma 1 A possible choice for the solutions $\phi_i(t)$
is $\partial x_\alpha(t)/\partial \alpha_i$.


We have already assumed that these vectors are 
independent. They are obviously T-periodic solu- 
tions to (22a). 

In the following, we will assume that all other periodic 
solutions of (22a) are linear combinations of these. 

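As a consequence of what was proved in the
section on periodic orbits of linear systems, system
[24] has a periodic solution (for $\epsilon$ small enough)
if there exists a solution

$$(\gamma_1, \ldots, \gamma_m)$$

to the equations

$$v_k(\gamma) = \sum_{j=1}^{n} \int_0^T U_{jk}(s)\,F_j(\xi(s), s, 0)\,ds = 0, \qquad k = 1, \ldots, m$$

such that

$$\frac{\partial v_k(\gamma)}{\partial \gamma_j}, \qquad k, j = 1, \ldots, m$$

is invertible.

Lemma 2 The quantities $v_k(\gamma)$ depend linearly on $\gamma$.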


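Proof Observe first that the quantities $F_j(\xi, s, 0)$
depend quadratically on $\xi$:

$$F_j(\xi, s, 0) = \frac{1}{2} \sum_{k,l} \frac{\partial^2 f_j}{\partial x_k\,\partial x_l}(x_{\alpha^0}(s), s)\,\xi_k \xi_l + \sum_k \frac{\partial g_j}{\partial x_k}(x_{\alpha^0}(s), s, 0)\,\xi_k + \frac{\partial g_j}{\partial \epsilon}(x_{\alpha^0}(s), s, 0) \qquad [27]$$

Then, the solutions $\xi(t)$ depend linearly on $\gamma$. We thus
obtain that a priori the $v_k(\gamma)$ are quadratic functions of $\gamma$: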


Vols - - « 9^ n) 
-iur f v E Am. Br, 
— Jib Ferða Oy, On 
T 
1 Of Oz, = xi 
£a] v E EA 
2 * ip xs (2 O^, 
Ogi Oz, 
LM d 28 
Oz, E | | 


where the dots represent quantities independent of $\gamma$.
We use then the expression 


d Oz; 

dt &03,00, 
sy 9 f Eu Rak Ozk 
i ki Oz,Oz] 227 O^, 和 Ozk Oyg00, 


This allows one to find the homogeneous quadratic 
part as 


ss 
> P OzpOz, Oy O^, 
=>, [ vr ina 8 EE 
- ds N0y,00, 
5) oi 9? Zk 
Xu 2 Oy Oy, " 


Integration by parts yields 

E l 
aieri 
5 0 P Oz, Oz Oy, O^, 


Enos e 


O^j4 0r, 


because $U^*$ is a solution to [7a]. This shows that [28]
is linear in $\gamma$. It suffices to show that the determinant
of this system does not vanish to have existence and
uniqueness of the solution such that


QA, 0, Vm 


0 
Fi e: i Tm 7 


Consider now the coefficient of the linear part: 


Pfi = , Ogj| Oe 
Y f v i See Oz; g + Se Oya 


and the coefficient 


-5 | v pdg, n 0da 


We can write 


dop _ ['(8Uj, Og; Oz, 
xd (Gat 8+ Uoga a) 
Note that 
d ð 
B= Y L6 + gel) a0) 


Integration by parts yields 


- [69 
aa? i 0 ds daq 


Se 


dag 


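From the equation

$$\frac{dU_{jp}}{dt} = -\sum_k \frac{\partial f_k}{\partial x_j}\,U_{kp}$$

we deduce that

$$\frac{d}{dt}\left(\frac{\partial U_{jp}}{\partial \alpha_q}\right) = -\sum_k \frac{\partial f_k}{\partial x_j}\,\frac{\partial U_{kp}}{\partial \alpha_q} - \sum_{k,l} \frac{\partial^2 f_k}{\partial x_j\,\partial x_l}\,U_{kp}\,\frac{\partial x_l}{\partial \alpha_q}$$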


and thus this shows that

dep

- Of, . Og] Oz
一 ht To |. d
— A. | e pa scm id Oya

dag

This achieves the proof of the theorem. In the special
case of Hamiltonian systems, in the case of the
perturbations of an isochronous system, the method
explained is equivalent to Moser's averaging theory.

The reader is referred to other articles in this
encyclopedia for a discussion of other aspects of
synchronization, frequency locking, and phase locking.

See also: Bifurcation Theory; Fractal Dimensions in
Dynamics; Integrable Systems: Overview; Isochronous
Systems; Leray-Schauder Theory and Mapping Degree;
Ljusternik-Schnirelman Theory; Singularity and
Bifurcation Theory; Symmetry and Symmetry Breaking in
Dynamical Systems; Synchronization of Chaos; Weakly
Coupled Oscillators.


Bi-Hamiltonian Methods in Soliton Theory

M Pedroni, Università di Bergamo,
Dalmine (BG), Italy

© 2006 Elsevier Ltd. All rights reserved.


Introduction


At the end of the 1960s, the theory of integrable
systems received a great boost from the discovery
(made by Gardner, Green, Kruskal, and Miura) of
the inverse-scattering method (see Integrable
Systems: Overview). It allows one to reduce the
solution of the (nonlinear) Korteweg-de Vries
equation (henceforth simply the KdV equation)


$$u_t = \frac{1}{4}\left(u_{xxx} - 6uu_x\right) \qquad [1]$$


to the solution of linear equations. After the KdV
equation, many other nonlinear partial differential
equations, solvable by means of the inverse-scattering
method, were found. A common feature of such
equations is the existence of soliton solutions, that 
is, solutions in the shape of a solitary wave (with 
additional interaction properties). For this reason 
they are called “soliton equations.” 


It was soon observed that the KdV equation can 
be seen as an infinite-dimensional Hamiltonian 
system with an infinite sequence of constants of 
motion in involution; the corresponding (commut- 
ing) vector fields are symmetries for the KdV 
equation, and form the so-called KdV hierarchy. In 
particular, Zakharov and Faddeev constructed 
action-angle variables for the KdV equation. These 
facts pointed out that the KdV equation is an 
infinite-dimensional analog of a classical integrable 
Hamiltonian system (Dubrovin et al. 2001), whose
theory was developed during the nineteenth
century by Liouville, Jacobi, and many others.
Moreover, the infinite-dimensional case suggested 
methods (such as the existence of a Lax pair) which 
were applied successfully also to finite-dimensional 
cases such as the Toda lattices and the Calogero 
systems. More recently, after the discovery by 
Witten and Kontsevich of remarkable relations 
between the KdV hierarchy and matrix models of 
two-dimensional (2D) quantum gravity, there has 
been a renewed interest in the study of soliton 
equations in the community of theoretical physicists. 
We also mention that the classical versions of the
extended $W_n$-algebras of 2D conformal field theory
are the (second) Poisson structures of the
Gelfand-Dickey hierarchies.

In this article we describe the so-called 
bi-Hamiltonian formulation of soliton equations. 
This approach to integrable systems springs from the 
observation, made by Magri at the end of the 1970s, that 
the KdV equation can be seen as a Hamiltonian system 
in two different ways. In the same circle of ideas, there 
were important works by Adler, Dorfman, Gelfand, 
Kupershmidt, Wilson, and many others. Thus, the 
concept of bi-Hamiltonian manifold, which constitutes 
the geometric setting for the study of bi-Hamiltonian 
systems, emerged. This notion and its applications to the
theory of finite-dimensional integrable systems are
discussed in Multi-Hamiltonian Systems.

In the first section of this article, we discuss the 
Hamiltonian form of soliton equations and, more 
generally, we present an important class of infinite- 
dimensional Poisson (also called Hamiltonian) 
structures, namely those of hydrodynamic type. 
Then we show how to use the bi-Hamiltonian 
properties of the KdV equation in order to construct 
its conserved quantities. We also recall that the KdV
equation can be seen as an Euler equation on the
dual of the Virasoro algebra. In the third section, we
deal with other examples of integrable evolution 
equations admitting a bi-Hamiltonian representa- 
tion, that is, the Boussinesq and the Camassa-Holm 
equations, and we consider the bi-Hamiltonian 
structures of hydrodynamic type. 
Hamiltonian Methods in Soliton Theory


The most famous example of a soliton equation is
the KdV equation [1], where $u$ is usually a
periodic or rapidly decreasing real function. The
choice of the coefficients in the equation has no
special meaning, since they can be changed
arbitrarily by rescaling $x$, $t$, and $u$. Right after
the discovery of the inverse-scattering method for
solving the Cauchy problem for the KdV equation,
it was realized that this equation can be seen as an
infinite-dimensional Hamiltonian system. Indeed,
from a geometrical point of view, eqn [1] defines a
vector field $X(u) = (1/4)(u_{xxx} - 6uu_x)$ on $M$, the
infinite-dimensional vector space of $C^\infty$ functions
from the unit circle $S^1$ to $\mathbb{R}$. (For the sake of
simplicity, we consider only the periodic case; the
integrals in this article are therefore understood to
be taken on $S^1$.) The vector field $X$ associated with
the KdV equation is Hamiltonian, that is, it can be
factorized as

$$X(u) = \left[-2\partial_x\right]\left[\tfrac{1}{8}\left(-u_{xx} + 3u^2\right)\right]$$

where $dH = (1/8)(-u_{xx} + 3u^2)$ is the differential of
the functional

$$H(u) = \frac{1}{8}\int \left(u^3 + \tfrac{1}{2}u_x^2\right) dx$$

that is, the variational derivative $\delta h/\delta u$ of the density
$h = (1/8)(u^3 + (1/2)u_x^2)$, and $P = -2\partial_x$ is a Poisson
(or Hamiltonian) operator. This means that the
corresponding composition law


$$\{F, G\} = \int dF\,P(dG)\,dx = -2\int dF\,(dG)_x\,dx \qquad [2]$$


between functionals of u has the usual properties 
of the Poisson bracket, that is, it is R-bilinear 
and skew-symmetric, and it fulfills the Leibniz 
rule and the Jacobi identity. In other words, 
(M,P) is an infinite-dimensional Poisson mani- 
fold. Using the Poisson bracket [2], eqn [1] can 
be written as 


$$u_t = \{u, H\} \qquad [3]$$

corresponding to the usual Hamilton equation in $\mathbb{R}^{2n}$,

$$\dot{z}^i = \{z^i, H\}$$

up to the replacement of $z$ with $u$, and of the
discrete index $i$ with the continuous index $x$. More
precisely, in the expression $u_t = \{u, H\}$ the symbol $u$
should be replaced by $u^x$ (in analogy with $z^i$), the
functional assigning to the generic function $v \in M$
its value at a fixed point $x$, that is, $u^x : v \mapsto v(x)$. In
these notations, the Poisson bracket [2] takes the
form

$$\{u^x, u^y\} = -2\delta'(x - y) \qquad [4]$$

where the $\delta$-function is as usual defined as

$$\int f(y)\,\delta(x - y)\,dy = f(x)$$

so that its derivatives are given by

$$\int f(y)\,\delta^{(n)}(x - y)\,dy = f^{(n)}(x)$$


Another important example is given by the
Boussinesq equation

$$u_{tt} = \tfrac{1}{3}\left(-u_{xxxx} + 4u_x^2 + 4uu_{xx}\right) \qquad [5]$$

describing, like KdV, shallow water (soliton) waves
in a nonlinear approximation. It can be obtained from
the first-order (in time) system

$$u^1_t = u^1_{xx} + \tfrac{2}{3}u^2u^2_x - \tfrac{2}{3}u^2_{xxx}, \qquad u^2_t = 2u^1_x - u^2_{xx} \qquad [6]$$

by taking the derivative of its second equation with
respect to $t$, plugging the result in the first one, and
setting $u = u^2$. The system [6] is Hamiltonian, since it
can be written as

$$\begin{pmatrix} u^1_t \\ u^2_t \end{pmatrix} = \begin{pmatrix} 0 & \partial_x \\ \partial_x & 0 \end{pmatrix} \begin{pmatrix} \delta h/\delta u^1 \\ \delta h/\delta u^2 \end{pmatrix}$$

with $h = (u^1)^2 + (1/9)(u^2)^3 - u^1u^2_x + (1/3)(u^2_x)^2$, and

$$\begin{pmatrix} 0 & \partial_x \\ \partial_x & 0 \end{pmatrix} \qquad [7]$$

is easily seen to be a Poisson operator. Thus, the
Poisson manifold associated with the Boussinesq
equation is the space of periodic $C^\infty$ functions with
values in $\mathbb{R}^2$. More generally, one can consider the
space $M^n$ of $C^\infty$ functions from the unit circle $S^1$ to
$\mathbb{R}^n$. If $P^{ij}$, for $i, j = 1, \ldots, n$, are the entries of a
constant skew-symmetric matrix and $u^{i,x}$ assigns to
the generic function $v \in M^n$ the value of its $i$th
component at a fixed point $x$, then

$$\{u^{i,x}, u^{j,y}\} = P^{ij}\,\delta(x - y)$$

defines a Poisson bracket on $M^n$. One can also let
the $P^{ij}$ depend on the $u^i$ in such a way that they
form the components of a Poisson tensor on $\mathbb{R}^n$. If
$H = \int h\,dx$ is a functional on $M^n$ with density $h$, the
associated Hamiltonian vector field gives rise to the
following system of partial differential equations:

$$u^i_t = \sum_{j=1}^{n} P^{ij}\,\frac{\delta h}{\delta u^j}$$


In particular, if $n = 2N$ and

$$P = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$$

then we have the Hamiltonian formulation of the
field equations.


Another important example of Poisson bracket on
$M^n$ is given by

$$\{u^{i,x}, u^{j,y}\} = g^{ij}\,\delta'(x - y) \qquad [8]$$

where $g^{ij}$ are the entries of a constant symmetric
matrix. In this case, the Hamiltonian vector field
associated with $H = \int h\,dx$ is given by

$$u^i_t = \sum_{j=1}^{n} g^{ij}\,\partial_x\!\left(\frac{\delta h}{\delta u^j}\right), \qquad i = 1, \ldots, n \qquad [9]$$


Notice that this vector field is zero if $H = \int u^k\,dx$,
with $k = 1, \ldots, n$. This amounts to saying that such
an $H$ is a Casimir function of the Poisson bracket
[8], that is, that $\{H, F\} = 0$ for all functionals $F$. A
simple example of this class (with $n = 2$) is given by
the Poisson structure of the Boussinesq equation,
corresponding to the choice $g^{11} = g^{22} = 0$ and
$g^{12} = g^{21} = 1$. Suppose now that the matrix with
entries $g^{ij}$ is invertible. Then they can be interpreted
as the contravariant components of a flat
pseudo-Riemannian metric in $\mathbb{R}^n$. A change of coordinates
$(u^1, \ldots, u^n) \mapsto (\tilde{u}^1, \ldots, \tilde{u}^n)$ in $\mathbb{R}^n$ transforms the
Poisson bracket [8] into

$$\{\tilde{u}^{i,x}, \tilde{u}^{j,y}\} = g^{ij}(\tilde{u})\,\delta'(x - y) + \Gamma^{ij}_k(\tilde{u})\,\tilde{u}^k_x\,\delta(x - y) \qquad [10]$$

where $g^{ij}(\tilde{u})$ are the components of the metric in the
new coordinates and the $\Gamma^{ij}_k$ are the contravariant
Christoffel symbols, related to the usual Christoffel
symbols by

$$\Gamma^{ij}_k = -g^{il}\,\Gamma^{j}_{lk} \qquad [11]$$

Conversely, the expression [10] gives a Poisson
bracket if the metric defined by $g^{ij}$ is flat and its
Christoffel symbols are related to the $\Gamma^{ij}_k$ by [11].
These are the Poisson structures of hydrodynamic
type introduced by Dubrovin and Novikov. We will
consider them again later.


Bi-Hamiltonian Formulation of the KdV Equation


The KdV equation [1] has many remarkable
properties, such as the Lax representation and the
existence of a $\tau$-function. In this section, we recall a
geometrical feature of KdV, namely, the fact that it
has a second Hamiltonian structure, and we show
that the integrability of KdV can be seen as a natural
consequence of its double Hamiltonian representation.
We have already seen that the KdV vector field
$X(u) = (1/4)(u_{xxx} - 6uu_x)$ can be written as

$$X(u) = P_0\,dH_2$$

where $P_0 = -2\partial_x$ and

$$H_2 = \frac{1}{8}\int \left(u^3 + \tfrac{1}{2}u_x^2\right) dx$$

But $X$ admits another Hamiltonian representation:

$$X(u) = P_1\,dH_1$$

where $P_1 = -(1/2)\partial_{xxx} + 2u\partial_x + u_x$ and

$$H_1 = -\frac{1}{4}\int u^2\,dx$$


The important point is that $P_1$ is also a Poisson
operator. Moreover, it is compatible with $P_0$, that is,
any linear combination of $P_0$ and $P_1$ is still a Poisson
operator. Thus, the KdV equation is a bi-Hamiltonian
system, that is, it can be seen in two different (but
compatible) ways as a Hamiltonian system. Next, we
will show how this property can be used to construct
an infinite sequence of conserved quantities for the
KdV equation, which are in involution with respect to
the Poisson brackets $\{\cdot,\cdot\}_0$ and $\{\cdot,\cdot\}_1$ associated with
$P_0$ and $P_1$. In particular, the phase space $M$ of KdV
is a bi-Hamiltonian manifold, that is, it has two
different (but compatible) Poisson structures. Let us
rename $X_1 = X$ the KdV vector field. Since
$X_1 = P_0\,dH_2 = P_1\,dH_1$, one is naturally led to
consider the vector fields

$$X_0 = P_0\,dH_1, \qquad X_2 = P_1\,dH_2$$


Explicitly, $X_0(u) = u_x$ and $X_2(u) = (1/16)(u_{xxxxx} -
10uu_{xxx} - 20u_xu_{xx} + 30u^2u_x)$. One can check that
these vector fields are also bi-Hamiltonian. Indeed,
$X_0(u) = P_1\,dH_0$, with $H_0 = \int u\,dx$, and
$X_2 = P_0\,dH_3$ with

$$H_3 = -\frac{1}{32}\int \left(\tfrac{1}{2}u_{xx}^2 + 5uu_x^2 + \tfrac{5}{2}u^4\right) dx$$
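The identities $X_1 = P_0\,dH_2 = P_1\,dH_1$ and $X_2 = P_1\,dH_2 = P_0\,dH_3$ can be checked mechanically. The following is a minimal sketch, not part of the original article, which assumes a hypothetical dictionary representation of differential polynomials in $u$ (a monomial is a sorted tuple of derivative orders) and uses exact rational arithmetic. All helper names are illustrative.

```python
from fractions import Fraction as Fr

# A differential polynomial in u is a dict {monomial: coefficient}; a monomial
# is a sorted tuple of derivative orders, e.g. (0, 1) stands for u * u_x.


def clean(p):
    return {m: c for m, c in p.items() if c != 0}


def add(*polys):
    out = {}
    for p in polys:
        for m, c in p.items():
            out[m] = out.get(m, 0) + c
    return clean(out)


def scale(a, p):
    return clean({m: a * c for m, c in p.items()})


def mul(p, q):
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(sorted(m1 + m2))
            out[m] = out.get(m, 0) + c1 * c2
    return clean(out)


def D(p, times=1):
    """Total x-derivative (Leibniz rule on each monomial), applied `times` times."""
    for _ in range(times):
        out = {}
        for m, c in p.items():
            for i in range(len(m)):
                new = tuple(sorted(m[:i] + (m[i] + 1,) + m[i + 1:]))
                out[new] = out.get(new, 0) + c
        p = clean(out)
    return p


def partial(p, k):
    """Partial derivative with respect to the jet variable u_k."""
    out = {}
    for m, c in p.items():
        n = m.count(k)
        if n:
            rest = list(m)
            rest.remove(k)
            key = tuple(rest)
            out[key] = out.get(key, 0) + n * c
    return clean(out)


def variational_derivative(h):
    """Euler operator: sum over k of (-1)^k D^k (dh/du_k)."""
    top = max((max(m) for m in h if m), default=0)
    terms = [scale((-1) ** k, D(partial(h, k), k)) for k in range(top + 1)]
    return add(*terms)


U = {(0,): Fr(1)}
UX = {(1,): Fr(1)}


def P0(f):
    # P_0 = -2 d/dx
    return scale(Fr(-2), D(f))


def P1(f):
    # P_1 = -(1/2) d^3/dx^3 + 2 u d/dx + u_x
    return add(scale(Fr(-1, 2), D(f, 3)), scale(Fr(2), mul(U, D(f))), mul(UX, f))


h1 = {(0, 0): Fr(-1, 4)}                             # density of H_1
h2 = {(0, 0, 0): Fr(1, 8), (1, 1): Fr(1, 16)}        # density of H_2
h3 = {(2, 2): Fr(-1, 64), (0, 1, 1): Fr(-5, 32),
      (0, 0, 0, 0): Fr(-5, 64)}                      # density of H_3
X1 = {(3,): Fr(1, 4), (0, 1): Fr(-3, 2)}             # (1/4)(u_xxx - 6 u u_x)

dH1, dH2, dH3 = (variational_derivative(h) for h in (h1, h2, h3))
print(P0(dH2) == X1, P1(dH1) == X1, P1(dH2) == P0(dH3))
```

All three comparisons should hold; the last one confirms that the fifth-order field $X_2$ is Hamiltonian with respect to both operators.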


The functional $H_0$ is a Casimir of $P_0$, that is,
$P_0\,dH_0 = 0$, so that the iteration ends on this side,
but it can be continued indefinitely from the other
side, as shown below. For the time being, let us take
for granted that there exists an infinite sequence
$\{H_k\}_{k \ge 0}$ of functionals such that $P_1\,dH_k = P_0\,dH_{k+1}$;
in other words,

$$\{\cdot, H_k\}_1 = \{\cdot, H_{k+1}\}_0 \qquad [12]$$
Such relations are often called Lenard-Magri relations.
Then the functionals $H_k$ are in involution with
respect to both Poisson brackets. Indeed, for $k > j$,
one has

$$\{H_j, H_k\}_0 = \{H_j, H_{k-1}\}_1 = \{H_{j+1}, H_{k-1}\}_0 = \cdots = \{H_k, H_j\}_0$$

so that $\{H_j, H_k\}_0 = 0$ for all $j, k \ge 0$, and therefore
$\{H_j, H_k\}_1 = 0$ for all $j, k \ge 0$. Hence, these
functionals are constants of motion (in involution) for
the KdV equation. The Hamiltonian vector fields 
associated with them are symmetries for the KdV 
equation; the corresponding evolution equations are 
called higher-order KdV equations. The set of such 
equations is the well-known KdV hierarchy. We 
remark that the existence of a sequence of functionals
$\{H_k\}_{k \ge 0}$, fulfilling the Lenard-Magri relations
[12] and starting from a Casimir of $P_0$, is
equivalent to the existence of a Casimir function
$H(\lambda) = \sum_{k \ge 0} H_k \lambda^{-k}$ for the Poisson pencil
$P_\lambda = P_1 - \lambda P_0$, where $\lambda$ is a real parameter. A
straightforward way (due essentially to Miura,
Gardner, and Kruskal) to determine such a Casimir
function is to consider the (generalized) Miura map
$h \mapsto u = h_x + h^2 - \lambda$. As shown by Kupershmidt
and Wilson, it transforms the Poisson structure
$(1/2)\partial_x$ (in the variable $h$) into the Poisson pencil
$P_\lambda = -(1/2)\partial_{xxx} + 2(u + \lambda)\partial_x + u_x$. Given $u$, the
Riccati equation

$$h_x + h^2 = u + \lambda \qquad [13]$$

admits a unique solution with the asymptotic
expansion $h = z + \sum_{k \ge 1} h_k z^{-k}$, where $z^2 = \lambda$. Moreover,
the coefficients $h_k$ are differential polynomials
in $u$ (i.e., polynomials in $u$ and its $x$-derivatives) that
can be computed by recurrence. Thus, the generalized
Miura map can be seen as an invertible
transformation. Since the functional $h \mapsto \int h\,dx$ is a
Casimir of the Poisson structure $(1/2)\partial_x$, it follows
that if $h(u)$ is the solution of the Riccati equation
[13], then $u \mapsto \int h(u)\,dx$ is a Casimir of the Poisson
pencil $P_\lambda$. More precisely, one has to introduce the
functional $H(\lambda) = z\int h(u)\,dx$, that turns out to be a
Laurent series in $\lambda$, because the even coefficients of
$h(u)$ are $x$-derivatives. This is the Casimir function
we were looking for. Explicitly, one finds that the
first terms of $h(u)$ are

$$h_1 = \tfrac{1}{2}u, \qquad h_2 = -\tfrac{1}{4}u_x, \qquad h_3 = \tfrac{1}{8}\left(u_{xx} - u^2\right)$$

$$h_4 = -\tfrac{1}{16}\left(u_{xxx} - 4uu_x\right)$$

$$h_5 = \tfrac{1}{32}\left(u_{xxxx} - 6uu_{xx} - 5u_x^2 + 2u^3\right)$$


Obviously, $h_1$ is the density of a Casimir function of
$P_0$, while $h_3$ and $h_5$ are (one-half of) the densities of the
two Hamiltonians $H_1$ and $H_2$ of the KdV equation.
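The recurrence for the coefficients $h_k$ follows by inserting the expansion of $h$ into [13] and collecting powers of $z$: the order $z^0$ gives $2h_1 = u$, and the order $z^{-n}$, $n \ge 1$, gives $h_{n+1} = -\tfrac{1}{2}\bigl(\partial_x h_n + \sum_{i=1}^{n-1} h_i h_{n-i}\bigr)$. The sketch below, which is an illustration and not part of the original article, runs this recurrence with exact rational arithmetic. It assumes a hypothetical encoding of differential polynomials as dictionaries whose keys are sorted tuples of derivative orders, so that `(0, 2)` stands for $uu_{xx}$.

```python
from fractions import Fraction


def add(p, q):
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return {m: c for m, c in out.items() if c != 0}


def scale(a, p):
    return {m: a * c for m, c in p.items() if a * c != 0}


def mul(p, q):
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(sorted(m1 + m2))
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def D(p):
    """Total x-derivative: Leibniz rule applied to every factor of every monomial."""
    out = {}
    for m, c in p.items():
        for i in range(len(m)):
            new = tuple(sorted(m[:i] + (m[i] + 1,) + m[i + 1:]))
            out[new] = out.get(new, 0) + c
    return {m: c for m, c in out.items() if c != 0}


def riccati_coefficients(n_max):
    """Coefficients h_1, ..., h_{n_max} of h = z + sum_k h_k z^(-k) solving
    h_x + h^2 = u + z^2, returned as a dict {k: differential polynomial}."""
    h = {1: {(0,): Fraction(1, 2)}}  # order z^0: 2 h_1 = u
    for n in range(1, n_max):
        s = D(h[n])
        for i in range(1, n):
            s = add(s, mul(h[i], h[n - i]))
        h[n + 1] = scale(Fraction(-1, 2), s)  # order z^(-n)
    return h


coefficients = riccati_coefficients(5)
for k in sorted(coefficients):
    print(k, coefficients[k])
```

The printed dictionaries can be compared term by term with the list of $h_1, \ldots, h_5$ given above.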
We conclude this section by showing that, as observed
by Khesin and Ovsienko (Arnol'd and Khesin 1998),
the bi-Hamiltonian structures of KdV have a clear
Lie-algebraic origin. Indeed, the second Hamiltonian
structure is the Lie-Poisson structure on the dual of
the Virasoro algebra, while the first one can be
obtained by “freezing” the second one at a suitable
point. Let $\mathfrak{X}(S^1)$ be the Lie algebra of vector fields
on $S^1$. The Virasoro algebra is the vector space
$\mathfrak{g} = \mathfrak{X}(S^1) \oplus \mathbb{R}$ endowed with the Lie-algebra
structure


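$$\bigl[(f\,\partial_x, a), (g\,\partial_x, b)\bigr] = \left((f_x g - f g_x)\,\partial_x,\ \int f'(x)\,g''(x)\,dx\right) \qquad [14]$$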


It is called a central extension of $\mathfrak{X}(S^1)$ since it is
obtained by considering the usual commutator
between vector fields (up to a sign) and by adding
a copy of $\mathbb{R}$, which turns out to be the center of
the Virasoro algebra. Equation [14] gives rise
indeed to a Lie-algebra structure because the
expression $\int f'(x)\,g''(x)\,dx$ defines a 2-cocycle of
$\mathfrak{X}(S^1)$. The dual space $\mathfrak{g}^*$ of $\mathfrak{g}$ can be considered
as the space of the pairs $(u\,dx \otimes dx, c)$, where
$u \in C^\infty(S^1)$ and $c \in \mathbb{R}$. The pairing is obviously
given by

$$\bigl\langle (u\,dx \otimes dx, c), (f\,\partial_x, a) \bigr\rangle = \int u(x)\,f(x)\,dx + ca$$

The Lie-Poisson structure on the dual $\mathfrak{g}^*$ of a Lie
algebra $\mathfrak{g}$ is defined as

$$\{F, G\}(X) = \bigl\langle X, [dF(X), dG(X)] \bigr\rangle \qquad [15]$$

where $F, G \in C^\infty(\mathfrak{g}^*)$ and their differentials at $X \in \mathfrak{g}^*$
are seen as elements of $\mathfrak{g}$. When $\mathfrak{g}$ is the Virasoro algebra
and $F(u, c) = \int f(u, c)\,dx$, $G(u, c) = \int g(u, c)\,dx$ are
two functionals on $\mathfrak{g}^*$ whose densities $f$ and $g$ are
differential polynomials in $u$, one has


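$$\{F, G\}(u, c) = \left\langle (u\,dx \otimes dx, c),\ \left[\left(\frac{\delta f}{\delta u}\,\partial_x, \frac{\partial F}{\partial c}\right), \left(\frac{\delta g}{\delta u}\,\partial_x, \frac{\partial G}{\partial c}\right)\right] \right\rangle$$

$$= \int \left[ u\left( \left(\frac{\delta f}{\delta u}\right)_x \frac{\delta g}{\delta u} - \frac{\delta f}{\delta u}\left(\frac{\delta g}{\delta u}\right)_x \right) + c\left(\frac{\delta f}{\delta u}\right)_x \left(\frac{\delta g}{\delta u}\right)_{xx} \right] dx \qquad [16]$$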


This is (up to rescaling) the second Poisson
bracket of KdV. The KdV equation is therefore
an Euler equation, that is, it can be obtained from
the Euler equations for the rigid body by replacing
the Lie algebra of the rotation group with
the Virasoro algebra. To be more precise, the
Hamiltonian vector field associated with
$H_1(u, c) = -(1/2)\left(\int u^2\,dx + c\right)$ is

$$u_t + 3uu_x + c\,u_{xxx} = 0, \qquad c_t = 0$$

If $c \ne 0$, this is (up to rescaling) the KdV equation
[1]. For $c = 0$, we have the Burgers equation (also
called dispersionless KdV equation), to be discussed
again later on. The first Poisson bracket for the KdV
hierarchy can be obtained by “freezing” the
Lie-Poisson bracket at the point $((1/2)\,dx \otimes dx, 0)$ of the
dual of the Virasoro algebra. This means that
instead of [16] one has to consider


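$$\{F, G\}_0(u, c) = \left\langle \left(\tfrac{1}{2}\,dx \otimes dx, 0\right), [dF, dG] \right\rangle = \frac{1}{2}\int \left( \left(\frac{\delta f}{\delta u}\right)_x \frac{\delta g}{\delta u} - \frac{\delta f}{\delta u}\left(\frac{\delta g}{\delta u}\right)_x \right) dx = \int \left(\frac{\delta f}{\delta u}\right)_x \frac{\delta g}{\delta u}\,dx \qquad [17]$$

The corresponding Hamiltonian is $H_2 = (1/2)\int (u^3 + c\,u_x^2)\,dx$.
From this (Lie-algebraic) point of
view, the compatibility between the two Poisson
brackets follows from the fact that the pencil
$\{\cdot,\cdot\}_\lambda = \{\cdot,\cdot\} - \lambda\{\cdot,\cdot\}_0$ is obtained from the Lie-Poisson
bracket $\{\cdot,\cdot\}$ by applying the translation

$$(u\,dx \otimes dx, c) \mapsto \left(\left(u - \tfrac{\lambda}{2}\right) dx \otimes dx, c\right)$$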


Other Examples 


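In the previous section, we have presented the
bi-Hamiltonian structure of the KdV equation and
some of its properties. Now we give two more
examples of equations, the Boussinesq equation
and the Camassa-Holm equation, admitting a
bi-Hamiltonian formulation. We have seen in an
earlier section that the system [6] associated with
the Boussinesq equation [5] is Hamiltonian with
respect to the Poisson structure [7] and the
Hamiltonian

$$H = \int \left[(u^1)^2 + \tfrac{1}{9}(u^2)^3 - u^1u^2_x + \tfrac{1}{3}(u^2_x)^2\right] dx$$

A more complicated Poisson structure for this
system is

$$P = \begin{pmatrix} A & -3\partial_x^4 + 3u^2\partial_x^2 + 9u^1\partial_x + 3u^1_x \\ B & -6\partial_x^3 + 6u^2\partial_x + 3u^2_x \end{pmatrix} \qquad [18]$$

with

$$A = 2\partial_x^5 - 4u^2\partial_x^3 - 6u^2_x\partial_x^2 + \left(2(u^2)^2 + 6u^1_x - 6u^2_{xx}\right)\partial_x + \left(3u^1_{xx} - 2u^2_{xxx} + 2u^2u^2_x\right)$$

and

$$B = 3\partial_x^4 - 3u^2\partial_x^2 + \left(9u^1 - 6u^2_x\right)\partial_x + \left(6u^1_x - 3u^2_{xx}\right)$$
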

It can be obtained by means of the Drinfeld-Sokolov
reduction (or also by means of a
bi-Hamiltonian reduction) from the Lie-Poisson
structure (modified with the cocycle $\partial_x$) on the
space of $C^\infty$ maps from $S^1$ to the Lie algebra of
$3 \times 3$ traceless matrices. This is the reason why it is
a Poisson structure, compatible with [7]. The system
[6] can be written as

$$\begin{pmatrix} u^1_t \\ u^2_t \end{pmatrix} = P \begin{pmatrix} \delta h_1/\delta u^1 \\ \delta h_1/\delta u^2 \end{pmatrix}$$

where $P$ is the operator [18] and $h_1 = (1/3)u^1$ is the density of a Casimir of the
Poisson structure [7]. Thus, the Boussinesq equation
is a bi-Hamiltonian system and can be shown to
possess, like KdV, an infinite sequence of conserved
quantities and symmetries, forming the Boussinesq
hierarchy. The KdV and the Boussinesq hierarchy are
indeed particular examples of Gelfand-Dickey
hierarchies (Dickey 2003). They are hierarchies of
systems of $n$ equations with $n$ unknown functions
and they are related, via the Drinfeld-Sokolov
approach, to the Lie algebra $\mathfrak{sl}(n + 1)$. As shown by
Adler, Dickey, and Gelfand, these hierarchies have a
bi-Hamiltonian formulation. Also the generalized
KdV equations, associated by Drinfeld and Sokolov
with an arbitrary affine Kac-Moody Lie algebra, are
bi-Hamiltonian (or are obtained as suitable
reductions of bi-Hamiltonian systems). Let us consider
now the (dispersionless) Camassa-Holm equation

$$u_t - u_{xxt} = -3uu_x + 2u_xu_{xx} + uu_{xxx} \qquad [19]$$


which also describes shallow water waves, and
possesses remarkable solutions called peakons, since
they represent traveling waves with discontinuous
first derivative. In order to supply this equation with a
(bi-)Hamiltonian structure, one has to perform the
change of variable $m = u - u_{xx}$, whose inverse, in the
space of period-1 functions, turns out to be given by

$$u(x) = \int_0^x m(y)\sinh(y - x)\,dy + \frac{1}{2\sinh\frac{1}{2}}\int_0^1 m(y)\cosh\left(y - x - \tfrac{1}{2}\right) dy$$

The Camassa-Holm equation is then bi-Hamiltonian
with respect to the Poisson pair

$$P_1 = \partial_{xxx} - \partial_x, \qquad P_2 = 2m\partial_x + m_x$$

Indeed, it can be written as $m_t = P_1\,dH_2 = P_2\,dH_1$,
where

$$H_1 = -\frac{1}{2}\int um\,dx, \qquad H_2 = \frac{1}{2}\int \left(u^3 + uu_x^2\right) dx$$

Notice that the Poisson pair of the Camassa-Holm
equation can be obtained from that of KdV by
moving the cocycle $\partial_{xxx}$ from the second Poisson
structure to the first one. Indeed,

$$P_{(a,b,c)} = a\,\partial_{xxx} + b\,\partial_x + c\,(2m\partial_x + m_x), \qquad a, b, c \in \mathbb{R} \qquad [20]$$


is a family of pairwise compatible Poisson operators.
Moreover, we mention that Misiolek has shown that
the Camassa-Holm equation is also an Euler equation
on the dual of the Virasoro algebra. We conclude this
article with a brief discussion of the so-called
bi-Hamiltonian structures of hydrodynamic type. They
play a relevant role in the theory of Frobenius
manifolds, which, in turn, have deep relations with
many important topics in contemporary mathematics
and physics, such as Gromov-Witten invariants and
isomonodromic deformations. As we have seen in an
earlier section, a Poisson structure of hydrodynamic
type is given, on the space of $C^\infty$ maps from $S^1$ to (an
open set of) $\mathbb{R}^n$, by

$$\{u^{i,x}, u^{j,y}\} = g^{ij}(u)\,\delta'(x - y) + \Gamma^{ij}_k(u)\,u^k_x\,\delta(x - y) \qquad [21]$$


where $g^{ij}(u)$ are the contravariant components of
a flat (pseudo-)Riemannian metric and the $\Gamma^{ij}_k$ are
the (contravariant) Christoffel symbols of the
metric. If two Poisson structures of hydrodynamic
type are given, it can be shown that they are
compatible if and only if the two corresponding
metrics form a flat pencil. This means that their
linear combinations (with constant coefficients)
are still flat (pseudo-)Riemannian metrics, and
that the contravariant Christoffel symbols of the
linear combinations are the linear combinations
of the contravariant Christoffel symbols of the
two metrics. The simplest example is given by the
bi-Hamiltonian formulation of the Burgers (or
dispersionless KdV) equation,

$$u_t + 3uu_x = 0$$

that we have already encountered. We know that
this equation is Hamiltonian with respect to the
(Lie-)Poisson operator $2u\partial_x + u_x$, with Hamiltonian
function $H_1 = -(1/2)\int u^2\,dx$, and with respect to
the Poisson operator $\partial_x$, with Hamiltonian function
$H_2 = -(1/2)\int u^3\,dx$. This also means that the
bi-Hamiltonian structure of the Burgers equation
comes from the family [20]. The first Hamiltonian
structure corresponds to the standard metric on $\mathbb{R}$,
that is, $du \otimes du$, whereas the second one is given by
the metric $(2u)^{-1}\,du \otimes du$.
Billiard Flow and Billiard Ball Map


The billiard system describes the motion of a free 
particle inside a domain with elastic reflection off the 
boundary. More precisely, a billiard table is a 
Riemannian manifold M with a piecewise smooth 
boundary, for example, a domain in the plane. The 
point moves along a geodesic line with a constant speed 
until it hits the boundary. At a smooth boundary point, 
the billiard ball reflects so that the tangential compo- 
nent of its velocity remains the same, while the normal 
component changes its sign. This means that both 
energy and momentum are conserved. In dimension 2, 
this collision is described by a well-known law of 
geometrical optics: the angle of incidence equals the 
angle of reflection. Thus, the theory of billiards has 
much in common with geometrical optics. If the billiard 
ball hits a corner, its further motion is not defined. 
The billiard reflection law satisfies a variational
principle. Let $A$ and $B$ be fixed points in the billiard
table and let $AXB$ be a billiard trajectory from $A$ to
$B$ with reflection at a boundary point $X$. Then, the
position of a variable point $X$ extremizes the length
$AXB$. This is the Fermat principle of geometrical
optics.
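The variational principle can be illustrated in the simplest setting. The sketch below is an illustration under stated assumptions, not part of the original article: the boundary is taken to be the straight line $y = 0$, the points $A$ and $B$ are chosen arbitrarily above it, and the length $|AX| + |XB|$ is minimized over $X = (x, 0)$ by a golden-section search. At the minimizer the two segments make equal angles with the boundary, which is the reflection law.

```python
import math


def path_length(x, a, b):
    """Length |AX| + |XB| for a reflection point X = (x, 0) on the line y = 0."""
    return math.hypot(x - a[0], a[1]) + math.hypot(x - b[0], b[1])


def minimize_golden(f, lo, hi, tol=1e-8):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d
        else:
            lo = c
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)


def angle_with_boundary(x, p):
    """Angle between the segment from X = (x, 0) to p and the line y = 0."""
    return math.atan2(p[1], abs(p[0] - x))


# Assumed sample points inside the table (the half-plane y > 0).
A = (0.0, 1.0)
B = (3.0, 2.0)

x_star = minimize_golden(lambda x: path_length(x, A, B), A[0], B[0])
print(x_star, angle_with_boundary(x_star, A), angle_with_boundary(x_star, B))
```

For these sample points the minimizer is close to $x = 1$, where both angles are close to $\pi/4$. The path length is a convex function of $x$ here, which is why a search for a unimodal minimum is adequate; on a curved boundary the extremum need not be a minimum.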

In this article, we discuss billiards in bounded 
convex domains with smooth boundary, also called 
Birkhoff billiards. A related article treats billiards in 
polygons (see Polygonal Billiards). 

The billiard flow is defined as a continuous-time 
dynamical system. The time-$t$ billiard transformation
acts on unit tangent vectors to M which constitute the 
phase space of the billiard flow, and the manifold M is 
its configuration space. Thus, the billiard flow is the 
geodesic flow on a manifold with boundary. 

It is useful to reduce the dimension by one and to
replace continuous time by discrete time, that is, to
replace the billiard flow by a mapping, called the
billiard ball map and denoted by $T$. The phase space
of the billiard ball map consists of unit tangent
vectors $(x, v)$ with the foot point $x$ on the boundary
of $M$ and the inward direction $v$. A vector $(x, v)$
moves along the geodesic through $x$ in the direction
of $v$ to the next point of its intersection $x_1$ with the
boundary $\partial M$, and then $v$ reflects in $\partial M$ to the new
inward vector $v_1$. Then, one has: $T(x, v) = (x_1, v_1)$.
For a convex $M$, the map $T$ is continuous. If $M$ is
$n$-dimensional, then the dimension of the phase
space of the billiard ball map is $2n - 2$.

Figure 1 Billiard ball map.

Equivalently, and more in the spirit of geometrical 
optics, one considers $\mathcal{L}$, the space of oriented
geodesics (rays of light) that intersect the billiard 
table. This space of lines is in one-to-one correspon- 
dence with the phase space of the billiard ball map: 
to an inward unit vector (x,v) there corresponds the 
oriented line through x in the direction v (Figure 1). 

The space of rays $\mathcal{L}$ carries a canonical symplectic
structure, that is, a closed nondegenerate
differential 2-form. In the Euclidean case, this
symplectic structure $\omega$ is defined as follows. Given
an oriented line $\ell$ in $\mathbb{R}^n$, let q be the unit vector
along $\ell$ and p be the vector obtained by dropping
the perpendicular from the origin to $\ell$. Then,
$\omega = dp \wedge dq = \sum_i dp_i \wedge dq_i$. This construction identifies
$\mathcal{L}$ with the cotangent bundle of the unit sphere:
q is a unit vector and p is a (co)tangent vector at q,
and $\omega$ identifies with the canonical symplectic
structure of $T^*S^{n-1}$. In the general case of a
Riemannian manifold M, the symplectic structure
on the space of oriented geodesics is obtained from
that on $T^*M$ by symplectic reduction.

One has an important result: the billiard ball map
preserves the symplectic structure, $T^*(\omega) = \omega$. As a
consequence, T is also measure preserving. In the
planar case, one has the following explicit formula
for this measure. Let t be an arc length parameter
along the boundary of the billiard table and let
$\alpha \in [0,\pi]$ be the angle made by the unit vector with
this boundary. Then, $(\alpha,t)$ are coordinates in the
phase space, identified with the cylinder, and the
invariant measure is $\sin\alpha \, d\alpha \, dt$.

As a consequence, the total area of the phase
space equals 2L where L is the perimeter length of
the boundary of the billiard table, and the mean free
path equals $\pi A/L$, where A is the area of the billiard
table. In the general n-dimensional case, the mean
free path equals

$$\frac{\operatorname{vol}(S^{n-1})}{\operatorname{vol}(B^{n-1})} \cdot \frac{\operatorname{vol}(M)}{\operatorname{vol}(\partial M)}$$

where $S^{n-1}$ and $B^{n-1}$ are the unit sphere and the unit
disk in Euclidean spaces.
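The planar formulas can be checked numerically on the simplest table. The sketch below assumes a disk of radius R, where a trajectory leaving the boundary at angle $\alpha$ travels a chord of length $2R\sin\alpha$; the function name `disk_mean_free_path` and the midpoint-rule integration are illustrative choices, not part of the article.

```python
import math

def disk_mean_free_path(radius=1.0, steps=100_000):
    """Mean chord length in a disk, averaged over the measure sin(alpha) d(alpha).

    Returns (mean free path, integral of sin(alpha) over [0, pi]).
    The second value times the perimeter L is the phase-space area.
    """
    h = math.pi / steps
    weighted_length = 0.0
    total_weight = 0.0
    for i in range(steps):
        alpha = (i + 0.5) * h                     # midpoint rule
        weight = math.sin(alpha)                  # density of the invariant measure
        chord = 2.0 * radius * math.sin(alpha)    # chord length in a disk (assumption: table is a disk)
        weighted_length += chord * weight * h
        total_weight += weight * h
    return weighted_length / total_weight, total_weight

R = 1.0
mean_path, weight_integral = disk_mean_free_path(R)
predicted = math.pi * (math.pi * R**2) / (2.0 * math.pi * R)   # pi * A / L
print(mean_path, predicted, weight_integral)
```

By rotational symmetry the arc length parameter t drops out of the average, so only the integral over $\alpha$ is needed. The second returned value approximates 2, matching the phase-space area 2L.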


Existence and Nonexistence of Caustics 


Given a plane billiard table, a caustic is a curve 
inside the table such that if a segment of a billiard 
trajectory is tangent to this curve then so is each 
reflected segment. Caustics correspond to invariant 
circles of the billiard ball map (i.e., invariant curves 
that go around the phase cylinder): such an invariant 
circle is a one-parameter family of oriented lines, 
and the respective caustic is their envelope. An
envelope may have cusp-like singularities but if the
boundary of the billiard table is a smooth curve with 
positive curvature then a caustic, sufficiently close to 
the boundary, is smooth and convex. 

One can recover the table from a caustic by the
following string construction. Let $\gamma$ be a caustic.
Wrap a closed nonstretchable string around $\gamma$, pull it
tight at a point and move this point around $\gamma$ to
obtain a new curve $\Gamma$. Then, $\gamma$ is a caustic for the
billiard inside $\Gamma$. Note that this construction has one
parameter, the length of the string.

The following useful “mirror equation” relates
various quantities depicted in Figure 2:

$$\frac{1}{a} + \frac{1}{b} = \frac{2k}{\sin\alpha}$$

where k is the curvature of the boundary at the
impact point.
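As a quick consistency check of the mirror equation, consider a circular table of radius R. This sketch assumes, since Figure 2 is not reproduced here, that a and b denote the distances from the impact point to the points of tangency with the caustic along the incoming and outgoing segments, and that $\alpha$ is the angle between the trajectory and the boundary.

```latex
% Circle of radius R: curvature k = 1/R; the caustic is the concentric
% circle of radius R\cos\alpha. Each chord has length 2R\sin\alpha and
% touches the caustic at its midpoint, so
a = b = R\sin\alpha
\quad\Longrightarrow\quad
\frac{1}{a} + \frac{1}{b} = \frac{2}{R\sin\alpha} = \frac{2k}{\sin\alpha}.
```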

Do caustics exist for every convex billiard table? 
This is important to know, in particular, because the 
existence of a caustic implies that the billiard ball 
map is not ergodic. The answer is given by a
theorem of Lazutkin: if the boundary of the billiard
table is sufficiently smooth and its curvature never
vanishes, then there exists a collection of smooth
caustics in the vicinity of the billiard curve whose
union has a positive area. Originally this theorem
asked for 553 continuous derivatives; later this was
reduced to six. This result uses the techniques of the
KAM (Kolmogorov-Arnol'd-Moser) theory. The
crucial fact is that, in appropriate coordinates, the
billiard ball map is approximated, near the boundary
of the phase cylinder, by the integrable map
$(x, y) \mapsto (x + y, y)$.

Figure 2 String construction and mirror equation.

On the other hand, by a theorem of Mather, if the 
curvature of a convex smooth billiard curve vanishes 
at some point, then this billiard ball map has no 
invariant circles. This result belongs to the well- 
developed theory of area-preserving twist maps of 
the cylinder, of which the billiard ball map is an 
example. 


Integrable Billiards 


Let a plane billiard table be an ellipse with foci $F_1$
and $F_2$. It has been known since antiquity that a billiard
ball shot from $F_1$ reflects to $F_2$. A generalization of
this optical property of the ellipse is the following
theorem: a billiard trajectory inside an ellipse
forever remains tangent to a fixed confocal conic.
More precisely, if a segment of a billiard trajectory
does not intersect the segment $F_1F_2$, then no
segment of this trajectory intersects $F_1F_2$ and
all are tangent to the same ellipse with foci $F_1$ and $F_2$;
and if a segment of a trajectory intersects $F_1F_2$,
then all the segments of this trajectory intersect $F_1F_2$
and are all tangent to the same hyperbola with foci
$F_1$ and $F_2$.
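The theorem can be illustrated numerically. The sketch below iterates the billiard ball map in the ellipse $x^2/a^2 + y^2/b^2 = 1$ and, for each segment, computes the parameter $\lambda$ of the confocal conic $x^2/(a^2+\lambda) + y^2/(b^2+\lambda) = 1$ tangent to it. It relies on a standard tangency criterion not stated in the article: a line with unit normal $(n_x, n_y)$ at signed distance d from the origin touches that conic when $d^2 = (a^2+\lambda)n_x^2 + (b^2+\lambda)n_y^2$. The function names and the starting data are hypothetical.

```python
import math

def billiard_step(x, v, a, b):
    """One application of the billiard ball map inside x^2/a^2 + y^2/b^2 = 1."""
    ax, ay = x[0] / a**2, x[1] / b**2                      # gradient direction at x
    t = -2.0 * (ax * v[0] + ay * v[1]) / (v[0]**2 / a**2 + v[1]**2 / b**2)
    x1 = (x[0] + t * v[0], x[1] + t * v[1])                # next impact point
    nx, ny = x1[0] / a**2, x1[1] / b**2                    # (non-unit) normal at x1
    norm = math.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    dot = v[0] * nx + v[1] * ny
    v1 = (v[0] - 2.0 * dot * nx, v[1] - 2.0 * dot * ny)    # mirror reflection
    return x1, v1

def caustic_parameter(x, v, a, b):
    """lambda of the confocal conic tangent to the line through x along unit v."""
    nx, ny = -v[1], v[0]                                   # unit normal of the line
    d = nx * x[0] + ny * x[1]                              # signed distance from origin
    return d * d - (a * nx) ** 2 - (b * ny) ** 2

def orbit_parameters(a=2.0, b=1.0, s=0.3, target=(0.3, 0.2), steps=200):
    """Caustic parameter of each segment of one orbit (illustrative starting data)."""
    x = (a * math.cos(s), b * math.sin(s))                 # start on the boundary
    dx, dy = target[0] - x[0], target[1] - x[1]            # aim at an interior point
    norm = math.hypot(dx, dy)
    v = (dx / norm, dy / norm)
    values = []
    for _ in range(steps):
        values.append(caustic_parameter(x, v, a, b))
        x, v = billiard_step(x, v, a, b)
    return values

lams = orbit_parameters()
print(min(lams), max(lams))
```

Up to rounding error the printed minimum and maximum agree, which is the numerical counterpart of the statement that every segment is tangent to one fixed confocal conic.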

It follows that confocal ellipses are the caustics of 
the billiard inside an ellipse. In particular, a 
neighborhood of the boundary of such a billiard 
table is foliated by caustics. A long-standing 
conjecture, attributed to Birkhoff, asserts that if a 
neighborhood of a strictly convex smooth boundary 
of a billiard table is foliated by caustics, then this 
table is an ellipse. This conjecture remains open. The 
best result in this direction is a theorem of Bialy: if 
almost every phase point of the billiard ball map in a 
strictly convex billiard table belongs to an invariant 
circle, then the billiard table is a disk. 

The multidimensional analogs of the optical
properties of an ellipse are as follows. Consider an
ellipsoid M in $\mathbb{R}^n$ given by the equation

$$\frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \cdots + \frac{x_n^2}{a_n^2} = 1$$

and define the confocal family of quadrics $M_\lambda$ by the
equation

$$\frac{x_1^2}{a_1^2 + \lambda} + \frac{x_2^2}{a_2^2 + \lambda} + \cdots + \frac{x_n^2}{a_n^2 + \lambda} = 1$$

where $\lambda$ is a real parameter. The topological type of
$M_\lambda$ changes as $\lambda$ passes the values $-a_i^2$.

One has the following theorem: a billiard
trajectory inside M remains tangent to $n - 1$ fixed
confocal quadrics. A similar and closely
related result holds for the geodesic curves on M:
the tangent lines to a fixed geodesic on M are
tangent to $n - 2$ other fixed quadrics, confocal
with M. For a triaxial ellipsoid, this theorem goes
back to Jacobi.

Explicit formulas for the integrals of the billiard 
in an n-dimensional ellipsoid [1] are as follows. Let 
(x,v) be a phase point, a unit inward tangent vector 
whose foot point x lies on the boundary. The 
following functions are invariant under the billiard 
ball map:

$$F_i(x,v) = v_i^2 + \sum_{j \ne i} \frac{(v_i x_j - v_j x_i)^2}{a_i^2 - a_j^2}, \qquad i = 1, \ldots, n$$

These functions are not independent: $F_1 + \cdots + F_n = 1$.

In fact, the integrals F; Poisson-commute (with 
respect to the Poisson bracket associated with the 
symplectic structure in the phase space of the 
billiard ball map that was described above). Accord- 
ing to the Arnol'd-Liouville theorem, this complete 
integrability of the billiard inside an ellipsoid implies 
that the phase space is foliated by invariant tori and, 
in appropriate coordinates, the map on each torus is 
a parallel translation. 

Similar results on complete integrability hold 
for billiards inside quadrics in spaces of constant 
positive or negative curvature. The former is 
the intersection of a quadratic cone with the 
unit sphere, and the latter with the unit 
pseudosphere. 


Periodic Orbits 


Periodic billiard trajectories inside a planar billiard 
table correspond to inscribed polygons of extremal 
perimeter length. When counting periodic trajec- 
tories, one does not distinguish between polygons 
obtained from each other by cyclic permutation or 
reversing the order of the vertices. In other words, 
one counts the orbits of the dihedral group $D_n$
acting on n-periodic billiard polygons.

An additional topological characteristic of a 
periodic billiard trajectory is the rotation number 
defined as follows. Assume that the boundary $\gamma$ of a
billiard table is parametrized by the unit circle and
consider a polygon $(x_1, x_2, \ldots, x_n)$ inscribed in $\gamma$.
For all i, one has $x_{i+1} = x_i + t_i$ with $t_i \in (0, 1)$. Since
the polygon is closed, $t_1 + \cdots + t_n \in \mathbb{Z}$. This integer,
that takes values from 1 to $n - 1$, is called the
rotation number of the polygon and denoted by $\rho$.
Changing the orientation of a polygon replaces the
rotation number $\rho$ by $n - \rho$. The leftmost 5-periodic
trajectory in Figure 3 has $\rho = 1$ and the other three
have $\rho = 2$.

Figure 3 Rotation numbers of periodic trajectories.
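The definition of the rotation number translates directly into a short computation. The sketch below assumes the vertices are given as exact fractions of the unit circle parameter in [0, 1); the function name `rotation_number` and the sample polygons are illustrative.

```python
from fractions import Fraction

def rotation_number(points):
    """Rotation number of a closed polygon with vertices given as circle parameters in [0, 1)."""
    n = len(points)
    total = Fraction(0)
    for i in range(n):
        # t_i is the step from x_i to x_{i+1}, reduced to the interval (0, 1)
        step = (points[(i + 1) % n] - points[i]) % 1
        if step == 0:
            raise ValueError("consecutive vertices must be distinct")
        total += step
    if total.denominator != 1:
        raise ValueError("steps of a closed polygon must sum to an integer")
    return int(total)

pentagon = [Fraction(k, 5) for k in range(5)]              # convex 5-gon
pentagram = [Fraction(2 * k, 5) % 1 for k in range(5)]     # star-shaped 5-gon
reversed_pentagon = list(reversed(pentagon))               # opposite orientation

print(rotation_number(pentagon), rotation_number(pentagram), rotation_number(reversed_pentagon))
```

Exact rational arithmetic avoids rounding issues when testing that the sum of the steps is an integer. Reversing the pentagon changes its rotation number from 1 to 4, in line with the rule that a change of orientation replaces the rotation number by n minus itself.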

The following theorem is due to Birkhoff: for
every $n \ge 2$ and $\rho \le \lfloor (n - 1)/2 \rfloor$, coprime with n,
there exist two geometrically distinct n-periodic
billiard trajectories with the rotation number $\rho$. For
example, there are at least two 2-periodic billiard
trajectories inside every smooth oval: one is the 
diameter, the longest chord, and another one is of 
minimax type, similar to the minor axis of an 
ellipse. 

In higher dimensions, lower bounds on the 
number of periodic billiard trajectories inside strictly 
convex domains with smooth boundaries were 
obtained only recently by Farber and the present 
author. Here is one of the results: for a generic
billiard table in $\mathbb{R}^m$, the number of n-periodic
trajectories is not less than $(n - 1)(m - 1)$. The
proof consists in using the Morse theory to estimate
from below the number of critical points of the perimeter
length function on the space of inscribed n-gons and
its quotient space by the dihedral group $D_n$, and the
main difficulty is in describing the topology of these 
spaces. 

Returning to convex smooth planar billiards, the
following conjecture has remained open for a long time:
the set of n-periodic points of the billiard ball map
has zero measure. This is easy for n = 2; for n = 3
this is a theorem by M Rychlik. The motivation for
this question comes from spectral geometry. In 
particular, according to a theorem of Ivrii, the 
above conjecture implies the Weyl conjecture on 
the second term for the spectral asymptotics of the 
Laplacian in a bounded domain with the Dirichlet 
or Neumann boundary conditions. 


Length Spectrum 


The set of lengths of the closed trajectories in a 
convex billiard M is called the length spectrum of M. 
There is a remarkable relation between the length 
spectrum and the spectrum of the Laplace operator 
in M with the Dirichlet boundary condition:
$\Delta f = \lambda f$, $f|_{\partial M} = 0$. From the physical point of view,
the eigenvalues $\lambda$ are the eigenfrequencies of the
membrane M with a fixed boundary. Roughly
speaking, one can recover the length spectrum from
that of the Laplacian. More precisely, the following
theorem of K Anderson and R Melrose holds:

$$\sum_{\lambda_i \in \operatorname{spec} \Delta} \cos\left(t\sqrt{\lambda_i}\right)$$

is a well-defined generalized function (distribution)
of t, smooth away from the length spectrum. That is,
if $l > 0$ belongs to the singular support of this
distribution, then there exists either a closed billiard
trajectory of length l, or a closed geodesic of length l
in the boundary of the billiard table.

This relation between the Laplacian and the 
length spectrum is due to the fact that geometric
optics is not a very accurate description of light. In
wave optics, light is considered as electromagnetic 
waves, and geometric optics gives a realistic approx- 
imation only when the wave length is small. This 
small-wave approximation is based on the assump- 
tion that the waves are locally almost harmonic, 
while their amplitudes change slowly from point to 
point. The substitution of such a function into the 
corresponding PDEs gives, in the first approxima- 
tion, the equations of wave fronts, that is, of 
geometric optics. 

Here is another spectral result concerning a 
smooth strictly convex plane domain, due to 
S Marvizi and R Melrose. Let $L_n$ be the supremum
and $l_n$ the infimum of the perimeters of simple
billiard n-gons. Then,

$$\lim_{n \to \infty} n^k (L_n - l_n) = 0$$

for any positive k. Furthermore, $L_n$ has an asymptotic
expansion, as $n \to \infty$,

$$L_n \sim l + \sum_{i=1}^{\infty} \frac{c_i}{n^{2i}}$$

where l is the length of the boundary of the billiard table
and $c_i$ are constants, depending on the curvature of
the boundary.


Introduction


Over the last 30 years, black holes have been 
shown to have a number of surprising properties. 
These discoveries have revealed unforeseen relations 
between the otherwise distinct areas of general 
relativity, quantum physics, and statistical 
mechanics. This interplay, in turn, led to a number 
of deep puzzles at the very foundations of physics. 
Some have been resolved while others continue to 
baffle physicists. The starting point of these 
fascinating developments was the discovery of 
laws of black hole mechanics by Bardeen, 
Bekenstein, Carter, and Hawking. They dictate the 
behavior of black holes in equilibrium, under small 
perturbations away from equilibrium, and in fully 
dynamical situations. While they are consequences 
of classical general relativity alone, they have a 
close similarity with the laws of thermodynamics. 
The origin of this seemingly strange coincidence lies 
in quantum physics. For further discussion, 
see Asymptotic Structure and Conformal Infinity; 
Loop Quantum Gravity; Quantum Geometry and 
Its Applications; Quantum Field Theory in Curved 
Spacetime; Stationary Black Holes. 

The focus of this article is just on black hole 
mechanics. The discussion is divided into three parts. 
In the first, we will introduce the notions of event 
horizons and black hole regions and discuss properties
of globally stationary black holes. In the second, we will
consider black holes which are themselves in equilibrium
but in surroundings which may be time
dependent. Finally, in the third part, we summarize
what is known in the fully dynamical situations. For
simplicity, all manifolds and fields are assumed to be
smooth and, unless otherwise stated, spacetime is
assumed to be four dimensional, with a metric of
signature $-, +, +, +$, and the cosmological constant
is assumed to be zero. An arrow under a spacetime
index denotes the pullback of that index to the horizon.


Global Equilibrium 


To capture the intuitive notion that a black hole is a
region from which signals cannot escape to the
asymptotic part of spacetime, one needs a precise
definition of future infinity. The standard strategy is to
use Penrose's conformal boundary $\mathcal{I}^+$. A black hole
region $\mathcal{B}$ of a spacetime $(M, g_{ab})$ is defined as
$\mathcal{B} = M \setminus I^-(\mathcal{I}^+)$, where $I^-$ denotes “chronological past.” The
boundary $\partial\mathcal{B}$ of the black hole region is called the
“event horizon” and denoted by $\mathcal{E}$. Thus, $\mathcal{E}$ is the
boundary of the past of $\mathcal{I}^+$. It therefore follows that $\mathcal{E}$ is
a null 3-surface, ruled by future inextendible null
geodesics without caustics. If the spacetime is globally
hyperbolic, an “instant of time” is represented by a
Cauchy surface M. The intersection of $\mathcal{B}$ with M may
have several disjoint components, each representing a
black hole at that instant of time. If M' is a Cauchy
surface to the future of M, the number of disjoint
components of $M' \cap \mathcal{B}$ in the causal future of $M \cap \mathcal{B}$
must be less than or equal to those of $M \cap \mathcal{B}$
(see Hawking and Ellis (1973)). Thus, black holes can
merge but cannot bifurcate. (By a time reversal, i.e., by
replacing $\mathcal{I}^+$ with $\mathcal{I}^-$ and $I^-$ with $I^+$, one can define a
white hole region $\mathcal{W}$. However, here we will focus only
on black holes.)

A spacetime $(M, g_{ab})$ is said to be stationary (i.e., time
independent) if $g_{ab}$ admits a Killing field $t^a$ that
represents an asymptotic time translation. By convention,
$t^a$ is assumed to be unit at infinity. $(M, g_{ab})$ is said
to be axisymmetric if $g_{ab}$ admits a Killing field $\phi^a$
generating an SO(2) isometry. By convention $\phi^a$ is
normalized such that the affine length of its integral
curves is $2\pi$. Stationary spacetimes with nontrivial
$M \setminus I^-(\mathcal{I}^+)$ represent black holes which are in global
equilibrium. In the Einstein-Maxwell theory in four
dimensions, there exists a unique three-parameter
family of stationary black hole solutions, generally
parametrized by mass m, angular momentum J, and
electric charge Q. This is the celebrated Kerr-Newman
family. Therefore, in general relativity a great deal of
work on black holes has focused on these solutions and
perturbations thereof. The Kerr-Newman family is
axisymmetric and furthermore, its metric has the
property that the 2-flats spanned by the Killing fields
$t^a$ and $\phi^a$ are orthogonal to a family of 2-surfaces. This
property is called “$t$-$\phi$ orthogonality.” These features of
Kerr-Newman spacetimes are widely used in black
hole physics. Note however that uniqueness fails in
higher dimensions, and also in the presence of
nonabelian gauge fields or rings of perfect fluids around
black holes in four dimensions. In mathematical
physics, there is significant literature on the new
stationary black hole solutions in Einstein-Yang-Mills-Higgs
theories. These are called “hairy black
holes.” Research on stationary black hole solutions with
rings received a boost by a recent discovery that these
black holes can violate the Kerr inequality $J \le Gm^2$
between angular momentum J and mass m.

A null 3-manifold $\mathcal{K}$ in M is said to be a “Killing
horizon” if $g_{ab}$ admits a Killing field $K^a$ which is
everywhere normal to $\mathcal{K}$. On a Killing horizon, one
can show that the acceleration of $K^a$ is proportional
to $K^a$ itself:

$$K^a \nabla_a K^b = \kappa K^b \qquad [1]$$

The proportionality function $\kappa$ is called “surface
gravity.” We will show in the next section that if a
mild energy condition holds on $\mathcal{K}$, then $\kappa$ must be
constant. Note that if we rescale $K^a$ via $K^a \to cK^a$,
where c is a constant, surface gravity also rescales as
$\kappa \to c\kappa$.

In the Kerr-Newman family, the event horizon is
a Killing horizon. More generally, if an axisymmetric,
stationary black hole spacetime $(M, g_{ab})$
satisfies the $t$-$\phi$ orthogonality property, its event
horizon $\mathcal{E}$ is a Killing horizon. (Although one can
envisage stationary black holes in which these
additional symmetry conditions are not met, this
possibility has been ignored in black hole mechanics
on stationary spacetimes. Quasilocal horizons, discussed
below, do not require any spacetime symmetries.)
In these cases, the normalization freedom in
$K^a$ is fixed by requiring that $K^a$ have the form

$$K^a = t^a + \Omega \phi^a \qquad [2]$$

on the horizon, where $\Omega$ is a constant, called the
“angular velocity of the horizon.” The resulting $\kappa$ is
called the surface gravity of the black hole. It is
remarkable that $\kappa$ is constant for all such black
holes, even when their horizon is highly distorted
(i.e., far from being spherically symmetric) either
due to rotation or due to external matter fields. This
is analogous to the fact that the temperature of a
thermodynamical system in equilibrium is constant,
independently of the details of the system. In
analogy with thermodynamics, constancy of $\kappa$ is
referred to as the “zeroth law of black hole
mechanics.”

Next, let us consider an infinitesimal perturbation
$\delta$ within the three-parameter Kerr-Newman family.
A simple calculation shows that the changes in the
Arnowitt-Deser-Misner (ADM) mass m, angular
momentum J, and the total charge Q of the
spacetime and in the area a of the horizon are
constrained via

$$\delta m = \frac{\kappa}{8\pi G}\,\delta a + \Omega\,\delta J + \Phi\,\delta Q \qquad [3]$$

where the coefficients $\kappa, \Omega, \Phi$ are black hole parameters,
$\Phi = -A_a K^a$ being the electrostatic potential at
the horizon. The last two terms, $\Omega\,\delta J$ and $\Phi\,\delta Q$, have
the interpretation of “work” required to spin the
black hole up by an amount $\delta J$ or to increase its
charge by $\delta Q$. Therefore, [3] has a striking resemblance
to the first law, $\delta E = T\delta S + \delta W$, of thermodynamics
if (as the zeroth law suggests) $\kappa$ is made
proportional to the temperature T, and the horizon
area a to the entropy S. Therefore, [3] and its
generalizations discussed below are referred to as
the “first law of black hole mechanics.”
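The first law [3] can be checked numerically within the Kerr-Newman family. The sketch below uses the standard closed-form horizon quantities in units G = c = 1 with the charge normalized so that the horizon radius is $r_+ = m + \sqrt{m^2 - (J/m)^2 - Q^2}$; these explicit formulas are well-known results that the article does not quote, and the function names are hypothetical. Partial derivatives of the area are approximated by central differences.

```python
import math

def kerr_newman(m, J, Q):
    """Horizon area, surface gravity, angular velocity, potential (assumes G = c = 1)."""
    spin = J / m                                     # angular momentum per unit mass
    r_plus = m + math.sqrt(m * m - spin * spin - Q * Q)
    denom = r_plus ** 2 + spin ** 2
    area = 4.0 * math.pi * denom
    kappa = (r_plus - m) / denom
    omega = spin / denom
    phi = Q * r_plus / denom
    return area, kappa, omega, phi

def first_law_residuals(m, J, Q, h=1e-6):
    """Residuals of delta m = kappa/(8 pi) delta a + Omega delta J + Phi delta Q."""
    _, kappa, omega, phi = kerr_newman(m, J, Q)
    # central finite differences of the horizon area in each parameter
    dA_dm = (kerr_newman(m + h, J, Q)[0] - kerr_newman(m - h, J, Q)[0]) / (2 * h)
    dA_dJ = (kerr_newman(m, J + h, Q)[0] - kerr_newman(m, J - h, Q)[0]) / (2 * h)
    dA_dQ = (kerr_newman(m, J, Q + h)[0] - kerr_newman(m, J, Q - h)[0]) / (2 * h)
    c = kappa / (8.0 * math.pi)
    # varying m, J, Q one at a time, [3] requires each entry to vanish
    return (c * dA_dm - 1.0, c * dA_dJ + omega, c * dA_dQ + phi)

residuals = first_law_residuals(1.0, 0.5, 0.3)
print(residuals)
```

Each residual corresponds to perturbing one of the three parameters while holding the other two fixed, so all three should vanish up to discretization error.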

In Kerr-Newman spacetimes, the only contribution
to the stress-energy tensor comes from the
Maxwell field. Bardeen et al. (1973) consider
stationary black holes with matter such as perfect
fluids in the exterior region and stationary perturbations
$\delta$ thereof. Using Einstein's equations, they
show that the form [3] of the first law does not
change; the only modification is addition of certain
matter terms on the right-hand side which can be
interpreted as the work $\delta W$ done on the total
system. A generalization in another direction was
made by Iyer and Wald (1994) using Noether
currents. They allow nonstationary perturbations
and, more importantly, drop the restriction to
general relativity. Instead, they consider a wide
class of diffeomorphism-invariant Lagrangian
densities $L(g_{ab}; R_{abcd}, \nabla_a R_{bcde}, \ldots; \psi, \nabla_a \psi, \ldots)$
which depend on the metric $g_{ab}$, matter fields $\psi$,
and a finite number of derivatives of the Riemann
tensor and matter fields. Finally, they restrict
themselves to $\kappa \ne 0$. In this case, on the maximal
analytic extension of the spacetime, the Killing field
$K^a$ vanishes on a 2-sphere $S_0$, called the bifurcate
horizon. Then, [3] is generalized to

$$\delta m = \frac{\kappa}{2\pi}\,\delta S_{\rm hor} + \delta W \qquad [4]$$

Here $\delta W$ again represents “work terms” and $S_{\rm hor}$ is
given by

$$S_{\rm hor} = -2\pi \oint_{S_0} \frac{\delta L}{\delta R_{abcd}}\, n_{ab}\, n_{cd} \qquad [5]$$

where $n_{ab}$ is the binormal to $S_0$ (with $n_{ab} n^{ab} = -2$),
and the functional derivative inside the integral is
evaluated by formally viewing the Riemann tensor
as a field independent of the metric. For the
Einstein-Hilbert action, this yields $S_{\rm hor} = a/4G$ and
one recovers [3].

These results are striking. However, the under- 
lying assumptions have certain unsatisfactory 
aspects. First, although the laws are meant to refer 
just to black holes, one assumes that the entire 
spacetime is stationary. In thermodynamics, by 
contrast, one only assumes that the system under 
consideration is in equilibrium, not the whole 
universe. Second, in the first law, quantities $a, \Omega, \Phi$
are evaluated at the horizon while m, J are
evaluated at infinity and include contributions from
possible matter fields outside the black hole. A more
satisfactory law of black hole mechanics would
involve attributes of the black hole alone. Finally,
the notion of the event horizon is extremely global
and teleological since it explicitly refers to $\mathcal{I}^+$. An
event horizon may well be developing in the very
room you are sitting in today in anticipation of a
gravitational collapse in the center of our galaxy
which may occur a billion years hence. This feature
makes it impossible to generalize the first law to
fully dynamical situations and relate the change in
the event horizon area to the flux of energy and
angular momentum falling across it. Indeed, one can
construct explicit examples of dynamical black holes
in which an event horizon $\mathcal{E}$ forms and grows in the
flat part of a spacetime where nothing happens
physically. These considerations call for a replacement
of $\mathcal{E}$ by a quasilocal horizon which leads to a
first law involving only horizon attributes, and
which can grow only in response to the influx of
energy. Such horizons are discussed in the next two
sections.


Local Equilibrium 


The key idea here is to drop the requirement that
spacetime should admit a stationary Killing field and
ask only that the intrinsic horizon geometry be time
independent. Consider a null 3-surface $\Delta$ in a
spacetime $(M, g_{ab})$ with a future-pointing normal
field $\ell^a$. The pullback $q_{ab} := g_{\underleftarrow{ab}}$ of the spacetime
metric to $\Delta$ is the intrinsic, degenerate “metric” of $\Delta$
with signature 0, +, +. The first condition is that it
be “time independent,” that is, $\mathcal{L}_\ell q_{ab} = 0$ on $\Delta$.
Then by restriction, the spacetime derivative operator
$\nabla$ induces a natural derivative operator D on $\Delta$.
While D is compatible with $q_{ab}$, that is, $D_a q_{bc} = 0$, it
is not uniquely determined by this property because
$q_{ab}$ is degenerate. Thus, D has extra information,
not contained in $q_{ab}$. The pair $(q_{ab}, D)$ is said to
determine the intrinsic geometry of the null surface
$\Delta$. This notion leads to a natural definition of a
horizon in local equilibrium. Let $\Delta$ be a null, three-dimensional
submanifold of $(M, g_{ab})$ with topology
$S \times \mathbb{R}$, where S is compact and without boundary.


Definition 1 $\Delta$ is said to be an “isolated horizon” if it
admits a null normal $\ell^a$ such that:


(i) $\mathcal{L}_\ell q_{ab} = 0$ and $[\mathcal{L}_\ell, D] = 0$ on $\Delta$ and
(ii) $-T^a{}_b \ell^b$ is a future pointing causal vector on $\Delta$.


One can show that, generically, this null normal field
$\ell^a$ is unique up to rescalings by positive constants.


Both conditions are local to $\Delta$. In particular, $(M, g_{ab})$
is not required to be asymptotically flat and there is no
longer any teleological feature. Since $\Delta$ is null and
$\mathcal{L}_\ell q_{ab} = 0$, the area of any of its cross sections is the
same, denoted by $a_\Delta$. As one would expect, one can
show that there is no flux of gravitational radiation or
matter across $\Delta$. This captures the idea that the black
hole itself is in equilibrium. Condition (ii) is a rather
weak “energy condition” which is satisfied by all
matter fields normally considered in classical general
relativity. The nontrivial condition is (i). It extracts
from the notion of a Killing horizon just a “tiny part”
that refers only to the intrinsic geometry of $\Delta$. As a
result, every Killing horizon $\mathcal{K}$ is, in particular, an
isolated horizon. However, a spacetime with an
isolated horizon $\Delta$ can admit gravitational radiation
and dynamical matter fields away from $\Delta$. In fact, as a
family of Robinson-Trautman spacetimes illustrates,
gravitational radiation could even be present arbitrarily
close to $\Delta$. Because of these possibilities, there are
many nontrivial examples and the transition from
event horizons of stationary spacetimes to isolated
horizons represents a significant generalization of
black hole mechanics. (In fact, the derivation of the
zeroth and the first law requires slightly weaker
assumptions, encoded in the notion of a “weakly
isolated horizon” (Ashtekar et al. 2000, 2001).)

An immediate consequence of the requirement
$\mathcal{L}_\ell q_{ab} = 0$ is that there exists a 1-form $\omega_a$ on $\Delta$ such
that $D_a \ell^b = \omega_a \ell^b$. Following the definition of $\kappa$ on a
Killing horizon, the surface gravity $\kappa_{(\ell)}$ of $(\Delta, \ell)$ is
defined as $\kappa_{(\ell)} = \omega_a \ell^a$. Again, under $\ell^a \to c\ell^a$, we have
$\kappa_{(c\ell)} = c\,\kappa_{(\ell)}$. Together with Einstein's equations, the
two conditions of Definition 1 imply $\mathcal{L}_\ell \omega_a = 0$ and
$\ell^a D_{[a} \omega_{b]} = 0$. The Cartan identity relating the Lie
and exterior derivative now yields

$$D_a(\omega_b \ell^b) = D_a \kappa_{(\ell)} = 0 \qquad [6]$$

Thus, surface gravity is constant on every isolated
horizon. This is the zeroth law, extended to horizons
representing local equilibrium. In the presence of an
electromagnetic field, Definition 1 and the field
equations imply $\mathcal{L}_\ell F_{\underleftarrow{ab}} = 0$ and $\ell^a F_{a\underleftarrow{b}} = 0$. The first of
these equations implies that one can always choose a
gauge in which $\mathcal{L}_\ell A_{\underleftarrow{a}} = 0$. By the Cartan identity it then
follows that the electrostatic potential $\Phi_{(\ell)} := -A_a \ell^a$ is
constant on the horizon. This is the Maxwell analog
of the zeroth law.
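The step from the two conditions on $\omega_a$ to the zeroth law can be spelled out as follows. This is a sketch of the standard Cartan formula $\mathcal{L}_\ell \omega = \ell \cdot d\omega + d(\ell \cdot \omega)$ written in index notation, with $\omega_a$, $\ell^a$ and $\kappa_{(\ell)} = \omega_a \ell^a$ as defined in the text above; the same manipulation applied to $A_a$ gives the Maxwell analog.

```latex
% Cartan identity on the horizon, applied to the 1-form \omega_a:
\mathcal{L}_\ell\,\omega_a
  = 2\,\ell^b D_{[b}\,\omega_{a]} + D_a\!\left(\ell^b \omega_b\right).
% The left side vanishes and so does the first term on the right, hence
0 = 0 + D_a \kappa_{(\ell)}
\quad\Longrightarrow\quad
D_a \kappa_{(\ell)} = 0 .
```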

In this setting, the first law is derived using a 
Hamiltonian framework (Ashtekar et al. 2000, 
2001). For concreteness, let us assume that we are 
in the asymptotically flat situation and the only 
gauge field present is electromagnetic. One begins by 
restricting oneself to horizon geometries such that $\Delta$
admits a rotational vector field $\phi^a$ satisfying
$\mathcal{L}_\phi q_{ab} = 0$. (In fact for black hole mechanics, it
suffices to assume only that $\mathcal{L}_\phi \epsilon_{ab} = 0$, where $\epsilon_{ab}$ is
the intrinsic area 2-form on $\Delta$. The same is true on
dynamical horizons discussed in the next section.)
One then constructs a phase space $\Gamma$ of gravitational
and matter fields such that (1) M admits an internal
boundary $\Delta$ which is an isolated horizon; and (2) all
fields satisfy asymptotically flat boundary conditions
at infinity. Note that the horizon geometry is
allowed to vary from one phase-space point to
another; the pair $(q_{ab}, D)$ induced on $\Delta$ by the
spacetime metric only has to satisfy Definition 1 and
the condition $\mathcal{L}_\phi q_{ab} = 0$.

Let us begin with angular momentum. Fix a
vector field $\phi^a$ on M which coincides with the fixed
$\phi^a$ on $\Delta$ and is an asymptotic rotational symmetry
at infinity. (Note that $\phi^a$ is not restricted in any way
in the bulk.) Lie derivatives of gravitational and
matter fields along $\phi^a$ define a vector field $X_{(\phi)}$ on
$\Gamma$. One shows that it is an infinitesimal canonical
transformation, that is, satisfies $\mathcal{L}_{X_{(\phi)}} \boldsymbol{\Omega} = 0$, where $\boldsymbol{\Omega}$
is the symplectic structure on $\Gamma$. The Hamiltonian
$H_{(\phi)}$ generating this canonical transformation is
given by

$$H_{(\phi)} = J_\Delta^{(\phi)} - J_\infty^{(\phi)}, \qquad J_\Delta^{(\phi)} = -\frac{1}{8\pi G} \oint_S (\omega_a \phi^a)\,\epsilon - \frac{1}{4\pi} \oint_S (A_a \phi^a)\,{}^\star F \qquad [7]$$

where $J_\infty^{(\phi)}$ is the ADM angular momentum at
infinity, S is any cross section of $\Delta$, and $\epsilon$ the area
element thereon. The term $J_\Delta^{(\phi)}$ is independent of the
choice of S made in its evaluation and interpreted as
the “horizon angular momentum.” It has numerous
properties that support this interpretation. In particular,
it yields the standard angular momentum
expression in Kerr-Newman spacetimes.

To define horizon energy, one has to introduce a
“time-translation” vector field $t^a$. At infinity, $t^a$ must
tend to a unit time translation. On $\Delta$, it must be a
symmetry of $q_{ab}$. Since $\ell^a$ and $\phi^a$ are both horizon
symmetries, $t^a = c\ell^a + \Omega\phi^a$ on $\Delta$, for some constants
c and $\Omega$. However, unlike $\phi^a$, the restriction of $t^a$ to
$\Delta$ cannot be fixed once and for all but must be
allowed to vary from one phase-space point to
another. In particular, on physical grounds, one
expects $\Omega$ to be zero at a phase-space point
representing a nonrotating black hole but nonzero
at a point representing a rotating black hole. This
freedom in the boundary value of $t^a$ introduces a
qualitatively new element. The vector field $X_{(t)}$ on $\Gamma$
defined by the Lie derivatives of gravitational and
matter fields does not, in general, satisfy $\mathcal{L}_{X_{(t)}} \boldsymbol{\Omega} = 0$;
it need not be an infinitesimal canonical transformation.
The necessary and sufficient condition is that
$(\kappa_{(t)}/8\pi G)\,\delta a_\Delta + \Omega_{(t)}\,\delta J_\Delta + \Phi_{(t)}\,\delta Q_\Delta$ be an exact
variation. That is, $X_{(t)}$ generates a Hamiltonian flow if
and only if there exists a function $E_\Delta^{(t)}$ on $\Gamma$ such that

$$\delta E_\Delta^{(t)} = \frac{\kappa_{(t)}}{8\pi G}\,\delta a_\Delta + \Omega_{(t)}\,\delta J_\Delta + \Phi_{(t)}\,\delta Q_\Delta \qquad [8]$$

This is precisely the first law. Thus, the framework
provides a deeper insight into the origin of the first
law: it is the necessary and sufficient condition for
the evolution generated by $t^a$ to be Hamiltonian.
Equation [8] is a genuine restriction on the choice of
phase-space functions c and $\Omega_{(t)}$, that is, of restrictions
to $\Delta$ of evolution fields $t^a$. It is easy to verify that M
admits many such vector fields. Given one, the
Hamiltonian $H_{(t)}$ generating the time evolution
along $t^a$ takes the form

$$H_{(t)} = E_\infty^{(t)} - E_\Delta^{(t)} \qquad [9]$$

reinforcing the interpretation of $E_\Delta^{(t)}$ as the horizon
energy.

In general, there is a multitude of first laws, one for
each vector field $t^a$, the evolution along which preserves
the symplectic structure. In the Einstein-Maxwell
theory, given any phase-space point, one can choose a
canonical boundary value $t_0^a$ exploiting the uniqueness
theorem. $E_\Delta^{(t_0)}$ is then called the horizon mass and
denoted simply by $m_\Delta$. In the Kerr-Newman family,
$H_{(t_0)}$ vanishes and $m_\Delta$ coincides with the ADM mass
$m_\infty$. Similarly, if $\phi^a$ is chosen to be a global rotational
Killing field, $J_\Delta^{(\phi)}$ equals $J_\infty^{(\phi)}$. However, in more general
spacetimes where there are matter fields or gravitational
radiation outside $\Delta$, these equalities do not hold; $m_\Delta$
and $J_\Delta$ represent quantities associated with the
horizon alone while the ADM quantities represent
the total mass and angular momentum in the spacetime,
including contributions from matter fields and
gravitational radiation in the exterior region. In the
first law [8], only the contributions associated with
the horizon appear.

When the uniqueness theorem fails, as, for 
example, in the Einstein-Yang-Mills-Higgs theory, 
first laws continue to hold but the horizon mass $m_\Delta$
becomes ambiguous. Interestingly, these ambiguities 
can be exploited to relate properties of hairy black 
holes with those of the corresponding solitons. (For 
a summary, see Ashtekar and Krishnan (2004).) 


Dynamical Situations 


A natural question now is whether there is an analog of
the second law of thermodynamics. Using event
horizons, Hawking showed that the answer is in the
affirmative (see Hawking and Ellis (1973)). Let $(M, g_{ab})$
admit an event horizon $E$. Denote by $\ell^a$ a geodesic null
normal to $E$. Its expansion is defined as $\theta_{(\ell)} := q^{ab}\nabla_a \ell_b$,
where $q^{ab}$ is any inverse of the degenerate intrinsic
metric $q_{ab}$ on $E$, and determines the rate of change of the
area element of $E$ along $\ell^a$. Assuming that the null energy
condition and Einstein's equations hold, the Raychaudhuri
equation immediately implies that if $\theta_{(\ell)}$ were to
become negative somewhere it would become infinite
within a finite affine parameter. Hawking showed that,
if there is a globally hyperbolic region containing
$I^-(\mathcal{I}^+) \cup E$ (that is, if there are no naked singularities),
this cannot happen, whence $\theta_{(\ell)} \ge 0$ on $E$. Hence, if a
cross section $S_2$ of $E$ is to the future of a cross section $S_1$,
we must have $a_{S_2} \ge a_{S_1}$. Thus, in any (i.e., not
necessarily infinitesimal) dynamical process, the change
$\Delta a$ in the horizon area is always non-negative. This
result is known as the “second law of black hole
mechanics.” As in the first law, the analog of entropy is
the horizon area.
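
A classic consequence of the area theorem can be sketched numerically. The snippet below is only an illustration under stated assumptions, not part of the argument above: two widely separated nonrotating black holes of masses M1 and M2 merge into a single nonrotating black hole, units are G = c = 1, the Schwarzschild relation a = 16 pi M^2 is used, and all function names are hypothetical. The inequality a_final >= a_1 + a_2 then bounds the mass-energy that can be radiated.

```python
import math


def schwarzschild_area(mass: float) -> float:
    """Horizon area a = 16*pi*M^2 of a nonrotating black hole (G = c = 1)."""
    return 16.0 * math.pi * mass ** 2


def min_final_mass(m1: float, m2: float) -> float:
    """Smallest nonrotating final mass compatible with a_final >= a_1 + a_2."""
    return math.sqrt(m1 ** 2 + m2 ** 2)


def max_radiated_fraction(m1: float, m2: float) -> float:
    """Upper bound on the fraction of the initial mass m1 + m2 that can be radiated."""
    return 1.0 - min_final_mass(m1, m2) / (m1 + m2)


if __name__ == "__main__":
    # Equal masses give the bound 1 - 1/sqrt(2).
    print(max_radiated_fraction(1.0, 1.0))
```

The bound is only an upper limit set by the area increase; it says nothing about how much energy a particular merger actually radiates.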


It is tempting to ask if there is a local physical 
process directly responsible for the growth of area. 
For event horizons, the answer is in the negative 
since they can grow in a flat portion of spacetime. 
However, one can introduce quasilocal horizons 
also in the dynamical situations and obtain the 
desired result (Ashtekar and Krishnan 2003). These 
constructions are strongly motivated by earlier ideas 
introduced by Hayward (1994). 


Definition 2 A three-dimensional spacelike submanifold
H of $(M, g_{ab})$ is said to be a “dynamical
horizon” if it admits a foliation by compact
2-manifolds S (without boundary) such that:


(i) the expansion $\theta_{(\ell)}$ of one (future directed) null
normal field $\ell^a$ to S vanishes and the expansion
of the other (future directed) null normal field,
$n^a$, is negative; and

(ii) $-T^a{}_b\,\ell^b$ is a future pointing causal vector on H.


One can show that this foliation of H is unique and 
that S is either a 2-sphere or, under degenerate and 
physically over-restrictive conditions, a 2-torus. Each 
leaf S is a marginally trapped surface and is referred to as a
“cut” of H. Unlike event horizons $E$, dynamical horizons
H are locally defined and do not display any teleological
feature. In particular, they cannot lie in a flat portion of
spacetime. Dynamical horizons commonly arise in
numerical simulations of evolving black holes as world
tubes of apparent horizons. As the black hole settles
down, H asymptotes to an isolated horizon $\Delta$, which
tightly hugs the asymptotic future portion of the event
horizon. However, during the dynamical phase, H
typically lies well inside $E$.

The two conditions in Definition 2 immediately
imply that the area of the cuts S increases monotonically
along the “outward direction” defined by
the projection of $\ell^a$ on H. Furthermore, this change
turns out to be directly related to the flux of energy
falling across H. Let R denote the “radius function”
on H so that the area of any cut S is given by
$a_S = 4\pi R^2$. Let $N$ denote the norm of $\partial_a R$ and $\Delta H$
the portion of H bounded by two cross sections $S_1$
and $S_2$. The appropriate energy turns out to be
associated with the vector field $N\ell^a$, where $\ell^a$ is
normalized such that its projection on H is the unit
normal $\hat r^a$ to the cuts S. In the generic and
physically interesting case when S is a 2-sphere, the
Gauss and the Codazzi (i.e., constraint) equations
imply


$$\frac{1}{2G}(R_2 - R_1) = \int_{\Delta H} T_{ab}\,N\ell^a\,\hat\tau^b\,d^3V + \frac{1}{16\pi G}\int_{\Delta H} N\left(\sigma_{ab}\sigma^{ab} + 2\zeta_a\zeta^a\right) d^3V \qquad [10]$$


Here $\hat\tau^a$ is the unit normal to H, $\sigma^{ab}$ the shear of $\ell^a$
(i.e., the tracefree part of $\tilde q^{am}\tilde q^{bn}\nabla_m\ell_n$), and
$\zeta^a = \tilde q^{ab}\,\hat r^c\nabla_c\ell_b$, where $\tilde q^{ab}$ is the projector onto the
tangent space of the cuts S. The first integral on
the right-hand side can be directly interpreted as the
flux across $\Delta H$ of matter-energy (relative to the
vector field $N\ell^a$). The second term is purely
geometric and is interpreted as the flux of energy
carried by gravitational waves across $\Delta H$. It has
several properties which support this interpretation.
Thus, not only does the second law of black hole
mechanics hold for a dynamical horizon H, but the
“cause” of the increase in the area can be directly
traced to physical processes happening near H.

Another natural question is whether the first law
[8] can be generalized to fully dynamical situations,
where $\delta$ is replaced by a finite transition. Again, the
answer is in the affirmative. We will outline the idea
for the case when there are no gauge fields on H. As
with isolated horizons, to have a well-defined notion
of angular momentum, let us suppose that the
intrinsic 3-metric on H admits a rotational Killing
field $\varphi^a$. Then, the angular momentum associated
with any cut S is given by


$$J_S^{(\varphi)} = -\frac{1}{8\pi G}\oint_S K_{ab}\,\varphi^a\,\hat r^b\,d^2V \equiv \frac{1}{8\pi G}\oint_S j^{(\varphi)}\,d^2V \qquad [11]$$


where $K_{ab}$ is the extrinsic curvature of H in $(M, g_{ab})$ and
$j^{(\varphi)}$ is interpreted as “the angular momentum density.”
Now, in the Kerr family, the mass, surface gravity, and
the angular velocity can be unambiguously expressed as
well-defined functions $\bar m(a,J)$, $\bar\kappa(a,J)$, and $\bar\Omega(a,J)$ of the
horizon area a and angular momentum J. The idea is to
use these expressions to associate mass, surface gravity,
and angular velocity with each cut of H. Then, a
surprising result is that the difference between the
horizon masses associated with cuts $S_1$ and $S_2$ can be
expressed as the integral of a locally defined flux across
the portion $\Delta H$ of H bounded by $S_1$ and $S_2$:


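$$\bar m_2 - \bar m_1 = \frac{1}{8\pi G}\int_{\Delta H}\bar\kappa\,da + \frac{1}{8\pi G}\left\{\oint_{S_2}\bar\Omega\,j^{(\varphi)}\,d^2V - \oint_{S_1}\bar\Omega\,j^{(\varphi)}\,d^2V - \int_{\bar\Omega_1}^{\bar\Omega_2} d\bar\Omega \oint_S j^{(\varphi)}\,d^2V\right\} \qquad [12]$$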


If the cuts $S_1$ and $S_2$ are only infinitesimally separated,
this expression reduces precisely to the standard first
law involving infinitesimal variations. Therefore, [12] is
an integral generalization of the first law.
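
The Kerr expressions for mass, surface gravity, and angular velocity as functions of horizon area and angular momentum can be written in closed form. The following is a minimal sketch, assuming units G = c = 1 and the standard Kerr relations expressed through the area radius R (with a = 4 pi R^2); the function names are hypothetical. It checks by finite differences that these functions satisfy the infinitesimal first law, dm = (kappa / 8 pi) da + Omega dJ.

```python
import math


def kerr_functions(a: float, J: float):
    """Return (mass, surface gravity, angular velocity) of a Kerr horizon
    with area a and angular momentum J, in units G = c = 1."""
    R = math.sqrt(a / (4.0 * math.pi))        # area radius, a = 4*pi*R^2
    root = math.sqrt(R ** 4 + 4.0 * J ** 2)
    mass = root / (2.0 * R)
    kappa = (R ** 4 - 4.0 * J ** 2) / (2.0 * R ** 3 * root)
    omega = 2.0 * J / (R * root)
    return mass, kappa, omega


def first_law_residuals(a: float, J: float, h: float = 1e-5):
    """Centered finite differences of the mass minus kappa/(8*pi) and Omega."""
    _, kappa, omega = kerr_functions(a, J)
    dm_da = (kerr_functions(a + h, J)[0] - kerr_functions(a - h, J)[0]) / (2.0 * h)
    dm_dJ = (kerr_functions(a, J + h)[0] - kerr_functions(a, J - h)[0]) / (2.0 * h)
    return dm_da - kappa / (8.0 * math.pi), dm_dJ - omega


if __name__ == "__main__":
    # Both residuals should vanish up to finite-difference error.
    print(first_law_residuals(50.0, 1.0))
```

For J = 0 the functions reduce to the Schwarzschild values: a = 16 pi gives mass 1 and surface gravity 1/4.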

Let us conclude with a general perspective. On the 
whole, in the passage from event horizons in 
stationary spacetimes to isolated horizons and then 
to dynamical horizons, one considers increasingly 
more realistic situations. In all three cases, the
analysis has been extended to allow the presence of
a cosmological constant $\Lambda$. (The only significant
change is that the topology of cuts S of dynamical
horizons is restricted to be $S^2$ if $\Lambda > 0$ and is
completely unrestricted if $\Lambda < 0$.) In the first two
frameworks, results have also been extended to higher 
dimensions. Since the notions of isolated and dynami- 
cal horizons make no reference to infinity, these 
frameworks can be used also in spatially compact 
spacetimes. The notion of an event horizon, by 
contrast, does not naturally extend to these space- 
times. On the other hand, the generalization [4] of the 
first law [3] is applicable to event horizons of 
stationary spacetimes in a wide class of theories while 
so far the isolated and dynamical horizon frameworks 
are tied to general relativity (coupled to matter 
satisfying rather weak energy conditions). From a 
mathematical physics perspective, extension to more 
general theories is an important open problem. 


See also: Asymptotic Structure and Conformal Infinity; 
Branes and Black Hole Statistical Mechanics; Dirac 
Fields in Gravitation and Nonabelian Gauge Theory; 
Geometric Flows and the Penrose Inequality; Loop 
Quantum Gravity; Minimal Submanifolds; Quantum Field 
Theory in Curved Spacetime; Quantum Geometry and its 
Applications; Random Algebraic Geometry, Attractors 
and Flux Vacua; Shock Wave Refinement of the 
Friedman-Robertson-Walker Metric; Stationary Black 
Holes.


Introduction


Ludwig Boltzmann (1872) established an evolution 
equation to describe the behavior of a rarefied gas, 
starting from the mathematical model of elastic balls 
and using mechanical and statistical considerations. 
The importance of this equation is twofold. First, it 
provides a reduced description (as well as the 
hydrodynamical equations) of the microscopic 
world. Second, it is also an important tool for the 
applications, especially for dilute fluids when the 
hydrodynamical equations fail to hold. 

The starting point of the Boltzmann analysis is to 
abandon the study of the gas in terms of the detailed 
motion of molecules which constitute it because of 
their large number. Instead, it is better to investigate 
a function f(x,v), which is the probability density of 
a given particle, where x and v denote its position 
and velocity. Actually, f(x,v)dx dv is often confused 
with the fraction of molecules falling in the cell of 
the phase space of size dx dv around x, v. The two 
concepts are not exactly the same, but they are 
asymptotically equivalent (when the number of 
particles is diverging) if a law of large numbers holds. 

The Boltzmann equation is the following:


$$(\partial_t + v\cdot\nabla_x) f = Q(f,f) \qquad [1]$$


where Q, the collision operator, is defined by eqn [2]:

$$Q(f,f)(x,v) = \int dv_1 \int_{S^2_+} dn\,(v - v_1)\cdot n\,\left[f(x,v')f(x,v_1') - f(x,v)f(x,v_1)\right] \qquad [2]$$


and


$$v' = v - n\,[n\cdot(v - v_1)], \qquad v_1' = v_1 + n\,[n\cdot(v - v_1)] \qquad [3]$$


Moreover, n (the impact parameter) is a unit
vector and $S^2_+ = \{n \mid n\cdot(v - v_1) \ge 0\}$.

Note that $v'$, $v_1'$ are the outgoing velocities after a
collision of two elastic balls with incoming velocities
$v$ and $v_1$ and centers $x$ and $x + rn$, $r$ being the
diameter of the spheres. Obviously, the collision
takes place if $n\cdot(v - v_1) \ge 0$. Equations [3] are a
consequence of the conservation of total energy,
momentum, and angular momentum. Note also that
r does not enter in eqn [1] as a parameter.
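
The collision rule [3] can be checked directly. The sketch below is illustrative only (the function name `collide` and the use of NumPy are assumptions, not part of the source): it applies the rule to a pair of velocities and a unit vector n, so that conservation of momentum and kinetic energy can be verified numerically.

```python
import numpy as np


def collide(v, v1, n):
    """Apply eqn [3]: return the outgoing velocities (v', v1') for direction n."""
    v, v1, n = (np.asarray(a, dtype=float) for a in (v, v1, n))
    n = n / np.linalg.norm(n)            # make sure n is a unit vector
    jump = n * np.dot(n, v - v1)         # n [n . (v - v1)]
    return v - jump, v1 + jump


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v, v1, n = rng.normal(size=(3, 3))
    vp, v1p = collide(v, v1, n)
    print("momentum change:", (vp + v1p) - (v + v1))
    print("energy change:", vp @ vp + v1p @ v1p - v @ v - v1 @ v1)
```

Applying the map twice with the same n returns the original pair, which reflects the fact that the rule exchanges pre-collisional and post-collisional velocities.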


As fundamental features of eqn [1], we have the
conservation in time of the following five quantities


$$\int dx \int dv\, f(x,v;t)\,v^\alpha \qquad [4]$$


with $\alpha = 0, 1, 2$ (here $v^0 = 1$, $v^1$ stands for the three
components of $v$, and $v^2 = |v|^2$), expressing conservation of the
probability, momentum, and energy.

From now on we shall set $\int = \int_{\mathbb{R}^3}$ for notational
simplicity.

Moreover, Boltzmann introduced the (kinetic)
entropy defined as


$$H(f) = \int dx \int dv\, f \log f\,(x,v) \qquad [5]$$


and proved the famous H-theorem asserting that
$H(f(t))$ decreases along the solutions to eqn [1].

Finally, in the case of bounded domains or
homogeneous solutions ($f = f(v;t)$ is independent of
x), the distribution defined for some $\beta > 0$, $\rho > 0$,
and $u \in \mathbb{R}^3$ by


$$M(v) = \frac{\rho}{(2\pi/\beta)^{3/2}}\, e^{-(\beta/2)|v-u|^2} \qquad [6]$$


called the Maxwellian distribution, is stationary for the
evolution given by eqn [1]. In addition, M minimizes
H among all distributions with given total mass $\rho$,
given mean velocity u, and mean energy. The
parameter $\beta$ is interpreted as the inverse
temperature.

In conclusion, Boltzmann was able to introduce
not only an evolutionary equation with the remarkable
properties expressing mass, momentum, and
energy conservation, but also the trend to
thermal equilibrium. In other words, he tried to
reconcile Newton's laws with the second
principle of thermodynamics.


The Boltzmann Heuristic Argument 


Thus, we want to find an evolution equation for the
quantity $f(x,v;t)$. The molecular system we are
considering consists of N identical particles of
diameter r in the whole space $\mathbb{R}^3$. We denote by
$x_1, v_1, \ldots, x_N, v_N$ a state of the system, where $x_i$ and
$v_i$ indicate the position and the velocity of the
particle i. The particles cannot overlap (i.e., the
centers of two particles cannot be at a distance
smaller than the particle diameter r).

The particles move freely up to the first
instant of contact, that is, the first time when two
particles (say particles i and j) arrive at a distance r.
Then the pair undergoes an elastic collision.
This means that they change their velocities
instantaneously, according to the conservation of
the energy and linear and angular momentum.
More precisely, the velocities after a collision
with incoming velocities $v$ and $v_1$ are those given
by formula [3]. After the first collision, the
system evolves by iterating the procedure. Here
we neglect triple collisions because they are
unlikely. The evolution equation for a tagged
particle is then of the form


$$(\partial_t + v\cdot\nabla_x) f = \mathrm{Coll} \qquad [7]$$


where Coll denotes the variation of f due to the
collisions.
We have


$$\mathrm{Coll} = G - L \qquad [8]$$


where L and G (the loss and gain terms, respectively)
are the negative and positive contributions to the
variation of f due to the collisions. More precisely,
$L\,dx\,dv\,dt$ is the probability of the test particle to
disappear from the cell $dx\,dv$ of the phase space
because of a collision in the time interval $(t, t + dt)$
and $G\,dx\,dv\,dt$ is the probability to appear in the
same time interval for the same reason. Let us
consider the sphere of center x with radius r and a
point $x + rn$ on its surface, where n denotes the
generic unit vector. Consider also the cylinder with
base area $dS = r^2\,dn$ and height $|V|\,dt$ along the
direction of $V = v_2 - v$.

Then a given particle (say particle 2) with velocity
$v_2$ can contribute to L because it can collide with the
test particle in the time $dt$, provided it is localized in
the cylinder and $V\cdot n < 0$. Therefore, the contribution
to L due to particle 2 is the probability of
finding such a particle in the cylinder (conditioned on
the presence of the first particle in x). This quantity is
$f_2(x,v,x+nr,v_2)\,|(v_2 - v)\cdot n|\,r^2\,dn\,dv_2\,dt$, where $f_2$
is the joint distribution of two particles. Integrating in
$dn$ and $dv_2$, we obtain that the total contribution to
L due to any predetermined particle is


$$r^2 \int dv_2 \int_{S^2_-} dn\, f_2(x,v,x+nr,v_2)\,|(v_2 - v)\cdot n| \qquad [9]$$


where $S^2_-$ is the unit hemisphere $(v_2 - v)\cdot n < 0$.
Finally, we obtain the total contribution by multiplying
by the number $N - 1$ of remaining particles:


$$L = (N-1)\,r^2 \int dv_2 \int_{S^2_-} dn\, f_2(x,v,x+nr,v_2)\,|(v_2 - v)\cdot n| \qquad [10]$$

The gain term can be derived analogously by
considering that we are looking at particles which
have velocities $v$ and $v_2$ after the collisions, so
that we have to integrate over the hemisphere
$S^2_+ = \{(v_2 - v)\cdot n > 0\}$:


$$G = (N-1)\,r^2 \int dv_2 \int_{S^2_+} dn\, f_2(x,v,x+nr,v_2)\,|(v_2 - v)\cdot n| \qquad [11]$$

Summing G and $-L$, we get

$$\mathrm{Coll} = (N-1)\,r^2 \int dv_2 \int_{S^2} dn\, f_2(x,v,x+nr,v_2)\,(v_2 - v)\cdot n \qquad [12]$$


which, however, is not a very useful expression
because the time derivative of f is expressed in terms
of another object, namely $f_2$. An evolution equation
for $f_2$ will involve $f_3$, the joint distribution of three
particles, and so on, until we reach the total
particle number N. Here the basic assumption
of Boltzmann enters, namely that two given particles
are uncorrelated if the gas is rarefied, that is,


$$f_2(x,v,x_2,v_2) = f(x,v)\,f(x_2,v_2) \qquad [13]$$


Condition [13], referred to as the propagation of 
chaos, seems contradictory at first sight: if two 
particles collide, correlations are created. Even though 
we could assume eqn [13] at some time, if the test 
particle collides with particle 2, such an equation 
cannot be satisfied anymore after the collision. 

Before discussing the propagation of chaos
hypothesis, we first analyze the size of the collision
operator. We remark that, in practical situations
for a rarefied gas, the combination $Nr^3 \approx 10^{-4}\,\mathrm{cm}^3$
(i.e., the volume occupied by the particles) is very
small, while $Nr^2 = O(1)$. This implies that $G = O(1)$.
Therefore, since we are dealing with a very large
number of particles, we are tempted to perform the
limit $N \to \infty$ and $r \to 0$ in such a way that
$r^2 = O(N^{-1})$. As a consequence, the probability that
two tagged particles collide (which is of the order of
the surface of a ball, i.e., $O(r^2)$) is negligible.
However, the probability that a given particle
performs a collision with any one of the remaining
$N - 1$ particles (which is $O(Nr^2) = O(1)$) is not
negligible. Therefore, condition [13] is referring to
two preselected particles (say particles 1 and 2), so
that it is not unreasonable to conceive that it holds
in the limiting situation in which we are working.
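
The orders of magnitude in this scaling argument can be tabulated with a few lines of arithmetic. This is a minimal sketch under the assumption that the product N r^2 is held at a fixed value (here 1 in arbitrary units); the function name and dictionary keys are hypothetical labels for the three quantities discussed above.

```python
def boltzmann_grad_scaling(n_particles: float, n_r2: float = 1.0):
    """Choose the diameter r so that N * r**2 = n_r2 stays fixed as N grows."""
    r = (n_r2 / n_particles) ** 0.5
    return {
        "r": r,
        "pair_collision": r ** 2,                 # O(r^2): two tagged particles
        "any_collision": n_particles * r ** 2,    # O(N r^2): one particle vs. all others
        "occupied_volume": n_particles * r ** 3,  # O(N r^3): volume taken by the particles
    }


if __name__ == "__main__":
    for n in (1e6, 1e12, 1e18, 1e24):
        print(n, boltzmann_grad_scaling(n))
```

As N grows, the pair-collision and occupied-volume entries shrink while the any-collision entry stays of order one, which is the content of the limit described above.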

However, we cannot insert [13] in [12] because
this latter equation refers to instants before and after
the collision and, if we know that a collision took
place, we certainly cannot invoke eqn [13]. Hence, it
is more convenient to assume eqn [13] in the loss
term and to rework the gain term so as to take advantage
of the factorization property, which will be assumed
only before the collision.

Coming back to eqn [11] for the outgoing pair of
velocities $v$, $v_2$ (satisfying the condition $(v_2 - v)\cdot n > 0$),
we make use of the continuity property


$$f_2(x,v,x+nr,v_2) = f_2(x,v',x+nr,v_2') \qquad [14]$$


where the pair $v'$, $v_2'$ is pre-collisional. On $f_2$
expressed before the collision, we can reasonably
apply condition [13] and obtain


$$G - L = (N-1)\,r^2 \int dv_2 \int_{S^2_+} dn\,(v - v_2)\cdot n\,\left[f(x,v')f(x - nr,v_2') - f(x,v)f(x + nr,v_2)\right] \qquad [15]$$


after a change $n \to -n$ in the gain term, using the
notation $S^2_+$ for the hemisphere $\{n \mid (v - v_2)\cdot n > 0\}$.
This transforms the pair $v'$, $v_2'$ from a pre-collisional
to a post-collisional pair.

Finally, in the limit $N \to \infty$, $r \to 0$, $Nr^2 \to \lambda^{-1}$, we
find


$$(\partial_t + v\cdot\nabla_x) f = \lambda^{-1}\int dv_2 \int_{S^2_+} dn\,(v - v_2)\cdot n\,\left[f(x,v')f(x,v_2') - f(x,v)f(x,v_2)\right] \qquad [16]$$


The parameter $\lambda$, called the mean free path, represents,
roughly speaking, the typical length a particle can
cover without undergoing any collision. In eqns [1]
and [2], we just chose $\lambda = 1$.

Equation [16] (or, equivalently, eqns [1] and [2]) is
the Boltzmann equation for hard spheres. Such an
equation has a statistical nature, and it is not
equivalent to the Hamiltonian dynamics from which
it has been derived. Indeed, the H-theorem shows that
such an equation is not reversible in time, whereas
any law of mechanics is expected to be.

This concludes the heuristic preliminary analysis of
the Boltzmann equation. We certainly know that the
above arguments are delicate and require a more
rigorous and deeper analysis. If we want the Boltzmann
equation not to be a phenomenological model, derived
by ad hoc assumptions and justified only by its
practical relevance, but rather a consequence
of a mechanical model, we must derive it rigorously. In
particular, the propagation of chaos should be not a
hypothesis but the statement of a theorem.


Beyond the Hard Spheres 


The heuristic arguments we have developed so far 
can be extended to different potentials than that of 
the hard-sphere systems. If the particles interact via 


a two-body interaction $V = V(r)$, the resulting
Boltzmann equation is eqn [1], with


$$Q(f,f) = \int dv_1 \int_{S^2_+} dn\, B(v - v_1; n)\,\left[f'f_1' - ff_1\right] \qquad [17]$$


where we are using the usual shorthand notation:


$$f' = f(x,v'), \quad f_1' = f(x,v_1'), \quad f_1 = f(x,v_1), \quad f = f(x,v) \qquad [18]$$


and $B = B(v - v_1; n)$ is a suitable function of the
relative velocity $v - v_1$ and the impact parameter n,
which is proportional to the cross section relative to
the potential V. Another equivalent, sometimes
more convenient, way to express eqn [17] is


$$Q(f,f) = \int dv_1 \int dv' \int dv_1'\, W(v,v_1|v',v_1')\,\left[f'f_1' - ff_1\right] \qquad [19]$$

with

$$W(v,v_1|v',v_1') = w(v,v_1|v',v_1')\,\delta(v + v_1 - v' - v_1')\,\delta\!\left(\tfrac12\left(v^2 + v_1^2 - (v')^2 - (v_1')^2\right)\right) \qquad [20]$$


where w is a suitable kernel. All the qualitative
properties, such as the conservation laws and the
H-theorem, are obviously still valid.


Consequences 


The Boltzmann equation provoked a debate involving 
Loschmidt, Zermelo, and Poincaré, who outlined 
inconsistencies between the irreversibility of the equa- 
tion and the reversible character of the Hamiltonian 
dynamics. Boltzmann argued for the statistical nature of
his equation, and his answer to the irreversibility
paradox was that “most” of the configurations behave
as expected by the thermodynamical laws. However,
he did not have the probabilistic tools for formulating
in a precise way the statements of which he had a
precise intuition.

Grad (1949) stated clearly the limit $N \to \infty$,
$r \to 0$, $Nr^2 \to$ const., where N is the number of
particles and r is the diameter of the molecules, in
which the Boltzmann equation is expected to hold.
This limit is usually called the Boltzmann-Grad limit
(B-G limit in the sequel).

The problem of a rigorous derivation of the 
Boltzmann equation was an open and challenging 
problem for a long time. Lanford (1975) showed that, 
although for a very short time, the Boltzmann equation 
can be derived starting from the mechanical model of the 
hard-sphere system. The proof has a deep content but is 
relatively simple from a technical viewpoint. 


Existence 


The mathematical study of the Boltzmann equation 
starts with the problem of proving the existence of 
the solutions. One would like to be able to show that, 
for all (or at least for a physically significant family 
of) initial distributions (which are positive and 
summable functions) with finite momentum, energy, 
and entropy, there exists a unique solution to eqn [1] 
with the same mass, momentum, and energy as of the 
initial distribution. Moreover, the entropy should 
decrease and the solution should approach the right 
Maxwellian as t — oo. The problem, in such a 
generality, is still unsolved, but several results in this 
direction have been achieved since the pioneering 
works due to Carleman (1933) for the homogeneous 
equation. Actually, there are satisfactory results for
some special situations, such as homogeneous
solutions (independent of x) and solutions close to equilibrium,
to the vacuum, or to homogeneous data. The most
general result we have up to now is, unfortunately,
not constructive. This is due to DiPerna and Lions
(1989), who showed the existence of suitable weak
solutions to eqn [1]. However, we still do not know
whether such solutions, which preserve mass and
momentum, and satisfy the H-theorem, are unique
and also preserve the energy.


Hydrodynamics 


The derivation of hydrodynamical equations from 
the Boltzmann equation is a problem as old as the 
equation itself and, in fact, it goes back to Maxwell 
and Hilbert. Preliminary to the discussion of the 
hydrodynamic limit, we establish a few properties of 
the collision kernel. 

It is a well-known fact that the only solution to
the equation


$$Q(f,f) = 0 \qquad [21]$$

is a local Maxwellian, namely

$$f(x,v) := M(x,v) = \frac{\rho(x)}{(2\pi T(x))^{3/2}}\, e^{-|v - u(x)|^2/(2T(x))} \qquad [22]$$


where the local parameters $\rho$, $u$, and $T$ satisfy the
relations


$$\int M\,dv = \rho \qquad [23]$$

$$\int vM\,dv = \rho u \qquad [24]$$

$$\frac12\int v^2 M\,dv = \frac32\rho T + \frac12\rho u^2 \qquad [25]$$


Moreover, the only solution $h$ to the equation


$$\int h\,Q(f,f)\,dv = 0 \qquad [26]$$


is any linear combination of the quantities $(1, v, v^2)$,
called collision invariants. The last property
obviously corresponds to the mass, momentum,
and energy conservation.
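
The moment relations for a Maxwellian with density rho, mean velocity u, and temperature T (unit variance T per velocity component) can be verified by direct quadrature. The sketch below is an illustration only, with assumed names and an assumed velocity grid of plus or minus eight thermal widths around u; it evaluates the mass, momentum, and energy integrals on a uniform three-dimensional grid using NumPy.

```python
import numpy as np


def maxwellian_moments(rho, u, T, n=61, width=8.0):
    """Riemann-sum approximation of int M dv, int v M dv and (1/2) int |v|^2 M dv."""
    u = np.asarray(u, dtype=float)
    sigma = np.sqrt(T)
    axes = [np.linspace(ui - width * sigma, ui + width * sigma, n) for ui in u]
    dv = np.prod([ax[1] - ax[0] for ax in axes])          # volume of one grid cell
    V = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # shape (n, n, n, 3)
    M = rho / (2.0 * np.pi * T) ** 1.5 * np.exp(-np.sum((V - u) ** 2, axis=-1) / (2.0 * T))
    mass = M.sum() * dv
    momentum = (M[..., None] * V).sum(axis=(0, 1, 2)) * dv
    energy = 0.5 * (M * np.sum(V ** 2, axis=-1)).sum() * dv
    return mass, momentum, energy


if __name__ == "__main__":
    rho, u, T = 2.0, np.array([0.3, -0.1, 0.5]), 1.5
    mass, momentum, energy = maxwellian_moments(rho, u, T)
    print(mass, momentum, energy)
    print("expected energy:", 1.5 * rho * T + 0.5 * rho * (u @ u))
```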

With this in mind, consider a change of
variables in the Boltzmann equation [1], passing
from microscopic to macroscopic variables,
$x \to \varepsilon x$, $t \to \varepsilon t$. Here $\varepsilon$ is a small scale parameter
expressing the ratio between the typical interparticle
distances and the typical distances over
which the macroscopic quantities vary.
Such a change yields


$$(\partial_t + v\cdot\nabla_x) f_\varepsilon = \frac{1}{\varepsilon}\,Q(f_\varepsilon, f_\varepsilon) \qquad [27]$$


We need to allow the small parameter $\varepsilon$ (mean free
path or the Knudsen number) to tend to zero. In
order to eliminate the singularity on the right-hand
side of [27], we multiply both sides by the collision
invariants $v^\alpha$ with $\alpha = 0, 1, 2$, integrate over $v$, and obtain the five
equations:


$$\int dv\, v^\alpha (\partial_t + v\cdot\nabla_x) f_\varepsilon = 0 \qquad [28]$$


On the other hand, if $f_\varepsilon$ converges to $f$ as $\varepsilon \to 0$,
necessarily $Q(f,f) = 0$ and hence $f = M$. Therefore,
we expect that in the limit $\varepsilon \to 0$,


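$$\int dv\, v^\alpha (\partial_t + v\cdot\nabla_x) M = 0 \qquad [29]$$

Equation [29] fixes a relation among the fields $\rho$, $u$, $T$
as functions of x and t. A standard computation gives
us the Euler equations for a compressible gas


$$\partial_t\rho + \mathrm{div}(\rho u) = 0 \qquad [30]$$

$$\partial_t u + (u\cdot\nabla)u + \frac{1}{\rho}\nabla p = 0 \qquad [31]$$

$$\partial_t T + (u\cdot\nabla)T + \frac23\,T\,\nabla\cdot u = 0 \qquad [32]$$


where the pressure p is related to the density $\rho$ and
the temperature T by the perfect gas law


$$p = \rho T \qquad [33]$$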

In order to make the above arguments rigorous,
Hilbert (1916) developed a useful tool, called the
Hilbert expansion, to control the limiting procedure.
Namely, he expressed a formal solution to eqn [27]
in the form of a power series expansion:


$$f_\varepsilon = \sum_{j \ge 0} \varepsilon^j f_j \qquad [34]$$


where $f_0$ is the local Maxwellian, with the parameters
$\rho$, $u$, $T$ satisfying the Euler equations. All the
other coefficients $f_j$ of the expansion can be
determined by recurrence, inverting suitable opera- 
tors. However, the series is not expected to be 
convergent, so that the way to show the validity of 
the hydrodynamical limit rigorously is to truncate 
the expansion and to control the remainder. The 
first result in this direction was obtained by Caflisch 
(1980). However, this approach is based on the 
regularity of the solutions to the Euler equations, 
which is known to hold only for short times since 
shocks can be formed. How to approximate the 
shocks in terms of a kinetic description is still a 
difficult and open problem. 

Note that the hydrodynamical picture of the 
Boltzmann equation just means that we are looking 
at the solutions of this equation at a suitable 
macroscopic scale. The rarefaction hypothesis 
underlying the Boltzmann description is reflected in 
the law of perfect gas, which states that the 
particles, in the local thermal equilibrium, are free. 


Stationary Problems 


Stationary non-Maxwellian solutions to the 
Boltzmann equation should describe stationary 
nonequilibrium states exhibiting nontrivial flows. 
In spite of the physical relevance of these problems, 
not many complete mathematical results are, at the 
moment, available. Among them, there is the 
traveling-wave problem, which can be formulated 
in the following way. We look for a solution
$f = f(x - ct, v)$, $f: \mathbb{R}\times\mathbb{R}^3 \to \mathbb{R}^+$, constant in form
but traveling with a constant velocity $c > 0$, to


$$(v_1 - c)\,f' = Q(f,f) \qquad [35]$$


where $v_1$ is the first component of v and $f'$ denotes
the spatial derivative of f. Equation [35] must be
complemented by the boundary conditions
$f \to M_\pm$ as $x \to \pm\infty$, where $M_\pm$ are the right
and left Maxwellians, namely two prescribed equilibrium
situations at infinity. The parameters (density,
mean velocity, and temperature) of the Maxwellians,
however, cannot be chosen arbitrarily. Indeed,
the conservations of the mass, momentum, and
energy (which are properties of Q) imply the
conservations (in x) of the fluxes of these quantities.
Hence, we have to impose five equations that relate
the upstream and the downstream values of the
densities, mean velocities, and temperatures. Such
relations are known in gas dynamics as the
Rankine-Hugoniot conditions. A solution of this
problem has been found by Caflisch and Nikolaenko
(1983) in the case of a weak shock (namely, when $M_+$
and $M_-$ are close) by using Hilbert expansion
techniques. More recently, Liu and Yu (2004)
established also stability and positivity of this
solution.
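
The Rankine-Hugoniot conditions can be made concrete for a monatomic perfect gas. The following is a minimal sketch under stated assumptions: the flow is one-dimensional and described in the rest frame of the wave (so u denotes the fluid velocity relative to the traveling profile), the pressure is p = rho T, the adiabatic index is 5/3, and the textbook normal-shock formulas are used; the function names are hypothetical. With the flow aligned with the x-axis, the two transverse momentum relations are trivially satisfied, leaving the three flux equalities checked below.

```python
GAMMA = 5.0 / 3.0  # monatomic perfect gas


def fluxes(rho: float, u: float, T: float):
    """Mass, momentum and energy fluxes of a Maxwellian with parameters (rho, u, T)."""
    p = rho * T
    return (rho * u, rho * u * u + p, u * (0.5 * rho * u * u + 2.5 * p))


def downstream_state(rho1: float, u1: float, T1: float):
    """Downstream (rho, u, T) from the upstream state via the normal-shock relations."""
    mach2 = u1 * u1 / (GAMMA * T1)       # squared Mach number, sound speed^2 = GAMMA * T
    if mach2 <= 1.0:
        raise ValueError("upstream flow must be supersonic for a shock")
    rho2 = rho1 * (GAMMA + 1.0) * mach2 / ((GAMMA - 1.0) * mach2 + 2.0)
    p2 = rho1 * T1 * (2.0 * GAMMA * mach2 - (GAMMA - 1.0)) / (GAMMA + 1.0)
    u2 = rho1 * u1 / rho2                # mass flux is conserved
    return rho2, u2, p2 / rho2


if __name__ == "__main__":
    upstream = (1.0, 2.0, 1.0)
    downstream = downstream_state(*upstream)
    print("downstream:", downstream)
    print("fluxes up:  ", fluxes(*upstream))
    print("fluxes down:", fluxes(*downstream))
```

The upstream and downstream fluxes agree, while the density and temperature increase and the velocity decreases across the shock.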


Quantum Kinetic Theory 


Uehling and Uhlenbeck (1933) introduced the
following kinetic equation for describing a large
system of weakly interacting bosons or fermions:


$$(\partial_t + v\cdot\nabla_x) f = \int dv_1 \int dv' \int dv_1'\, W(v,v_1|v',v_1')\,\left[(1 \pm f)(1 \pm f_1)\,f'f_1' - (1 \pm f')(1 \pm f_1')\,ff_1\right] \qquad [36]$$


Here the $+/-$ signs stand for bosons/fermions,
respectively, and


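$$W(v,v_1|v',v_1') = \left[\hat V(v' - v) \pm \hat V(v' - v_1)\right]^2 \delta(v + v_1 - v' - v_1')\,\delta\!\left(\tfrac12\left(v^2 + v_1^2 - (v')^2 - (v_1')^2\right)\right) \qquad [37]$$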
Moreover, 


=4n | dre [38] 


where V is the interaction potential. Note that eqn 
[37] is the expression of the cross section of a 
quantum scattering in the Born approximation. 

The unknown $f = f(x,v;t)$ in eqn [36] is the expected
number of molecules falling in the unit (quantum) cell
of the phase space. This function is proportional to the
one-particle Wigner function, introduced by Wigner
(1932) to handle kinetic problems in quantum
mechanics, and defined as (setting $\hbar = 1$):


a] ee *tinx-i») 


where p(x;z) is the kernel of a one-particle density 
matrix. Basically, the Wigner function is an equiva- 
lent way to describe a state of a quantum system. 
For instance, eqn [40] below expresses the equili- 
brium distributions for bosons and fermions in 
terms of Wigner functions. In general, the Wigner 
functions, due to the uncertainty principle, are real 
but not necessarily positive; however, the integral 
with respect to x and v gives the probability 


distributions of the velocity and the position, 
respectively. In the kinetic regime, in which we are 
interested, the scales are mesoscopic, namely the 
typical quantum oscillations are on a scale much 
smaller than the characteristic scales of the problem, 
so that we expect that f should be a genuine 
probability distribution, since the Heisenberg 
principle does not play an essential role. However, 
the interaction occurs on a microscopic scale, so that 
we expect that the statistics play a role in addition 
to the quantum rules for the scattering. 
In this framework, the entropy functional is


$$H(f) = \int dx \int dv\,\left[f(x,v)\log f(x,v) \mp (1 \pm f(x,v))\log(1 \pm f(x,v))\right] \qquad [39]$$


It is decreasing along the solutions to eqn [36] and it is
also minimized (among the distributions with given
mass, momentum, and energy) by the equilibria


$$M(v) = \frac{z}{e^{(\beta/2)|v-u|^2} \mp z} \qquad [40]$$
namely the Bose-Einstein and the Fermi-Dirac
distributions, respectively. Here $\beta > 0$ and $z > 0$
are the inverse temperature and the activity, respectively.
Note that, for the Bose-Einstein distribution,
$z \le 1$. This creates, in a sense, an inconsistency with
eqn [36]. Indeed, assuming $u = 0$ and an initial
distribution $f = f_0(v)$ with the density larger than the
maximal density allowed by eqn [40], namely


$$\rho_c = \int dv\,\frac{1}{e^{(\beta/2)v^2} - 1} \qquad [41]$$


the solution cannot converge to any equilibrium. In order to
overcome this difficulty related to the Bose condensation,
one can enlarge the definition of the
equilibria family by setting


$$M(v) = \frac{1}{e^{(\beta/2)v^2} - 1} + \mu\,\delta(v) \qquad [42]$$


to take care of the excess of mass by means of a condensate
component. However, it is not clear whether eqn
[36] can actually describe the Bose condensation
since its derivation from the Schrödinger equation
requires, from the very beginning, the existence of
bosonic quasifree states which can be constructed only
if the density is moderate. Further analyses are certainly
needed to clarify the situation. A rigorous derivation of
the Uehling and Uhlenbeck equation is, up to now, far
from being obtained even for short times; nevertheless,
such an equation is extensively used in the applications.
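
The maximal density carried by the Bose-Einstein equilibrium with z = 1 and u = 0 can be evaluated explicitly: expanding the integrand in a geometric series gives (2 pi / beta)^(3/2) zeta(3/2). The sketch below, offered as an illustration with hypothetical function names and an assumed radial cutoff, compares that closed form with a direct midpoint-rule quadrature in spherical coordinates; the numerical value of zeta(3/2) is quoted as a constant.

```python
import numpy as np

ZETA_3_2 = 2.612375348685488  # Riemann zeta(3/2), quoted constant


def critical_density_closed_form(beta: float) -> float:
    """(2*pi/beta)^(3/2) * zeta(3/2), from the series expansion of the integrand."""
    return (2.0 * np.pi / beta) ** 1.5 * ZETA_3_2


def critical_density_quadrature(beta: float, n: int = 400_000, cutoff: float = 12.0) -> float:
    """Midpoint rule for 4*pi * int_0^R r^2 / (exp(beta r^2 / 2) - 1) dr."""
    r_max = cutoff / np.sqrt(beta)           # the integrand is negligible beyond r_max
    dr = r_max / n
    r = (np.arange(n) + 0.5) * dr            # midpoints avoid the 0/0 at r = 0
    integrand = r ** 2 / np.expm1(0.5 * beta * r ** 2)
    return 4.0 * np.pi * integrand.sum() * dr


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0):
        print(beta, critical_density_closed_form(beta), critical_density_quadrature(beta))
```

The density is finite for every beta, which is why an initial datum with larger density cannot relax to a regular equilibrium of this family.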
Equation [36] concerns a weakly interacting gas of 
quantum particles. From a mathematical viewpoint, it 
is expected to be valid in the so-called weak-coupling 
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limit, which consists in scaling space and time and the 
interaction potential @ as 


$ — vep [43] 


where £7! = N!? is a parameter diverging when the 
number of particles N tends to infinity. 

We mention, incidentally, that under such a 
scaling, a classical system is described by a transport 
equation, called Fokker-Planck-Landau equation, 
with a diffusion operator in the velocity space. 

The B-G limit considered for classical particle 
systems is different from that considered here 
for weakly interacting quantum systems. It is actually 
equivalent to rescaling space and time according to 


t 一 et [44] 


leaving the interaction unscaled but, in order to
control the total interaction, we make the density
diverge gently as $\varepsilon^{-1} = N^{1/2}$.

A quantum system under such a scaling is expected to 
be described by a Boltzmann equation [1] with the 
collision operator O computed with the full quantum 
cross section. Now we do not have any effect of the 
statistics because in this rarefaction limit these correc- 
tions disappear. On the other hand, the cross section is 
that arising from the analysis of the quantum scattering. 
Since we do not rescale the interaction, all the other 
terms in the Born expansion of the cross section play a 
role. This kind of Boltzmann equation is a good 
description of a rarefied gas in which quantum effects 
are not negligible. 


Introduction


In 1924 the Indian physicist S N Bose introduced a new 
statistical method to derive the blackbody radiation law 
in terms of a gas of light quanta (photons). His work, 
together with de Broglie's contemporary idea of
matter-wave duality, led A Einstein to apply the same
statistical approach to a gas of N indistinguishable 
particles of mass m. An amazing result of his theory was 
the prediction that below some critical temperature a 
finite fraction of all the particles condense into the 
lowest-energy single-particle state. This phenomenon, 
named Bose-Einstein condensation (BEC), is a conse- 
quence of purely statistical effects. For several years, 
such a prediction received little attention, until 1938, 
when F London argued that BEC could be at the basis of 
the superfluid properties observed in liquid $^4$He below
2.17 K. A strong boost to the investigation of Bose- 
Einstein condensates was given in 1995 by the observa- 
tion of BEC in dilute gases confined in magnetic traps 
and cooled down to temperatures of the order of a few 
nK. Unlike superfluid helium, these gases
allow one to tune the relevant parameters (confining
potential, particle density, interactions, etc.), making
them an ideal testing ground for concepts and theories on
BEC.


What Is BEC? 


In nature, particles have either integer or half- 
integer spin. Those having half-integer spin, like 
electrons, are called fermions and obey the Fermi- 
Dirac statistics; those having integer spin are 
called bosons and obey the Bose-Einstein statis- 
tics. Let us consider a system of N bosons. In 
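order to introduce the concept of BEC on a
general ground, one can start with the definition
of the one-body density matrix

$$ n^{(1)}(\mathbf{r},\mathbf{r}') = \langle \hat\Psi^\dagger(\mathbf{r})\, \hat\Psi(\mathbf{r}') \rangle \qquad [1] $$

The quantities $\hat\Psi^\dagger(\mathbf{r})$ and $\hat\Psi(\mathbf{r})$ are the field operators
which create and annihilate a particle at point r,
respectively; they satisfy the bosonic commutation
relations

$$ [\hat\Psi(\mathbf{r}), \hat\Psi^\dagger(\mathbf{r}')] = \delta(\mathbf{r}-\mathbf{r}'), \qquad [\hat\Psi(\mathbf{r}), \hat\Psi(\mathbf{r}')] = 0 \qquad [2] $$

If the system is in a pure state described by the
N-body wave function $\Psi_N(\mathbf{r}_1,\ldots,\mathbf{r}_N)$, then the
average [1] is taken following the standard rules of
quantum mechanics and the one-body density
matrix can be written as

$$ n^{(1)}(\mathbf{r},\mathbf{r}') = N \int d\mathbf{r}_2 \cdots d\mathbf{r}_N \, \Psi_N^*(\mathbf{r},\mathbf{r}_2,\ldots,\mathbf{r}_N)\, \Psi_N(\mathbf{r}',\mathbf{r}_2,\ldots,\mathbf{r}_N) \qquad [3] $$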


involving the integration over the $N-1$ variables
$\mathbf{r}_2,\ldots,\mathbf{r}_N$. In the more general case of a statistical
mixture of pure states, expression [3] must be 
averaged according to the probability for a system 
to occupy the different states. 

Since $n^{(1)}(\mathbf{r},\mathbf{r}') = (n^{(1)}(\mathbf{r}',\mathbf{r}))^*$, the quantity $n^{(1)}$,
when regarded as a matrix function of its indices
r and r', is Hermitian. It is therefore always possible
to find a complete orthonormal basis of single-particle
eigenfunctions, $\varphi_i(\mathbf{r})$, in terms of which the
density matrix takes the diagonal form

$$ n^{(1)}(\mathbf{r},\mathbf{r}') = \sum_i n_i\, \varphi_i^*(\mathbf{r})\, \varphi_i(\mathbf{r}') \qquad [4] $$

The real eigenvalues $n_i$ are subject to the normalization
condition $\sum_i n_i = N$ and have the meaning of
occupation numbers of the single-particle states $\varphi_i$.
BEC occurs when one of these numbers (say, $n_0$)
becomes macroscopic, that is, when $n_0 \equiv N_0$ is a
number of order N, all the others remaining of order 1.


In this case eqn [4] can be conveniently rewritten in 
the form 


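$$ n^{(1)}(\mathbf{r},\mathbf{r}') = N_0\, \varphi_0^*(\mathbf{r})\, \varphi_0(\mathbf{r}') + \sum_{i \neq 0} n_i\, \varphi_i^*(\mathbf{r})\, \varphi_i(\mathbf{r}') \qquad [5] $$

and the state represented by $\varphi_0(\mathbf{r})$ is called
Bose-Einstein condensate. This definition is rather
general, since it applies to any macroscopic ($N \gg 1$)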
system of indistinguishable bosons independently of 
mutual interactions and external fields. 
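
The diagonalization behind eqns [4] and [5] can be illustrated with a toy model. The following is a minimal sketch, not part of the original treatment: it assumes a hypothetical two-site basis, builds the density matrix from two orthonormal orbitals with chosen occupations, and recovers those occupations as the eigenvalues of the resulting 2x2 Hermitian matrix. The function names and the parameters `theta` and `chi` are illustrative assumptions.

```python
import cmath
import math


def density_matrix_two_modes(n_cond, n_other, theta, chi):
    """Build rho[a][b] = sum_i n_i * conj(phi_i[a]) * phi_i[b] (eqn [4]).

    The two orbitals are an arbitrary orthonormal pair on a two-site toy
    basis, parametrized by a mixing angle theta and a phase chi.
    """
    phi0 = (complex(math.cos(theta)), cmath.exp(1j * chi) * math.sin(theta))
    phi1 = (-cmath.exp(-1j * chi) * math.sin(theta), complex(math.cos(theta)))
    return [
        [
            n_cond * phi0[a].conjugate() * phi0[b]
            + n_other * phi1[a].conjugate() * phi1[b]
            for b in range(2)
        ]
        for a in range(2)
    ]


def occupation_numbers(rho):
    """Eigenvalues of a 2x2 Hermitian matrix, largest first (closed form)."""
    a, d = rho[0][0].real, rho[1][1].real
    b = rho[0][1]
    mean = 0.5 * (a + d)
    half_gap = math.hypot(0.5 * (a - d), abs(b))
    return mean + half_gap, mean - half_gap


def condensate_fraction(rho):
    """Largest occupation divided by the trace, i.e. N_0 / N."""
    largest, smallest = occupation_numbers(rho)
    return largest / (largest + smallest)


if __name__ == "__main__":
    rho = density_matrix_two_modes(900.0, 100.0, theta=0.3, chi=1.1)
    print(occupation_numbers(rho), condensate_fraction(rho))
```

The matrix elements depend on the basis (through `theta` and `chi`), but the eigenvalues do not, which is why the condensate is defined through the spectrum of $n^{(1)}$ rather than through any particular matrix element.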

The one-body density matrix [1] contains informa- 
tion on important physical observables. By setting 
r =r’ one finds the diagonal density of the system 


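$$ n(\mathbf{r}) \equiv n^{(1)}(\mathbf{r},\mathbf{r}) = \langle \hat\Psi^\dagger(\mathbf{r})\, \hat\Psi(\mathbf{r}) \rangle \qquad [6] $$

with $N = \int d\mathbf{r}\, n(\mathbf{r})$. The off-diagonal components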
can instead be used to calculate the momentum 
distribution 


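$$ n(\mathbf{p}) = \langle \hat\Psi^\dagger(\mathbf{p})\, \hat\Psi(\mathbf{p}) \rangle \qquad [7] $$

where $\hat\Psi(\mathbf{p}) = (2\pi\hbar)^{-3/2} \int d\mathbf{r}\, \hat\Psi(\mathbf{r}) \exp[-i\mathbf{p}\cdot\mathbf{r}/\hbar]$ is the
field operator in momentum representation. By
inserting this expression for $\hat\Psi(\mathbf{p})$ into eqn [7] one
finds

$$ n(\mathbf{p}) = \frac{1}{(2\pi\hbar)^3} \int d\mathbf{R}\, d\mathbf{s}\; n^{(1)}\!\left(\mathbf{R}+\frac{\mathbf{s}}{2}, \mathbf{R}-\frac{\mathbf{s}}{2}\right) e^{i\mathbf{p}\cdot\mathbf{s}/\hbar} \qquad [8] $$

where $\mathbf{s} = \mathbf{r} - \mathbf{r}'$ and $\mathbf{R} = (\mathbf{r} + \mathbf{r}')/2$.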

Let us consider a uniform system of N particles in 
a volume V and take the thermodynamic limit 
$N, V \to \infty$ with density N/V kept fixed. The eigenfunctions
of the density matrix are plane waves and
the lowest-energy state has zero momentum, $\mathbf{p} = 0$,
and constant wave function $\varphi_0(\mathbf{r}) = V^{-1/2}$. BEC in
this state implies a macroscopic number of particles
having zero momentum and constant density $N_0/V$.
The density matrix only depends on $\mathbf{s} = \mathbf{r} - \mathbf{r}'$ and
can be written as 


$$ n^{(1)}(\mathbf{s}) = \frac{N_0}{V} + \frac{1}{V} \sum_{\mathbf{p} \neq 0} n_{\mathbf{p}}\, e^{-i\mathbf{p}\cdot\mathbf{s}/\hbar} \qquad [9] $$

In the $s \to \infty$ limit, the sum on the right vanishes due
to destructive interference between different plane
waves, but the first term survives. One thus finds that,
in the presence of BEC, the one-body density matrix
tends to a constant finite value at large distances. This
behavior is named off-diagonal long-range order,
since it involves the off-diagonal components of the
density matrix. Its counterpart in momentum space is
the appearance of a singular term at p = 0:

$$ n(\mathbf{p}) = N_0\, \delta(\mathbf{p}) + \sum_{\mathbf{p}' \neq 0} n_{\mathbf{p}'}\, \delta(\mathbf{p} - \mathbf{p}') \qquad [10] $$

The sum on the right is the number of noncondensed
particles ($N - N_0$), and the quantity $N_0/N$ is called
condensate fraction.

If the system is not uniform, the eigenfunctions of 
the density matrix are no longer plane waves but, 
provided N is sufficiently large, the concept of BEC 
is still well defined, being associated with the 
occurrence of a macroscopic occupation of a 
single-particle eigenfunction $\varphi_0(\mathbf{r})$ of the density
matrix. Thus, the condensed bosons can be
described by means of the function $\Psi(\mathbf{r}) = \sqrt{N_0}\, \varphi_0(\mathbf{r})$,
which is a classical complex field playing
the role of an order parameter. This is the analog of
the classical limit of quantum electrodynamics,
where the electromagnetic field replaces the microscopic
description of photons. The function $\Psi$ may
also depend on time and can be written as


$$ \Psi(\mathbf{r},t) = |\Psi(\mathbf{r},t)|\, e^{iS(\mathbf{r},t)} \qquad [11] $$


Its modulus determines the contribution of the 
condensate to the diagonal density [6], while the 
phase S is crucial in characterizing the coherence 
and superfluid properties of the system. The order 
parameter [11], also named macroscopic wave 
function or condensate wave function, is defined 
only up to a constant phase factor. One can always 
multiply this function by the numerical factor $e^{i\alpha}$
without changing any physical property. This 
reflects the gauge symmetry exhibited by all the 
physical equations of the problem. Making an 
explicit choice for the value of the order parameter, 
and hence for the phase, corresponds to a formal 
breaking of gauge symmetry. 


BEC in Ideal Gases 


Having defined what a Bose-Einstein
condensate is, the next question is when such a
condensation occurs in a given system. The ideal
Bose gas provides the simplest example. So, let us
consider a gas of noninteracting bosons described
by the Hamiltonian $\hat H = \sum_i \hat H_i^{(1)}$, where the Schrödinger
equation $\hat H^{(1)} \varphi_i(\mathbf{r}) = \epsilon_i \varphi_i(\mathbf{r})$ gives the spectrum
of single-particle wave functions and
energies. One can define an occupation number
$n_i$ as the number of particles in the state with
energy $\epsilon_i$. Thus, any given state of the many-body
system is specified by a set $\{n_i\}$. The mean
occupation numbers, $\bar n_i$, can be calculated by
using the standard rules of statistical mechanics.
For instance, by considering a grand canonical
ensemble at temperature T, one finds


$$ \bar n_i = \{\exp[\beta(\epsilon_i - \mu)] - 1\}^{-1} \qquad [12] $$

with $\beta = 1/(k_B T)$. The chemical potential $\mu$ is fixed
by the normalization condition $\sum_i \bar n_i = N$, where N
is the average number of particles in the gas. For
$T \to \infty$ the chemical potential is negative and large.
It increases monotonically when T is lowered. Let us
call $\epsilon_0$ the lowest single-particle level in the
spectrum. If at some critical temperature $T_c$ the
normalization condition can be satisfied with
$\mu \to \epsilon_0$, then the occupation of the lowest state,
$\bar n_0 \equiv N_0$, becomes of order N and BEC is realized.
Below $T_c$ the normalization condition must be
replaced with $N = N_0 + N_T$, where $N_T = \sum_{i \neq 0} \bar n_i$ is
the number of particles out of the condensate, that
is, the thermal component of the gas. Whether BEC
occurs or not, and what the value of $T_c$ is, depends
on the dimensionality of the system and the type of 
single-particle spectrum. 
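
The way the normalization condition fixes the chemical potential can be made concrete numerically. The sketch below is an illustration under stated assumptions rather than a prescribed method: it takes the levels of an isotropic three-dimensional harmonic oscillator (degeneracy (n+1)(n+2)/2 for the n-th shell), measures energies from the lowest level in units of the level spacing, measures temperature in the same units divided by $k_B$, and solves $\sum_i \bar n_i = N$ for $\mu < 0$ by bisection. The function names, the shell cutoff `n_max`, and the bisection bounds are choices made here.

```python
import math


def total_number(mu, temperature, n_max=1000):
    """Sum of the mean occupations [12] over harmonic-oscillator shells.

    Energies are measured from the ground level in units of the level
    spacing, so shell n has energy n and degeneracy (n + 1)(n + 2) / 2.
    Requires mu < 0.
    """
    total = 0.0
    for n in range(n_max + 1):
        x = (n - mu) / temperature
        if x > 700.0:  # occupations beyond this point are negligible
            break
        degeneracy = (n + 1) * (n + 2) // 2
        total += degeneracy / math.expm1(x)
    return total


def chemical_potential(n_particles, temperature):
    """Solve total_number(mu) = n_particles for mu < 0 by bisection."""
    lo, hi = -50.0 * temperature - 50.0, -1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total_number(mid, temperature) < n_particles:
            lo = mid  # too few particles: mu must move up towards 0
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ground_state_occupation(n_particles, temperature):
    """Mean occupation of the lowest level at the self-consistent mu."""
    mu = chemical_potential(n_particles, temperature)
    return 1.0 / math.expm1(-mu / temperature)


if __name__ == "__main__":
    n_particles = 1000
    for temperature in (4.7, 9.4, 14.1):
        fraction = ground_state_occupation(n_particles, temperature) / n_particles
        print(temperature, fraction)
```

For a finite number of particles the crossover is smooth rather than sharp: the occupation of the lowest level grows continuously as T is lowered, and becomes a finite fraction of N once $\mu$ is pinned just below the lowest level.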

The simplest case is that of a gas confined in a 
cubic box of volume $V = L^3$ with periodic boundary
conditions, where $\hat H^{(1)} = -(\hbar^2/2m)\nabla^2$. The eigenfunctions
are plane waves $\varphi_{\mathbf{p}}(\mathbf{r}) = V^{-1/2} \exp[i\mathbf{p}\cdot\mathbf{r}/\hbar]$,
with energy $\epsilon_p = p^2/2m$ and momentum
$\mathbf{p} = 2\pi\hbar\mathbf{n}/L$. Here n is a vector whose components
$n_x, n_y, n_z$ are 0 or ± integers. The lowest eigenvalue
has zero energy ($\epsilon_0 = 0$) and zero momentum. The
mean occupation numbers are given by
$\bar n_p = \{\exp[\beta(p^2/2m - \mu)] - 1\}^{-1}$. In the thermodynamic
limit ($N, V \to \infty$ with N/V kept constant),
one can replace the sum $\sum_{\mathbf{p}}$ with the integral
$\int d\epsilon\, \rho(\epsilon)$, where $\rho(\epsilon) = (2\pi)^{-2} V (2m/\hbar^2)^{3/2} \sqrt{\epsilon}$ is the
density of states. In this way, one can calculate the 
thermal component of the gas as a function of T, 
finding the critical temperature 


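$$ k_B T_c = \frac{2\pi\hbar^2}{m} \left( \frac{N}{V\, \zeta(3/2)} \right)^{2/3} \qquad [13] $$

where $\zeta$ is the Riemann zeta function and $\zeta(3/2) \simeq 2.612$.
For $T > T_c$ one has $\mu < 0$ and $N_T = N$. For
$T < T_c$ one instead has $\mu = 0$, $N_T = N - N_0$ and

$$ N_0(T) = N\, [1 - (T/T_c)^{3/2}] \qquad [14] $$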

The critical temperature turns out to be fully 
determined by the density N/V and by the mass of 
the constituents. These results were first obtained 
by A Einstein in his seminal paper and used by 
F London in the context of superfluid helium. We 
notice that the replacement of the sum with an 
integral in the above derivation is justified only if 
the thermal energy kgT is much larger than the 
energy spacing between single-particle levels, that is, 
if $k_B T \gg \hbar^2/(2mV^{2/3})$. It is also worth noticing that
the above expression for $T_c$ can be written as
$\lambda_T^3 N/V \simeq 2.612$, where $\lambda_T = [2\pi\hbar^2/(m k_B T)]^{1/2}$ is
the thermal de Broglie wavelength. This is
equivalent to saying that BEC occurs when the
mean distance between bosons is of the order of 
their de Broglie wavelength. 
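
As a numerical illustration of eqns [13] and [14], the sketch below evaluates the critical temperature of a uniform ideal Bose gas and checks that the phase-space density $\lambda_T^3 N/V$ equals $\zeta(3/2)$ at $T_c$. The function names are chosen here, and the example values (the mass of a rubidium-87 atom and a density of $10^{20}$ m$^{-3}$) are illustrative assumptions that do not come from the text.

```python
import math

HBAR = 1.054571817e-34          # reduced Planck constant, J s
K_B = 1.380649e-23              # Boltzmann constant, J / K
ZETA_3_2 = 2.6123753486854883   # Riemann zeta(3/2)


def critical_temperature_uniform(density, mass):
    """T_c of a uniform ideal Bose gas, eqn [13] (SI units)."""
    prefactor = 2.0 * math.pi * HBAR**2 / (mass * K_B)
    return prefactor * (density / ZETA_3_2) ** (2.0 / 3.0)


def thermal_wavelength(temperature, mass):
    """Thermal de Broglie wavelength [2 pi hbar^2 / (m k_B T)]^(1/2)."""
    return math.sqrt(2.0 * math.pi * HBAR**2 / (mass * K_B * temperature))


def condensate_fraction_uniform(temperature, t_c):
    """N_0 / N from eqn [14]; zero at and above T_c."""
    if temperature >= t_c:
        return 0.0
    return 1.0 - (temperature / t_c) ** 1.5


if __name__ == "__main__":
    mass_rb87 = 86.909 * 1.66053906660e-27   # kg, assumed example atom
    density = 1.0e20                         # m^-3, assumed example density
    t_c = critical_temperature_uniform(density, mass_rb87)
    print("T_c [K]:", t_c)
    print("n * lambda^3 at T_c:", density * thermal_wavelength(t_c, mass_rb87) ** 3)
    print("N_0/N at T_c/2:", condensate_fraction_uniform(0.5 * t_c, t_c))
```

Because $T_c$ scales as $(N/V)^{2/3}/m$, a lower density or a heavier atom lowers the critical temperature, in line with the statement that $T_c$ is fully determined by the density and the mass of the constituents.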

Another interesting case, which is relevant for the 
recent experiments with BEC in dilute gases con- 
fined in magnetic and/or optical traps, is that of an 
ideal gas subject to harmonic potentials. Let us 
consider, for simplicity, an isotropic external potential
$V_{ext}(r) = (1/2) m \omega_{ho}^2 r^2$. The single-particle Hamiltonian
is $\hat H^{(1)} = -(\hbar^2/2m)\nabla^2 + V_{ext}(r)$ and its
eigenvalues are $\epsilon_{n_x n_y n_z} = (n_x + n_y + n_z + 3/2)\hbar\omega_{ho}$.
The corresponding density of states is $\rho(\epsilon) =
(1/2)(\hbar\omega_{ho})^{-3} \epsilon^2$. A natural thermodynamic limit for
this system is obtained by letting $N \to \infty$ and
$\omega_{ho} \to 0$, while keeping the product $N\omega_{ho}^3$ constant.
The condition for BEC to occur is that $\mu$ approaches
the value $\epsilon_{000} = (3/2)\hbar\omega_{ho}$ from below by cooling the
gas down to $T_c$. Following the same procedure as
for the uniform gas, one finds 


$$ k_B T_c = \hbar\omega_{ho} [N/\zeta(3)]^{1/3} = 0.94\, \hbar\omega_{ho} N^{1/3} \qquad [15] $$
and


$$ N_0(T) = N\, [1 - (T/T_c)^3] \qquad [16] $$


Notice that the condensate is not uniform in this case, 
since it corresponds to the lowest eigenfunction of the 
harmonic oscillator, which is a Gaussian of width 
$a_{ho} = [\hbar/(m\omega_{ho})]^{1/2}$. Correspondingly, the condensate
in the momentum space is also a Gaussian, of width proportional to
$a_{ho}^{-1}$. This implies that, unlike the gas in a box,
here the condensate can be seen both in coordinate and
momentum space in the form of a narrow distribution
emerging from a wider thermal component. Finally,
results [15] and [16] remain valid even for anisotropic
harmonic potentials, with trapping frequencies $\omega_x$, $\omega_y$,
and $\omega_z$, provided the frequency $\omega_{ho}$ is replaced by the
geometric average $(\omega_x \omega_y \omega_z)^{1/3}$.
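
Results [15] and [16], including the anisotropic generalization, are simple enough to encode directly. This is a small sketch with names chosen here for illustration; temperatures are expressed through $k_B T/\hbar$, so they carry the same units as the trapping frequencies passed in.

```python
ZETA_3 = 1.2020569031595942  # Riemann zeta(3)


def geometric_mean_frequency(omega_x, omega_y, omega_z):
    """Geometric average of the three trapping frequencies."""
    return (omega_x * omega_y * omega_z) ** (1.0 / 3.0)


def trap_critical_temperature(n_particles, omega_ho=1.0):
    """k_B T_c / hbar for an ideal gas in a harmonic trap, eqn [15]."""
    return omega_ho * (n_particles / ZETA_3) ** (1.0 / 3.0)


def trap_condensate_fraction(temperature, t_c):
    """N_0 / N from eqn [16]; zero at and above T_c."""
    if temperature >= t_c:
        return 0.0
    return 1.0 - (temperature / t_c) ** 3


if __name__ == "__main__":
    omega = geometric_mean_frequency(1.0, 8.0, 27.0)
    t_c = trap_critical_temperature(1.0e6, omega)
    print(omega, t_c, trap_condensate_fraction(0.5 * t_c, t_c))
```

The cubic law in [16] depletes the thermal cloud faster than the 3/2 power of the uniform gas [14]: at half the critical temperature the trapped ideal gas is already 87.5% condensed.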


BEC in Interacting Gases 


Actual condensates are made of interacting particles. 
The full many-body Hamiltonian is 


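$$ \hat H = \int d\mathbf{r}\, \hat\Psi^\dagger(\mathbf{r})\, \hat H_0\, \hat\Psi(\mathbf{r}) + \frac{1}{2} \int d\mathbf{r}'\, d\mathbf{r}\, \hat\Psi^\dagger(\mathbf{r})\, \hat\Psi^\dagger(\mathbf{r}')\, V(\mathbf{r}-\mathbf{r}')\, \hat\Psi(\mathbf{r}')\, \hat\Psi(\mathbf{r}) \qquad [17] $$

where $V(\mathbf{r}-\mathbf{r}')$ is the particle-particle interaction and
$\hat H_0 = -(\hbar^2/2m)\nabla^2 + V_{ext}(\mathbf{r})$. Unlike the
case of ideal gases, $\hat H$ is no longer a sum of single-particle
Hamiltonians. However, the general definitions
given in the section "What Is BEC?" are still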
valid. In particular, the one-body density matrix, in the 
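presence of BEC, can be separated as in eqn [5]. One
can write $n^{(1)}(\mathbf{r},\mathbf{r}') = \Psi^*(\mathbf{r})\Psi(\mathbf{r}') + \tilde n^{(1)}(\mathbf{r},\mathbf{r}')$, where $\Psi$
is the order parameter of the condensate ($\Psi^*(\mathbf{r})\Psi(\mathbf{r}')$
being of order N), while $\tilde n^{(1)}(\mathbf{r},\mathbf{r}')$ vanishes for large
$|\mathbf{r}-\mathbf{r}'|$. This is equivalent to saying that the bosonic field
operator splits in two parts,


$$ \hat\Psi(\mathbf{r}) = \Psi(\mathbf{r}) + \delta\hat\Psi(\mathbf{r}) \qquad [18] $$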


where the first term is a complex function and the 
second one is the field operator associated with 
the noncondensed particles. This decomposition is 
particularly useful when the depletion of the 
condensate, that is, the fraction of noncondensed 
particles, is small. This happens when the interac- 
tion is weak, but also for particles with arbitrary 
interaction, provided the gas is dilute. In this case, 
one can expand the many-body Hamiltonian by 
treating the operator $\delta\hat\Psi$ as a small quantity.

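A suitable strategy consists in writing the Heisenberg
equation for the evolution of the field operators,
$i\hbar\partial_t \hat\Psi = [\hat\Psi, \hat H]$, using the many-body
Hamiltonian [17]:

$$ i\hbar\, \partial_t \hat\Psi(\mathbf{r},t) = \left[ -\frac{\hbar^2\nabla^2}{2m} + V_{ext}(\mathbf{r},t) + \int d\mathbf{r}'\, \hat\Psi^\dagger(\mathbf{r}',t)\, V(\mathbf{r}'-\mathbf{r})\, \hat\Psi(\mathbf{r}',t) \right] \hat\Psi(\mathbf{r},t) \qquad [19] $$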


The zeroth-order approximation is thus obtained by replacing the
operator $\hat\Psi$ with the classical field $\Psi$. In the integral
containing the interaction $V(\mathbf{r}-\mathbf{r}')$, this replacement is,
in general, a poor approximation when short distances
$(\mathbf{r}-\mathbf{r}')$ are involved. In a dilute and cold gas, one can
nevertheless obtain a proper expression for the interaction
term by observing that, in this case, only binary
collisions at low energy are relevant and these collisions
are characterized by a single parameter, the s-wave
scattering length, a, independently of the details of the
two-body potential. This allows one to replace $V(\mathbf{r}-\mathbf{r}')$
in $\hat H$ with an effective interaction $V(\mathbf{r}-\mathbf{r}') = g\, \delta(\mathbf{r}-\mathbf{r}')$,
where the coupling constant g is given by $g = 4\pi\hbar^2 a/m$.
The scattering length can be measured with several
experimental techniques or calculated from the exact
two-body potential. Using this pseudopotential and
replacing the operator $\hat\Psi$ with the complex function $\Psi$ in
the Heisenberg equation of motion, one gets


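$$ i\hbar\, \partial_t \Psi(\mathbf{r},t) = \left( -\frac{\hbar^2\nabla^2}{2m} + V_{ext}(\mathbf{r},t) + g|\Psi(\mathbf{r},t)|^2 \right) \Psi(\mathbf{r},t) \qquad [20] $$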


This is known as the Gross-Pitaevskii (GP) equation and
it was first introduced in 1961. It has the form of a
nonlinear Schrödinger equation, the nonlinearity
coming from the mean-field term, proportional to
$|\Psi|^2$. It has been derived assuming that N is large
while the fraction of noncondensed atoms is negligible.
On the one hand, this means that quantum
fluctuations of the field operator have to be small,
which is true when $n|a|^3 \ll 1$, where n is the particle
density. In fact, one can show that, at T = 0, the
quantum depletion of the condensate is proportional
to $(n|a|^3)^{1/2}$. On the other hand, thermal fluctuations
have also to be negligible and this means that the
theory is limited to temperatures much lower than
$T_c$. Within these limits, one can identify the total
density with the condensate density. 

The stationary solution of eqn [20] corresponds to 
the condensate wave function in the ground state. One 
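can write $\Psi(\mathbf{r},t) = \Psi_0(\mathbf{r}) \exp(-i\mu t/\hbar)$, where $\mu$ is the
chemical potential. Then the GP equation [20] becomes


$$ \left( -\frac{\hbar^2\nabla^2}{2m} + V_{ext}(\mathbf{r}) + g|\Psi_0(\mathbf{r})|^2 \right) \Psi_0(\mathbf{r}) = \mu\, \Psi_0(\mathbf{r}) \qquad [21] $$


where $n(\mathbf{r}) = |\Psi_0(\mathbf{r})|^2$ is the particle density. The same
equation can be obtained by minimizing the energy of
the system written as a functional of the density:


$$ E[n] = \int d\mathbf{r} \left[ \frac{\hbar^2}{2m} |\nabla\sqrt{n}|^2 + n V_{ext}(\mathbf{r}) + \frac{g n^2}{2} \right] \qquad [22] $$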


The first term on the right corresponds to the 
quantum kinetic energy coming from the uncertainty 
principle; it is usually named “quantum pressure" 
and vanishes for uniform systems. 

The next order in $\delta\hat\Psi$ gives the excited states of the
condensate. In a uniform gas the ground-state order
parameter, $\Psi_0$, is a constant and the first-order
expansion of $\hat H$ was introduced by N Bogoliubov in
1947. In particular, he found an elegant way to
diagonalize the Hamiltonian by using simple linear
combinations of particle creation and annihilation
operators. These are known as Bogoliubov's transformations
and lie at the basis of the concept of
quasiparticle, one of the most important concepts in 
quantum many-body theory. 

A generalization of Bogoliubov's approach to the 
case of nonuniform condensates is obtained by 
considering small deviations around the ground 
state in the form 


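$$ \Psi(\mathbf{r},t) = e^{-i\mu t/\hbar} \left[ \Psi_0(\mathbf{r}) + u(\mathbf{r})\, e^{-i\omega t} + v^*(\mathbf{r})\, e^{i\omega t} \right] \qquad [23] $$


Inserting this expression into eqn [20] and keeping
terms linear in the complex functions u and v, one gets

$$ \hbar\omega\, u(\mathbf{r}) = [\hat H_0 - \mu + 2g\Psi_0^2(\mathbf{r})]\, u(\mathbf{r}) + g\Psi_0^2(\mathbf{r})\, v(\mathbf{r}) \qquad [24] $$

$$ -\hbar\omega\, v(\mathbf{r}) = [\hat H_0 - \mu + 2g\Psi_0^2(\mathbf{r})]\, v(\mathbf{r}) + g\Psi_0^2(\mathbf{r})\, u(\mathbf{r}) \qquad [25] $$

These coupled equations allow one to calculate the
energies $\epsilon = \hbar\omega$ of the excitations. They also give the
so-called quasiparticle amplitudes u and v, which obey
the normalization condition


$$ \int d\mathbf{r}\, [u_i^*(\mathbf{r})\, u_j(\mathbf{r}) - v_i^*(\mathbf{r})\, v_j(\mathbf{r})] = \delta_{ij} $$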


In a uniform gas, u and v are plane waves and one
recovers the famous Bogoliubov spectrum


$$ \hbar\omega = \left[ \frac{\hbar^2 q^2}{2m} \left( \frac{\hbar^2 q^2}{2m} + 2gn \right) \right]^{1/2} \qquad [26] $$


where q is the wave vector of the excitations.
For large momenta the spectrum coincides with the
free-particle energy $\hbar^2 q^2/2m$. At low momenta, it
instead gives the phonon dispersion $\omega = cq$, where
$c = [gn/m]^{1/2}$ is the Bogoliubov sound velocity. The
transition between the two regimes occurs when the
excitation wavelength is of the order of the healing
length,


$$ \xi = [8\pi n a]^{-1/2} = \hbar/(mc\sqrt{2}) \qquad [27] $$


which is an important length scale for superfluidity. 
When the order parameter is forced to vanish at some 
point (by an impurity, a wall, etc.), the healing length 
provides the typical distance over which it recovers its 
bulk value. In a nonuniform condensate the excitations 
are no longer plane waves but, at low energy, they
still have a phonon-like character, in the sense that they
involve a collective motion of the condensate. 
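
The two limits of the Bogoliubov spectrum [26] and the role of the healing length [27] can be checked numerically. The following is a minimal sketch with function names chosen here; by default it works in units where $\hbar = m = 1$, which is a convenience assumed for the example and not a convention of the text.

```python
import math


def bogoliubov_energy(q, g, n, hbar=1.0, m=1.0):
    """Excitation energy hbar*omega of eqn [26] at wave vector q."""
    free = (hbar * q) ** 2 / (2.0 * m)
    return math.sqrt(free * (free + 2.0 * g * n))


def sound_velocity(g, n, m=1.0):
    """Bogoliubov sound velocity c = (g n / m)^(1/2)."""
    return math.sqrt(g * n / m)


def healing_length(g, n, hbar=1.0, m=1.0):
    """Healing length xi = hbar / (sqrt(2) m c), eqn [27]."""
    return hbar / (math.sqrt(2.0) * m * sound_velocity(g, n, m))


def scattering_length(g, hbar=1.0, m=1.0):
    """Invert g = 4 pi hbar^2 a / m for the s-wave scattering length a."""
    return g * m / (4.0 * math.pi * hbar**2)


if __name__ == "__main__":
    g, n = 0.5, 2.0
    xi = healing_length(g, n)
    for q in (0.01 / xi, 1.0 / xi, 100.0 / xi):
        free = q**2 / 2.0
        phonon = sound_velocity(g, n) * q
        print(q, bogoliubov_energy(q, g, n), phonon, free)
```

At $q = 1/\xi$ the free-particle energy $\hbar^2 q^2/2m$ equals $gn$, so neither limit dominates; this is the sense in which the healing length marks the crossover between the phonon and free-particle regimes.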

The GP equation [20] is the starting point for an 
accurate mean-field description of BEC in dilute 
cold gases, which is rigorous at T = 0 and for
$n|a|^3 \ll 1$. Static and dynamic properties of condensates
in different geometries can be calculated by
solving the GP equation numerically or using
suitable approximate methods. The inclusion of
effects beyond mean field is a highly nontrivial and
interesting problem. A rather extreme case is
represented by liquid $^4$He, which is a dense system
where the interaction between atoms causes a large
depletion of the condensate even at T = 0 ($N_0/N$
being less than 10%) and thus a full many-body 
treatment is required for its rigorous description. 
Nevertheless, even in this case, the general defini- 
tions of the section “What is BEC?” are still useful. 


Superfluidity and Coherence 


The word superfluidity summarizes a
complex of macroscopic phenomena occurring in
quantum fluids under particular conditions: persistent
currents, equilibrium states at rest in rotating
vessels, motion without viscosity, quantized vorticity, and
others. These features can also be observed in BEC.
The link between BEC and superfluidity is given by
the phase of the order parameter [11]. To understand
this point, let us consider a uniform system. If
$\hat\Psi(\mathbf{r},t)$ is a solution of the Heisenberg equation [19]
with $V_{ext} = 0$, then


$$ \hat\Psi'(\mathbf{r},t) = \hat\Psi(\mathbf{r}-\mathbf{v}t, t)\, \exp\left[ \frac{i}{\hbar} \left( m\mathbf{v}\cdot\mathbf{r} - \frac{1}{2} m v^2 t \right) \right] \qquad [28] $$


where v is a constant vector, is also a solution. This 
equation gives the Galilean transformation of 
the field operator and also applies to its condensate
component $\Psi$. At equilibrium, the ground-state
order parameter is given by $\Psi_0 = \sqrt{n}\, \exp(-i\mu t/\hbar)$,
where n is a constant independent of r. In a frame
where the condensate moves with velocity v, the
order parameter instead takes the form $\Psi_0 =
\sqrt{n}\, \exp(iS)$, with $S(\mathbf{r},t) = \hbar^{-1} [m\mathbf{v}\cdot\mathbf{r} - (mv^2/2 + \mu)t]$.
The velocity of the condensate can thus be identified
with the gradient of the phase S:


$$ \mathbf{v}_s(\mathbf{r},t) = \frac{\hbar}{m} \nabla S(\mathbf{r},t) \qquad [29] $$


This definition is also valid for v varying slowly in 
space and time. The modulus of the order para- 
meter plays a minor role in this definition and it is 
not necessary to assume the gas to be dilute and 
close to T = 0. Indeed, the relation [29] between the
velocity field and the phase of the order parameter
also applies in the presence of large quantum
depletion, as in superfluid $^4$He, and at $T \neq 0$. In
this case, n should not be identified with the
condensate density. Conversely, in dilute gases at
T = 0, n is the condensate density and the velocity
[29] can be simply obtained by applying the usual
definition of current density operator, $\hat{\mathbf{j}}$, to the order
parameter [11]. 

The velocity [29] describes a potential flow and 
corresponds to a collective motion of many particles 
occupying a single quantum state. Being equal to the 
gradient of a scalar function, it is irrotational 
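($\nabla \times \mathbf{v}_s = 0$) and satisfies the Onsager-Feynman
quantization condition $\oint \mathbf{v}_s \cdot d\mathbf{l} = \kappa h/m$, with $\kappa$ a
non-negative integer. These conditions are not
satisfied by a classical fluid, where the hydrodynamic
velocity field, $\mathbf{v}(\mathbf{r},t) = \mathbf{j}(\mathbf{r},t)/n(\mathbf{r},t)$, is the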
average over many different states and does not 
correspond to a potential flow. 
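
The quantization of the circulation follows from the single-valuedness of the order parameter: around any closed loop the phase can only change by an integer multiple of $2\pi$. The sketch below illustrates this numerically under stated assumptions. It uses a hypothetical two-dimensional model order parameter $(x + iy)^\kappa$, whose phase winds $\kappa$ times around the origin, and accumulates the phase change along a circular loop; the function names and the model field are choices made here, and the default units set $\hbar = m = 1$.

```python
import cmath
import math


def circulation(order_parameter, center, radius, hbar=1.0, m=1.0, steps=2000):
    """Line integral of v = (hbar/m) grad S around a circle.

    The integral is evaluated as (hbar/m) times the total phase change of
    the order parameter along the loop, with each increment wrapped into
    the interval [-pi, pi).
    """
    cx, cy = center
    previous = cmath.phase(order_parameter(cx + radius, cy))
    total_phase = 0.0
    for k in range(1, steps + 1):
        angle = 2.0 * math.pi * k / steps
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        current = cmath.phase(order_parameter(x, y))
        step = (current - previous + math.pi) % (2.0 * math.pi) - math.pi
        total_phase += step
        previous = current
    return hbar / m * total_phase


def vortex(kappa):
    """Model order parameter with phase winding kappa around the origin."""
    def psi(x, y):
        return complex(x, y) ** kappa
    return psi


if __name__ == "__main__":
    for kappa in (0, 1, 2):
        gamma = circulation(vortex(kappa), (0.0, 0.0), 1.0)
        print(kappa, gamma / (2.0 * math.pi))
    # A loop that does not enclose the vortex core carries no circulation.
    print(circulation(vortex(1), (3.0, 0.0), 1.0))
```

With $\hbar = m = 1$ the quantum of circulation $h/m$ equals $2\pi$, so the printed ratios are the winding numbers. A loop that does not enclose the core gives zero, consistent with the flow being irrotational everywhere except where the order parameter vanishes.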

By using the definition of the phase S and velocity 
v, together with particle conservation, one can show 
that the dynamics of a condensate, as far as 
macroscopic motions are concerned, is governed by 
the hydrodynamic equations of an irrotational
nonviscous fluid. Within the mean-field theory, this
can be easily seen by rewriting the GP equation [20]
in terms of the density $n = |\Psi|^2$ and the velocity
[29]. Neglecting the quantum pressure term, which involves $\nabla^2\sqrt{n}$
(hence limiting the description to length scales
larger than the healing length $\xi$), one gets


$$ \frac{\partial n}{\partial t} + \nabla\cdot(\mathbf{v}_s n) = 0 \qquad [30] $$
and

$$ m \frac{\partial \mathbf{v}_s}{\partial t} + \nabla \left( V_{ext} + \mu(n) + \frac{m v_s^2}{2} \right) = 0 \qquad [31] $$


with the local chemical potential $\mu(n) = gn$. These
equations have the typical structure of the dynamic 
equations of superfluids at zero temperature and can 
be viewed as the T=0 case of the more general 
Landau's two-fluid theory. 
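
The passage from the GP equation [20] to the hydrodynamic equations [30] and [31] can be spelled out. The following is a sketch of the standard substitution of modulus and phase; the intermediate phase equation is a worked step supplied here, with the quantum pressure term kept explicitly so that it is clear what is being neglected.

```latex
\begin{align*}
\Psi &= \sqrt{n}\, e^{iS}, \qquad \mathbf{v}_s = \frac{\hbar}{m}\nabla S, \\[4pt]
\text{imaginary part of [20]:}\quad
 & \frac{\partial n}{\partial t} + \nabla\cdot(n\,\mathbf{v}_s) = 0, \\[4pt]
\text{real part of [20]:}\quad
 & \hbar\frac{\partial S}{\partial t} + \frac{m v_s^2}{2} + V_{ext} + g n
   - \frac{\hbar^2}{2m\sqrt{n}}\nabla^2\sqrt{n} = 0, \\[4pt]
\text{gradient of the real part:}\quad
 & m\frac{\partial \mathbf{v}_s}{\partial t}
   + \nabla\!\left( V_{ext} + g n + \frac{m v_s^2}{2}
   - \frac{\hbar^2}{2m\sqrt{n}}\nabla^2\sqrt{n} \right) = 0 .
\end{align*}
```

Dropping the last term in the bracket, which is small when the density varies on length scales much larger than $\xi$, gives eqn [31] with $\mu(n) = gn$.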

One of the most striking signatures of superfluidity
is the observation of quantized vortices, that is, 
vortices obeying the Onsager-Feynman quantization 
condition. A vast literature is devoted to vortices in 
superfluid helium and, more recently, vortices have 
also been produced and studied in condensates of 
ultracold gases, including nice configurations of 
many vortices in regular triangular lattices, similar 
to the Abrikosov lattices in superconductors. Other 
phenomena, such as the reduction of the moment of 
inertia, the occurrence of Josephson tunneling 
through barriers, the existence of thresholds for 
dissipative processes (Landau criterion), and others, 
are typical subjects of intense investigation. 

Another important consequence of the fact that 
BEC is described by an order parameter with a well- 
defined phase is the occurrence of coherence effects 
which, in other words, mean that condensates
behave like matter waves. For instance, one can
measure the phase difference between two condensates
by means of interference. This can be done in
coordinate space by confining two condensates in
two potential minima, a and b, at a distance d. Let
us take d along z and assume that, at t = 0, the order
parameter is given by the linear combination
$\Psi(\mathbf{r}) = \Psi_a(\mathbf{r}) + \exp(i\phi)\Psi_b(\mathbf{r})$ with $\Psi_a$ and $\Psi_b$ real
and without overlap. Then let us switch off the
confining potentials so that the condensates expand
and overlap. If the overlap occurs when the density
is small enough to neglect interactions, the motion
is ballistic and the phase of each condensate evolves
as $S(\mathbf{r},t) = mr^2/(2\hbar t)$, so that $\mathbf{v} = \mathbf{r}/t$. This implies
a relative phase $\phi + S(x,y,z+d/2) - S(x,y,z-d/2) = \phi + mdz/\hbar t$.
The total density $n = |\Psi|^2$ thus
exhibits periodic modulations along z with wavelength
$ht/md$. This interference pattern has indeed
been observed in condensates of ultracold atoms. In
these systems it was also possible to measure the
coherence length, that is, the distance $|\mathbf{r}-\mathbf{r}'|$ at which
the one-body density matrix vanishes and the phase of the
order parameter is no longer well defined. In most
situations, the coherence length turns out to be of the 
order of, or larger than the size of the condensates. 
However, interesting situations exist when the coher- 
ence length is shorter but the system still preserves some 
features of BEC (quasicondensates). 
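
The fringe spacing of the interference pattern follows directly from the relative phase between the two ballistically expanding condensates. The sketch below encodes that relation; the function names are chosen here, and the example values (the mass of a sodium-23 atom, a separation of 40 micrometers, and an expansion time of 40 ms) are illustrative assumptions rather than data taken from the text.

```python
import math

H_PLANCK = 6.62607015e-34            # Planck constant, J s
HBAR = H_PLANCK / (2.0 * math.pi)    # reduced Planck constant, J s


def relative_phase(z, t, mass, d, phi0=0.0):
    """Relative phase phi0 + m d z / (hbar t) at height z after time t."""
    return phi0 + mass * d * z / (HBAR * t)


def fringe_wavelength(t, mass, d):
    """Spatial period h t / (m d) of the density modulation along z."""
    return H_PLANCK * t / (mass * d)


if __name__ == "__main__":
    mass_na23 = 22.990 * 1.66053906660e-27   # kg, assumed example atom
    separation = 40.0e-6                     # m, assumed initial distance d
    expansion_time = 40.0e-3                 # s, assumed time of flight
    print(fringe_wavelength(expansion_time, mass_na23, separation))
```

The fringe spacing grows linearly with the expansion time and shrinks with the initial separation, which is why a sufficiently long time of flight is needed before the fringes become wide enough to resolve.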




Final Remarks 


Bose-Einstein condensates of ultracold atoms are 
easily manipulated by changing and tuning the 
external potentials. This means, for instance, that one 
can prepare condensates in different geometries, 
including very elongated (quasi-1D) or disk-shaped 
(quasi-2D) condensates. This is conceptually impor- 
tant, since BEC in lower dimensions is not as simple as 
in three dimensions: thermal and quantum fluctua- 
tions play a crucial role, superfluidity must be properly 
re-defined, and very interesting limiting cases can be 
explored (Tonks-Girardeau regime, Luttinger liquid, 
etc.). Another possibility is to use laser beams to 
produce standing waves acting as an external periodic 
potential (optical lattice). Condensates in optical 
lattices behave as a sort of perfect crystal, whose 
properties are the analog of the dynamic and transport 
properties in solid-state physics, but with controllable 
spacing between sites, no defects and tunable lattice 
geometry. One can investigate the role of phase 
coherence in the lattice, looking, for instance, at 
Josephson effects as in a chain of junctions. By tuning 
the lattice depth one can explore the transition from a
superfluid phase to a Mott-insulator phase, which is
a nice example of a quantum phase transition. Controlling
cold atoms in optical lattices can be a good starting
point for applications in quantum engineering, interferometry,
and quantum information.

Another interesting aspect of BECs is that the key 
equation for their description in mean-field theory, 
namely the GP equation [20], is a nonlinear Schrödinger
equation very similar to the ones commonly
used, for instance, in nonlinear quantum optics. This
opens interesting perspectives in exploiting the analogies
between the two fields, such as the occurrence of
dynamical and parametric instabilities, the possibility
of creating different types of solitons, and the occurrence of
nonlinear processes like, for example, higher harmonic
generation and mode mixing.

A relevant part of the current research also involves 
systems made of mixtures of different gases, Bose-Bose 
or Fermi-Bose, and many activities with ultracold 
atoms now involve fermionic gases, where BEC can
also be realized by condensing molecules of fermionic
pairs. An extremely active line of research now concerns the
BCS-BEC crossover, which can be obtained in Fermi 
gases by tuning the scattering length (and hence the 
interaction) by means of Feshbach resonances. 

Ten years after the first observation of BEC in 
ultracold gases, it is almost impossible to summarize 
all the research done in this field. A large amount
of work has already been devoted to characterizing the
condensates and several new lines have been opened. 
Rather detailed review articles and books are 
already available for the interested readers. 


See also: Interacting Particle Systems and Hydrodynamic 
Equations; Quantum Phase Transitions; Quantum 
Statistical Mechanics: Overview; Renormalization: 
Statistical Mechanics and Condensed Matter; Superfluids; 
Variational Techniques for Ginzburg-Landau Energies.

Introduction


In this article we discuss quantum theories which 
describe systems of nondistinguishable particles 
interacting with external fields. Such models are 
of interest also in the nonrelativistic case (in 
quantum statistical mechanics, nuclear physics, 
etc.), but the relativistic case has additional, 
interesting complications: relativistic models are 
genuine quantum field theories, that is, quantum 
theories with an infinite number of degrees of 
freedom, with nontrivial features like divergences 
and anomalies. Since interparticle interactions are 
ignored, such models can be regarded as a first 
approximation to more complicated theories, and 
they can be studied by mathematically precise 
methods. 

Models of relativistic particles in external electro- 
magnetic fields have received considerable attention 
in the physics literature, and interesting phenomena 
like the Klein paradox or particle-antiparticle pair 
creation in overcritical fields have been studied; see 
Rafelski et al. (1978) for an extensive review. We 
will not discuss these physics questions but only
describe some prototype examples and a general
Hamiltonian framework which has been used in 
mathematically precise work on such models. The 
general framework for this latter work is the 
mathematical theory of Hilbert space operators 
(see, e.g., Reed and Simon (1975)), but in our 
discussion we try to avoid presupposing knowledge 
of that theory. As mentioned briefly in the end, this 
work has had close relations to various topics of 
recent interest in mathematical physics, including 
anomalies, infinite-dimensional geometry and group 
theory, conformal field theory, and noncommutative 
geometry. 

We restrict our discussion to spin-0 bosons and 
spin-1/2 fermions, and we will not discuss models 
of particles in external gravitational fields but 
only refer the interested reader to DeWitt (2003). 
We also only mention in passing that external 
field problems have also been studied using 
functional integral approaches, and mathemati- 
cally precise work on this can be found in the 
extensive literature on determinants of differential 
operators.


Examples 


Consider the Schrödinger equation describing a 
nonrelativistic particle of mass m and charge e
moving in three-dimensional space and interacting
with external vector and scalar potentials A and
$\phi$, respectively,

$$ i\partial_t \psi = H\psi, \qquad H = \frac{1}{2m}(-i\nabla + e\mathbf{A})^2 - e\phi \qquad [1] $$

(we set $\hbar = c = 1$, $\partial_t = \partial/\partial t$, and $\psi$, $\phi$, and A can
depend on the space and time variables $x \in \mathbb{R}^3$ and
$t \in \mathbb{R}$). This is a standard quantum-mechanical
model, with $\psi$ the one-particle wave function
allowing for the usual probabilistic interpretation.
One interesting generalization to the relativistic 
regime is the Klein-Gordon equation 


$$ [(i\partial_t + e\phi)^2 - (-i\nabla + e\mathbf{A})^2 - m^2]\, \psi = 0 \qquad [2] $$


with a $\mathbb{C}$-valued function $\psi$. There is another
important relativistic generalization, the Dirac
equation


$$ [(i\partial_t + e\phi) - (-i\nabla + e\mathbf{A})\cdot\boldsymbol{\alpha} + m\beta]\, \psi = 0 \qquad [3] $$


with $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \alpha_3)$ and $\beta$ Hermitian $4 \times 4$
matrices satisfying the relations

$$ \alpha_i\alpha_j + \alpha_j\alpha_i = 2\delta_{ij}, \qquad \alpha_i\beta = -\beta\alpha_i, \qquad \beta^2 = 1 $$

and a $\mathbb{C}^4$-valued function $\psi$ (we also write 1 for the
identity). These two relativistic equations differ by
the transformation properties of $\psi$ under Lorentz
transformations: in [2] it transforms like a scalar
and thus describes spin-0 particles, and it transforms
like a spinor describing spin-1/2 particles in [3]. While
these equations are natural relativistic generalizations
of the Schrödinger equation, they no longer
allow one to consistently interpret $\psi$ as one-particle
wave functions. The physical reason is that, in a 
relativistic theory, high-energy processes can create 
particle-antiparticle pairs, and this makes the 
restriction to a fixed particle number inconsistent. 
This problem can be remedied by constructing a 
many-body model allowing for an arbitrary number 
of particles and antiparticles. The requirement that 
this many-body model should have a ground state is 
an important ingredient in this construction. 
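
The relations in [4] can be checked numerically for one concrete choice of matrices. The sketch below assumes the standard (Dirac) representation, $\alpha_j$ built from the Pauli matrices in the off-diagonal blocks and $\beta = \mathrm{diag}(1,1,-1,-1)$; the text itself does not fix a representation, and the helper names are illustrative only.

```python
# Assumption: the standard (Dirac) representation of alpha_j and beta.
# The text only requires Hermitian 4x4 matrices obeying [4]; this is one choice.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

def dagger(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for r1, r2 in zip(a, b) for x, y in zip(r1, r2))

def scaled_identity(c, n=4):
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]

def off_diagonal_blocks(s):
    # 4x4 matrix [[0, s], [s, 0]] built from a 2x2 block s
    return [[0, 0] + s[0], [0, 0] + s[1], s[0] + [0, 0], s[1] + [0, 0]]

sigma = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
alpha = [off_diagonal_blocks(s) for s in sigma]
beta = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

relations_hold = (
    # alpha_j alpha_k + alpha_k alpha_j = 2 delta_jk
    all(close(anticommutator(alpha[j], alpha[k]),
              scaled_identity(2 if j == k else 0))
        for j in range(3) for k in range(3))
    # alpha_j beta = -beta alpha_j
    and all(close(anticommutator(alpha[j], beta), scaled_identity(0))
            for j in range(3))
    # beta^2 = 1
    and close(matmul(beta, beta), scaled_identity(1))
    # all four matrices are Hermitian
    and all(close(dagger(m), m) for m in alpha + [beta])
)
print(relations_hold)
```

Any other set of Hermitian matrices obeying [4] would serve equally well for the discussion that follows.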

It is obviously of interest to formulate and study 
many-body models of nondistinguishable particles 
already in the nonrelativistic case. An important 
empirical fact is that such particles come in two 
kinds, bosons and fermions, distinguished by their 
exchange statistics (we ignore the interesting possibility 
of exotic statistics). For example, the fermion 
many-particle version of [1] for suitable $\phi$ and $\mathbf{A}$ is a 
useful model for electrons in a metal. An elegant 
method to go from the one- to the many-particle 
description is the formalism of second quantization: 
one promotes $\psi$ to a quantum field operator with 
certain (anti-) commutator relations, and this is a 
convenient way to construct the appropriate many- 
particle Hilbert space, Hamiltonian, etc. In the 
nonrelativistic case, this formalism can be regarded 
as an elegant reformulation of a pedestrian con- 
struction of a many-body quantum-mechanical 
model, which is useful since it provides convenient 
computational tools. However, this formalism nat- 
urally generalizes to the relativistic case where the 
one-particle model no longer has an acceptable 
physical interpretation, and one finds that one can 
nevertheless give a consistent physical interpretation 
to [2] and [3] provided that the $\psi$ are interpreted as 
quantum field operators describing bosons and 
fermions. This particular exchange statistics of the 
relativistic particles is a special case of the spin- 
statistics theorem: integer-spin particles are bosons 
and half-integer spin particles are fermions. While 
many structural features of this formalism are 
present already in the simpler nonrelativistic models, 
the relativistic models add some nontrivial features 
typical for quantum field theories. 

In the following, we discuss a precise mathema- 
tical formulation of the quantum field theory models 
described above. We emphasize the functorial nature 
of this construction, which makes manifest that it 
also applies to other situations, for example, where 
the bosons and fermions are also coupled to a 
gravitational background, are considered in other 
spacetime dimensions than 3 + 1, etc. 


Second Quantization: 
Nonrelativistic Case 


Consider a quantum system of nondistinguishable 
particles where the quantum-mechanical descrip- 
tion of one such particle is known. In general, this 
one-particle description is given by a Hilbert space 
$\mathfrak{h}$ and one-particle observables and transformations, 
which are self-adjoint and unitary operators 
on $\mathfrak{h}$, respectively. The most important observable 
is the Hamiltonian H. We will describe a general 
construction of the corresponding many-body 
system. 


Example  As a motivating example we take the Hilbert space $\mathfrak{h} = L^2(\mathbb{R}^3)$ of square-integrable functions $f(x)$, $x \in \mathbb{R}^3$, and the Hamiltonian $H$ in [1]. A specific example for a unitary operator on $\mathfrak{h}$ is the gauge transformation $(Uf)(x) = \exp(i\chi(x))f(x)$ with $\chi$ a smooth, real-valued function on $\mathbb{R}^3$. 


In this example, the corresponding wave functions for $N$ identical such particles are the $L^2$-functions $f_N(x_1, \ldots, x_N)$, $x_j \in \mathbb{R}^3$. It is obvious how to extend one-particle observables and transformations to such $N$-particle states: for example, the $N$-particle Hamiltonian corresponding to $H$ in [1] is

$$H_N = \sum_{j=1}^{N}\left[\frac{1}{2m}\left(-i\nabla_{x_j} + e\mathbf{A}(t, x_j)\right)^2 - e\phi(t, x_j)\right] \qquad [5]$$

and the $N$-particle gauge transformation $U_N$ is defined through multiplication with $\prod_{j=1}^{N}\exp(i\chi(x_j))$. 

For systems of indistinguishable particles it is enough to restrict to wave functions which are even or odd under particle exchanges,

$$f_N(x_1, \ldots, x_j, \ldots, x_k, \ldots, x_N) = \pm f_N(x_1, \ldots, x_k, \ldots, x_j, \ldots, x_N) \qquad [6]$$

for all $1 \le j < k \le N$, with the upper and lower signs corresponding to bosons and fermions, respectively (this empirical fact is usually taken as a postulate in nonrelativistic many-body quantum physics). It is convenient to define the zero-particle Hilbert space as $\mathbb{C}$ (complex numbers) and to introduce a Hilbert space containing states with all possible particle numbers: this so-called Fock space contains all states

$$\begin{pmatrix} f_0 \\ f_1(x_1) \\ f_2(x_1, x_2) \\ f_3(x_1, x_2, x_3) \\ \vdots \end{pmatrix} \qquad [7]$$

with $f_0 \in \mathbb{C}$. The definition of $H_N$ and $U_N$ then naturally extends to this Fock space; see below. 
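
As an illustration of the exchange condition [6], the following sketch projects an arbitrary $N$-particle function onto its symmetric (boson) or antisymmetric (fermion) part by averaging over all permutations of its arguments. The sample function `f` and the helper names are hypothetical and chosen only for the demonstration.

```python
from itertools import permutations

def parity(perm):
    """Sign (+1 or -1) of a permutation given as a tuple of indices."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]  # one transposition
            sign = -sign
    return sign

def project(f, n, sign):
    """(Anti)symmetrized version of an n-particle function f.
    sign=+1 gives the boson case, sign=-1 the fermion case of [6]."""
    perms = list(permutations(range(n)))
    def g(*xs):
        total = 0.0
        for p in perms:
            factor = parity(p) if sign == -1 else 1
            total += factor * f(*(xs[i] for i in p))
        return total / len(perms)
    return g

def f(x1, x2, x3):
    # hypothetical three-particle function without any exchange symmetry
    return x1 * x2 ** 2 * x3 ** 3

boson = project(f, 3, +1)
fermion = project(f, 3, -1)

print(boson(1.0, 2.0, 3.0), boson(2.0, 1.0, 3.0))
print(fermion(1.0, 2.0, 3.0), fermion(2.0, 1.0, 3.0))
print(fermion(1.0, 1.0, 3.0))
```

The fermion projection vanishes whenever two arguments coincide, which is the wave-function form of the exclusion principle discussed later in the text.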


General Construction 


The construction of Fock spaces and many-particle observables and transformations just outlined in a specific example is conceptually simple. An alternative, more efficient construction method is to use "quantum fields," which we denote as $\psi(x)$ and $\psi^\dagger(x)$, $x \in \mathbb{R}^3$. They can be fully characterized by the following (anti-) commutator relations:

$$[\psi(x), \psi^\dagger(y)]_\mp = \delta^3(x - y), \qquad [\psi(x), \psi(y)]_\mp = 0 \qquad [8]$$

where $[a, b]_\mp = ab \mp ba$, with the commutator and anticommutator (upper and lower signs, respectively) corresponding to the boson and fermion case, respectively. It is convenient to "smear" these fields with one-particle wave functions and define

$$\psi(f) = \int d^3x\, \overline{f(x)}\,\psi(x), \qquad \psi^\dagger(f) = \int d^3x\, f(x)\,\psi^\dagger(x) \qquad [9]$$


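for all $f \in \mathfrak{h}$. Then the relations characterizing the field operators can be written as

$$[\psi(f), \psi^\dagger(g)]_\mp = (f, g), \qquad [\psi(f), \psi(g)]_\mp = 0 \qquad \forall f, g \in \mathfrak{h} \qquad [10]$$

where

$$(f, g) = \int d^3x\, \overline{f(x)}\,g(x)$$

is the inner product in $\mathfrak{h}$. The Fock space $\mathcal{F}_\pm(\mathfrak{h})$ can then be defined by postulating that it contains a normalized vector $\Omega$ called "vacuum" such that

$$\psi(f)\Omega = 0 \qquad \forall f \in \mathfrak{h} \qquad [11]$$

and that all $\psi^{(\dagger)}(f)$ are operators on $\mathcal{F}_\pm(\mathfrak{h})$ such that $\psi^\dagger(f) = \psi(f)^*$, where $*$ is the Hilbert space adjoint. Indeed, from this we conclude that $\mathcal{F}_\pm(\mathfrak{h})$, as vector space, is generated by

$$f_1 \wedge f_2 \wedge \cdots \wedge f_N \equiv \psi^\dagger(f_1)\psi^\dagger(f_2)\cdots\psi^\dagger(f_N)\Omega \qquad [12]$$

with $f_j \in \mathfrak{h}$ and $N = 0, 1, 2, \ldots$, and that the Hilbert space inner product of such vectors is

$$(f_1 \wedge f_2 \wedge \cdots \wedge f_N,\; g_1 \wedge g_2 \wedge \cdots \wedge g_M) = \delta_{N,M}\sum_{P \in S_N}(\pm 1)^{|P|}\prod_{j=1}^{N}(f_j, g_{Pj}) \qquad [13]$$

with $S_N$ the permutation group, with $(+1)^{|P|} = 1$ always, and $(-1)^{|P|} = +1$ and $-1$ for even and odd permutations, respectively. The many-body Hamiltonian $q(H)$ corresponding to the one-particle Hamiltonian $H$ can now be defined by the following relations: 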


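$$q(H)\Omega = 0, \qquad [q(H), \psi^\dagger(f)] = \psi^\dagger(Hf) \qquad [14]$$

for all $f \in \mathfrak{h}$ such that $Hf$ is defined. Indeed, this implies that

$$q(H)\, f_1 \wedge f_2 \wedge \cdots \wedge f_N = \sum_{j=1}^{N} f_1 \wedge \cdots \wedge f_{j-1} \wedge (Hf_j) \wedge f_{j+1} \wedge \cdots \wedge f_N \qquad [15]$$

which defines a self-adjoint operator on $\mathcal{F}_\pm(\mathfrak{h})$, and it is easy to check that this coincides with our down-to-earth definition of $H_N$ above. Similarly, the many-body transformation $Q(U)$ corresponding to a one-particle transformation $U$ can be defined as

$$Q(U)\Omega = \Omega, \qquad Q(U)\psi^\dagger(f) = \psi^\dagger(Uf)Q(U) \qquad [16]$$

for all $f \in \mathfrak{h}$, which implies that

$$Q(U)\, f_1 \wedge f_2 \wedge \cdots \wedge f_N = (Uf_1) \wedge (Uf_2) \wedge \cdots \wedge (Uf_N) \qquad [17]$$

and thus coincides with our previous definition of $U_N$. 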
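
The inner product formula [13] can be evaluated directly when the one-particle space is finite dimensional. The sketch below assumes $\mathfrak{h} = \mathbb{C}^3$ with the standard inner product (a simplification made for the example, not the $L^2$ space of the text) and sums over permutations exactly as in [13]; all function names are illustrative.

```python
from itertools import permutations

def inner(f, g):
    """One-particle inner product on C^n, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(f, g))

def parity(perm):
    """Sign (+1 or -1) of a permutation given as a tuple of indices."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign

def wedge_inner(fs, gs, sign):
    """Inner product [13] of f_1^...^f_N with g_1^...^g_M.
    sign=+1: bosons (all permutations weighted +1),
    sign=-1: fermions (permutations weighted by their parity)."""
    if len(fs) != len(gs):
        return 0  # the Kronecker delta in [13]
    total = 0
    for p in permutations(range(len(fs))):
        term = parity(p) if sign == -1 else 1
        for j, pj in enumerate(p):
            term *= inner(fs[j], gs[pj])
        total += term
    return total

e1, e2 = (1, 0, 0), (0, 1, 0)

fermion_norm = wedge_inner([e1, e2], [e1, e2], -1)   # two different states
fermion_double = wedge_inner([e1, e1], [e1, e1], -1)  # same state twice
boson_double = wedge_inner([e1, e1], [e1, e1], +1)    # same state twice
print(fermion_norm, fermion_double, boson_double)
```

In the fermion case the sum in [13] is the determinant of the matrix of one-particle inner products, so a state with a repeated one-particle vector has norm zero; in the boson case the same state has squared norm 2.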

While we presented the construction above for a particular example, it is important to note that it does not make reference to what the one-particle formalism actually is. For example, if we had a model of particles on a space $\mathcal{M}$ given by some "nice" manifold of any dimension and with $M$ internal degrees of freedom, we would take $\mathfrak{h} = L^2(\mathcal{M}) \otimes \mathbb{C}^M$ and replace [9] by

$$\psi(f) = \int_{\mathcal{M}} d\mu(x) \sum_{j=1}^{M} \overline{f_j(x)}\,\psi_j(x) \qquad [18]$$

and its Hermitian conjugate, with the measure $\mu$ on $\mathcal{M}$ defining the inner product in $\mathfrak{h}$,

$$(f, g) = \int_{\mathcal{M}} d\mu(x) \sum_{j=1}^{M} \overline{f_j(x)}\,g_j(x)$$

With that, all formulas after [9] hold true as they stand. Given any one-particle Hilbert space $\mathfrak{h}$ with inner product $(\cdot, \cdot)$, observable $H$, and transformation $U$, the formulas above define the corresponding Fock spaces $\mathcal{F}_\pm(\mathfrak{h})$ and many-body observable $q(H)$ and transformation $Q(U)$. It is also interesting to note that this construction has various beautiful general (functorial) properties: the set of one-particle observables has a natural Lie algebra structure with the Lie bracket given by the commutator (strictly speaking: $i$ times the commutator, but we drop the common factor $i$ for simplicity). The definitions above imply that 


$$[q(A), q(B)] = q([A, B]) \qquad [19]$$

for one-particle observables $A$, $B$, that is, the above-mentioned Lie algebra structure is preserved under this map $q$. In a similar manner, the set of one-particle transformations has a natural group structure preserved by the map $Q$,

$$Q(U)Q(V) = Q(UV), \qquad Q(U)^{-1} = Q(U^{-1}) \qquad [20]$$

Moreover, if $A$ is self-adjoint, then $\exp(iA)$ is unitary, and one can show that

$$Q(\exp(iA)) = \exp(iq(A)) \qquad [21]$$

For later use, we note that, if $\{f_n\}_{n \in \mathbb{Z}}$ is some complete, orthonormal basis in $\mathfrak{h}$, then operators $A$ on $\mathfrak{h}$ can be represented by infinite matrices $(A_{mn})_{m,n \in \mathbb{Z}}$ with $A_{mn} = (f_m, Af_n)$, and

$$q(A) = \sum_{m,n} A_{mn}\,\psi_m^\dagger\psi_n \qquad [22]$$

where $\psi_n^{(\dagger)} = \psi^{(\dagger)}(f_n)$ obey

$$[\psi_m, \psi_n^\dagger]_\mp = \delta_{mn}, \qquad [\psi_m, \psi_n]_\mp = 0 \qquad [23]$$

for all $m$, $n$. We also note that, in our definition of $q(A)$, we made a convenient choice of normalization, but there is no physical reason not to choose a different normalization and define

$$q'(A) = q(A) - b(A) \qquad [24]$$

where $b$ is some linear function mapping self-adjoint operators $A$ to real numbers. For example, one may wish to use another reference vector $\Omega'$ instead of $\Omega$ in the Fock space, and then would choose $b(A) = (\Omega', q(A)\Omega')$. Then the relations in [19] are changed to

$$[q'(A), q'(B)] = q'([A, B]) + S_0(A, B) \qquad [25]$$

where $S_0(A, B) = b([A, B])$. However, the $\mathbb{C}$-number term $S_0(A, B)$ in the relations [25] is trivial, since it can be removed by going back to $q(A)$. 
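
The functorial relations [19] and [20] can be illustrated on the two-particle sector, where [15] and [17] say that $q(A)$ acts as $A \otimes 1 + 1 \otimes A$ and $Q(U)$ acts as $U \otimes U$. The sketch below assumes $\mathfrak{h} = \mathbb{C}^2$ and works on the full tensor product $\mathfrak{h} \otimes \mathfrak{h}$ (the relations then hold in particular on its symmetric and antisymmetric subspaces); the matrices and helper names are chosen for the example only.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for r1, r2 in zip(a, b) for x, y in zip(r1, r2))

ONE = [[1, 0], [0, 1]]

def q2(a):
    """q(A) restricted to two particles: A (x) 1 + 1 (x) A, cf. [15]."""
    return add(kron(a, ONE), kron(ONE, a))

def Q2(u):
    """Q(U) restricted to two particles: U (x) U, cf. [17]."""
    return kron(u, u)

# Hypothetical one-particle operators: two Pauli matrices,
# which are both self-adjoint and unitary.
A = [[0, 1], [1, 0]]
B = [[0, -1j], [1j, 0]]

lie_algebra_ok = close(commutator(q2(A), q2(B)), q2(commutator(A, B)))  # [19]
group_ok = close(matmul(Q2(A), Q2(B)), Q2(matmul(A, B)))                # [20]
print(lie_algebra_ok, group_ok)
```

The cross terms $[A \otimes 1, 1 \otimes B]$ vanish because the two factors act on different particles, which is the reason [19] holds on every $N$-particle sector.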


Physical Interpretation 


The Fock space $\mathcal{F}_\pm(\mathfrak{h})$ is the direct sum of subspaces of states with different particle numbers $N$,

$$\mathcal{F}_\pm(\mathfrak{h}) = \bigoplus_{N=0}^{\infty} \mathfrak{h}_\pm^{(N)} \qquad [26]$$

where the zero-particle subspace $\mathfrak{h}_\pm^{(0)} = \mathbb{C}$ is generated by the vacuum $\Omega$, and $\mathfrak{h}_\pm^{(N)}$ is the $N$-particle subspace generated by the states $f_1 \wedge f_2 \wedge \cdots \wedge f_N$, $f_j \in \mathfrak{h}$. We note that

$$\mathcal{N} = q(1) \qquad [27]$$

is the "particle-number operator," $\mathcal{N}F_N = NF_N$ for all $F_N \in \mathfrak{h}_\pm^{(N)}$. The field operators obviously change the particle number: $\psi^\dagger(f)$ increases the particle number by one (maps $\mathfrak{h}_\pm^{(N)}$ to $\mathfrak{h}_\pm^{(N+1)}$), and $\psi(f)$ decreases it by one. Since $f \in \mathfrak{h}$ can be interpreted as a one-particle state, it is natural to interpret $\psi^\dagger(f)$ and $\psi(f)$ as "creation" and "annihilation" operators, respectively: they create and annihilate one particle in the state $f \in \mathfrak{h}$. It is important to note that, in the fermion case, [10] implies that $\psi^\dagger(f)^2 = 0$, which is a mathematical formulation of the Pauli exclusion principle: it is not possible to have two fermions in the same one-particle state. In the boson case, there is no such restriction. Thus, even though the formalisms used to describe boson and fermion systems look very similar, they describe dramatically different physics. 
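
The difference between the two cases already shows up for a single one-particle state. The sketch below represents one fermion mode by 2x2 matrices in the basis (vacuum, one particle) and, as a contrast, one boson mode truncated to at most two particles. The truncation is an assumption made only to obtain finite matrices; the full boson Fock space is infinite dimensional.

```python
import math

def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

# One fermion mode, basis (vacuum, one particle).
psi_dag = [[0, 0], [1, 0]]   # creation operator
psi = [[0, 1], [0, 0]]       # annihilation operator

pauli = matmul(psi_dag, psi_dag)            # psi_dag^2, expected to vanish
anti = [[x + y for x, y in zip(r1, r2)]     # psi psi_dag + psi_dag psi
        for r1, r2 in zip(matmul(psi, psi_dag), matmul(psi_dag, psi))]
number = matmul(psi_dag, psi)               # particle-number operator

# One boson mode truncated to 0, 1 or 2 particles (assumption, see lead-in).
b_dag = [[0, 0, 0], [1, 0, 0], [0, math.sqrt(2), 0]]
two_bosons = matvec(matmul(b_dag, b_dag), [1, 0, 0])  # b_dag^2 on the vacuum

print(pauli, anti, number, two_bosons)
```

Applying the fermion creation operator twice gives zero, while applying the boson creation operator twice gives a nonzero two-particle vector.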


Applications 


In our example, the many-body Hamiltonian $\mathcal{H}_0 = q(H)$ can also be written in the following suggestive form:

$$\mathcal{H}_0 = \int d^3x\, \psi^\dagger(x)(H\psi)(x) \qquad [28]$$

and similar formulas hold true for other observables and other Hilbert spaces $\mathfrak{h} = L^2(\mathcal{M}) \otimes \mathbb{C}^M$. It is rather easy to solve the model defined by such a Hamiltonian: all necessary computations can be reduced to one-particle computations. For example, in the static case, where $\mathbf{A}$ and $\phi$ are time independent, a main quantity of interest in statistical physics is the free energy

$$\mathcal{E} = -\beta^{-1}\log\left(\mathrm{tr}\left(\exp(-\beta[\mathcal{H}_0 - \mu\mathcal{N}])\right)\right) \qquad [29]$$

where $\beta > 0$ is the inverse temperature, $\mu$ is the chemical potential, and the trace is over the Fock space $\mathcal{F}_\pm(\mathfrak{h})$. One can show that

$$\mathcal{E} = \pm\,\mathrm{tr}\left(\beta^{-1}\log\left(1 \mp \exp(-\beta[H - \mu])\right)\right) \qquad [30]$$

where the trace is over the one-particle Hilbert space $\mathfrak{h}$. Thus, to compute $\mathcal{E}$, one only needs to find the eigenvalues of $H$. 
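
The equality of [29] and [30] can be checked by brute force in the fermion case when $H$ has finitely many eigenvalues: the Fock space then has a basis labeled by occupation numbers 0 or 1 for each eigenvector of $H$. The sketch below assumes a hypothetical four-level spectrum and arbitrary values of $\beta$ and $\mu$.

```python
import math
from itertools import product

def free_energy_fock(energies, beta, mu):
    """[29] for fermions: -1/beta * log tr exp(-beta (H_0 - mu N)),
    with the trace taken as a sum over all occupation-number configurations."""
    z = 0.0
    for occ in product((0, 1), repeat=len(energies)):
        e = sum(n * (ek - mu) for n, ek in zip(occ, energies))
        z += math.exp(-beta * e)
    return -math.log(z) / beta

def free_energy_one_particle(energies, beta, mu):
    """[30] for fermions (lower signs): -1/beta * sum_k log(1 + exp(-beta (E_k - mu)))."""
    return -sum(math.log(1.0 + math.exp(-beta * (ek - mu)))
                for ek in energies) / beta

energies = [0.3, 0.7, 1.1, 2.0]   # hypothetical eigenvalues of H
beta, mu = 1.5, 0.8               # hypothetical inverse temperature and chemical potential

e_fock = free_energy_fock(energies, beta, mu)
e_one = free_energy_one_particle(energies, beta, mu)
print(e_fock, e_one)
```

The first function sums over $2^4$ many-body states, the second only over the four one-particle eigenvalues; the gap in cost grows exponentially with the number of levels, which is the practical content of the reduction to one-particle computations.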

It is important to mention that the framework discussed here is not only for external field problems but can be equally well used to formulate and study more complicated models with interparticle interactions. For example, while the model with the Hamiltonian $\mathcal{H}_0$ above is often too simple to describe systems in nature, it is easy to write down more realistic models, for example, the Hamiltonian

$$\mathcal{H} = \mathcal{H}_0 + \frac{e^2}{2}\int d^3x \int d^3y\, \psi^\dagger(x)\psi^\dagger(y)\,|x - y|^{-1}\,\psi(y)\psi(x) \qquad [31]$$

describes electrons in an external electromagnetic field interacting through Coulomb interactions. This illustrates an important point which we would like to stress: the task in quantum theory is twofold, namely to formulate and to solve (exactly or otherwise) models. Obviously, in the nonrelativistic case, it is equally simple to formulate many-body models with and without interparticle interactions, and the latter are simpler only because they are easier to solve: the two tasks of formulating and solving models can be clearly separated. As we will see, in the relativistic case, even the formulation of an external field problem is nontrivial, and one finds that one cannot formulate the model without at least partially solving it. This is a common feature of quantum field theories, making them challenging and interesting. 


Relativistic Fermion and Boson Systems 


We now generalize the formalism developed in the 
previous section to the relativistic case. 


Field Algebras and Quasifree Representations 


In the previous section, we identified the field operators $\psi^{(\dagger)}(f)$ with particular Fock space operators. This is analogous to identifying the operators $p_j = -i\partial_{x_j}$ and $q_j = x_j$ on $L^2(\mathbb{R}^M)$ with the generators of the Heisenberg algebra, as usually done. (We recall: the Heisenberg algebra is the star algebra generated by $P_j$ and $Q_j$, $j = 1, 2, \ldots, M < \infty$, with the well-known relations

$$[P_j, Q_k] = -i\delta_{jk}, \qquad [P_j, P_k] = [Q_j, Q_k] = 0, \qquad P_j^\dagger = P_j, \quad Q_j^\dagger = Q_j \qquad [32]$$

for all $j$, $k$.) Identifying the Heisenberg algebra with a particular representation is legitimate since, as is well known, all its irreducible representations are (essentially) the same (this statement is made precise by a celebrated theorem due to von Neumann).

However, in case of the algebra generated by the field operators $\psi^{(\dagger)}(f)$, there exist representations which are truly different from the ones discussed in the last section, and such representations are needed to construct relativistic external field problems. It is therefore important to distinguish the fields as generators of an algebra from the operators representing them. We thus define the (boson or fermion) field algebra $\mathcal{A}_\pm(\mathfrak{h})$ over a Hilbert space $\mathfrak{h}$ as the star algebra generated by $\Psi^\dagger(f)$, $f \in \mathfrak{h}$, such that the map $f \to \Psi^\dagger(f)$ is linear and the relations

$$[\Psi(f), \Psi^\dagger(g)]_\mp = (f, g), \qquad [\Psi(f), \Psi(g)]_\mp = 0, \qquad \Psi^\dagger(f)^\dagger = \Psi(f) \qquad [33]$$

are fulfilled for all $f, g \in \mathfrak{h}$, with $\dagger$ the star operation in $\mathcal{A}_\pm(\mathfrak{h})$. The particular representation of this algebra discussed in the last section will be denoted by $\pi_0$, $\pi_0(\Psi^{(\dagger)}(f)) = \psi^{(\dagger)}(f)$. Other representations $\pi_{P_-}$ can be constructed from any projection operator $P_-$ on $\mathfrak{h}$, that is, any operator $P_-$ on $\mathfrak{h}$ satisfying $P_-^* = P_-^2 = P_-$. Writing $\hat\psi^{(\dagger)}(f)$ short for $\pi_{P_-}(\Psi^{(\dagger)}(f))$, this so-called quasifree representation is defined by

$$\hat\psi^\dagger(f) = \psi^\dagger(P_+f) + \psi(\overline{P_-f}), \qquad \hat\psi(f) = \psi(P_+f) \mp \psi^\dagger(\overline{P_-f}) \qquad [34]$$

where $P_+ = 1 - P_-$ and the bar means complex conjugation. It is important to note that, while the star operation is identical with the Hilbert space adjoint $*$ in the fermion case, we have

$$\hat\psi^\dagger(f) = \hat\psi(Ff)^* \quad \text{with} \quad F = P_+ - P_- \qquad [35]$$

for bosons, where $F$ is a grading operator, that is, $F^* = F$ and $F^2 = 1$. We stress that the "physical" star operation always is $*$, that is, physical observables $A$ obey $A = A^*$. 

The present framework suggests regarding quantization as the procedure which amounts to going from a one-particle Hilbert space $\mathfrak{h}$ to the corresponding field algebra $\mathcal{A}_\pm(\mathfrak{h})$. Indeed, the Heisenberg algebra is identical with the boson field algebra $\mathcal{A}_+(\mathbb{C}^M)$ (since the latter is obviously identical with the algebra of $M$ harmonic oscillators), and thus conventional quantum mechanics can be regarded as boson quantization in the special case where the one-particle Hilbert space is finite dimensional. It is interesting to note that "fermion quantum mechanics" $\mathcal{A}_-(\mathbb{C}^M)$ is the natural framework for formulating and studying lattice fermion and spin systems which play an important role in condensed matter physics.

In the following, we elaborate the naive interpretations of the relativistic equations in [2] and [3] as a quantum theory of one particle, and we discuss why they are unphysical. For simplicity, we assume that the electromagnetic fields $\phi$, $\mathbf{A}$ are time independent. We then show that quasifree representations as discussed above can provide physically acceptable many-particle theories. We first consider the Dirac case, which is somewhat simpler. 


Fermions 


One-particle formalism  Recalling that $i\partial_t$ is the energy operator, we define the Dirac Hamiltonian $D$ by rewriting [3] in the following form:

$$i\partial_t\psi = D\psi, \qquad D = (-i\nabla + e\mathbf{A})\cdot\boldsymbol{\alpha} + m\beta - e\phi \qquad [36]$$

This Dirac Hamiltonian is obviously a self-adjoint operator on the one-particle Hilbert space $\mathfrak{h} = L^2(\mathbb{R}^3) \otimes \mathbb{C}^4$, but, in contrast to the Schrödinger Hamiltonian in [1], it is not bounded from below: for any $E_0 > -\infty$, one can find a state $f$ such that the energy expectation value $(f, Df)$ is less than $E_0$. This can be easily seen for the simplest case where the external potential vanishes, $\mathbf{A} = \phi = 0$. Then the eigenvalues of $D$ can be computed by Fourier transformation, and one finds

$$E = \pm\sqrt{\mathbf{p}^2 + m^2}, \qquad \mathbf{p} \in \mathbb{R}^3 \qquad [37]$$

Due to the negative energy eigenvalues we conclude that there is no ground state, and the Dirac Hamiltonian thus describes an unstable system, which is physically meaningless. 
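
The spectrum [37] follows from the relations [4]: for $\mathbf{A} = \phi = 0$ the Fourier-transformed Hamiltonian is the matrix $D(\mathbf{p}) = \boldsymbol{\alpha}\cdot\mathbf{p} + m\beta$, and [4] gives $D(\mathbf{p})^2 = (\mathbf{p}^2 + m^2)\,1$. The sketch below checks this numerically, assuming the standard (Dirac) representation of the matrices and hypothetical values for $\mathbf{p}$ and $m$.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def off_diagonal_blocks(s):
    # 4x4 matrix [[0, s], [s, 0]] built from a 2x2 block s
    return [[0, 0] + s[0], [0, 0] + s[1], s[0] + [0, 0], s[1] + [0, 0]]

# Assumption: standard (Dirac) representation of alpha_j and beta.
sigma = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
alpha = [off_diagonal_blocks(s) for s in sigma]
beta = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def free_dirac(p, m):
    """Matrix D(p) = alpha . p + m beta for vanishing external fields."""
    d = [[m * beta[i][j] for j in range(4)] for i in range(4)]
    for pj, aj in zip(p, alpha):
        d = [[d[i][j] + pj * aj[i][j] for j in range(4)] for i in range(4)]
    return d

p, m = (0.3, -1.2, 0.5), 1.0      # hypothetical momentum and mass
D = free_dirac(p, m)
D2 = matmul(D, D)
energy_sq = sum(pj * pj for pj in p) + m * m
trace = sum(D[i][i] for i in range(4))

# D^2 = (p^2 + m^2) 1 and tr D = 0, so the eigenvalues are +E and -E, each twice.
square_ok = all(abs(D2[i][j] - (energy_sq if i == j else 0)) < 1e-12
                for i in range(4) for j in range(4))
lowest = -math.sqrt(energy_sq)
print(square_ok, abs(trace), lowest)
```

Since the lowest eigenvalue is $-\sqrt{\mathbf{p}^2 + m^2}$ and $\mathbf{p}$ ranges over all of $\mathbb{R}^3$, the spectrum has no lower bound.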

To summarize: a (unphysical) one-particle 
description of relativistic fermions is given by a 
Hilbert space h together with a self-adjoint Hamil- 
tonian D unbounded from below. Other observables 
and transformations are given by self-adjoint and 
unitary operators on $\mathfrak{h}$, respectively. 
Many-body formalism  We now explain how to construct a physical many-body description from these data. To simplify notation, we first assume that $D$ has a purely discrete spectrum (which can be achieved by using a compact space). We can then label the eigenfunctions $f_n$ by integers $n$ such that the corresponding eigenvalues $E_n \ge 0$ for $n \ge 0$ and $E_n < 0$ for $n < 0$. Using the naive representation of the fermion field algebra discussed in the last section, we get (we use the notation introduced in [22])

$$q(D) = \sum_{n \ge 0}|E_n|\,\psi_n^\dagger\psi_n - \sum_{n < 0}|E_n|\,\psi_n^\dagger\psi_n \qquad [38]$$

which is obviously not bounded from below and thus not physically meaningful. However, $\psi_n^\dagger\psi_n = 1 - \psi_n\psi_n^\dagger$, which suggests that we can remedy this problem by interchanging the creation and annihilation operators for $n < 0$. This is possible: it is easy to see that

$$\hat\psi_n \equiv \psi_n \;\; \forall n \ge 0 \quad \text{and} \quad \hat\psi_n \equiv \psi_n^\dagger \;\; \forall n < 0 \qquad [39]$$

provides a representation of the algebra in [23]. We thus define

$$\hat q(D) = \sum_{n \in \mathbb{Z}} E_n :\!\hat\psi_n^\dagger\hat\psi_n\!: \qquad [40]$$

with the so-called normal ordering prescription

$$:\!\hat\psi_m^\dagger\hat\psi_n\!: \;\equiv\; \hat\psi_m^\dagger\hat\psi_n - (\Omega, \hat\psi_m^\dagger\hat\psi_n\Omega) \qquad [41]$$

where we made use of the freedom of normalization explained after [23] to eliminate unwanted additive constants. We get $\hat q(D) = \sum_{n \in \mathbb{Z}}|E_n|\,\psi_n^\dagger\psi_n$, which is manifestly a non-negative self-adjoint operator with $\Omega$ as ground state. We thus found a physical many-body description for our model. We can now define, for other one-particle observables,

$$\hat q(A) = \sum_{m,n \in \mathbb{Z}} A_{mn} :\!\hat\psi_m^\dagger\hat\psi_n\!: \qquad [42]$$

and, by straightforward computations, we obtain

$$[\hat q(A), \hat q(B)] = \hat q([A, B]) + S(A, B) \qquad [43]$$

where $S(A, B) = \sum_{m < 0}\sum_{n \ge 0}(A_{mn}B_{nm} - B_{mn}A_{nm})$, that is,

$$S(A, B) = \mathrm{tr}(P_-AP_+BP_- - P_-BP_+AP_-) \qquad [44]$$

with $P_- = \sum_{n < 0} f_n(f_n, \cdot)$ the projection onto the subspace spanned by the negative energy eigenvectors of $D$ and $P_+ = 1 - P_-$. One can show that $\hat q(A)$ is no longer defined for all operators but only if

$$P_-AP_+ \text{ and } P_+AP_- \text{ are Hilbert-Schmidt operators} \qquad [45]$$

(we recall that $a$ is a Hilbert-Schmidt operator if $\mathrm{tr}(a^*a) < \infty$). The $\mathbb{C}$-number term $S(A, B)$ in [43] is often called Schwinger term and, in contrast to the similar term in [25], it is now nontrivial, that is, it is no longer possible to remove it by a redefinition $\hat q'(A) = \hat q(A) - b(A)$. This Schwinger term is an example of an anomaly, and it has various interesting implications. 
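
One way to see why the Schwinger term can be nontrivial only when the one-particle space is infinite dimensional: for finite matrices, cyclicity of the trace reduces [44] to $S(A, B) = \mathrm{tr}(P_-[A, B])$, which has the trivial form $b([A, B])$ of [25] with $b(A) = \mathrm{tr}(P_-A)$. The sketch below checks this identity for hypothetical 4x4 matrices; it is a finite-dimensional illustration, not a computation in the setting of the text, where $\mathrm{tr}(P_-A)$ need not exist.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def chain(*ms):
    """Product of several square matrices, left to right."""
    out = ms[0]
    for m in ms[1:]:
        out = matmul(out, m)
    return out

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

# Projections onto the "negative" and "positive" subspaces (first two / last two basis vectors).
P_minus = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
P_plus = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def schwinger(a, b):
    """S(A, B) as in [44]."""
    return (trace(chain(P_minus, a, P_plus, b, P_minus))
            - trace(chain(P_minus, b, P_plus, a, P_minus)))

def b_functional(a):
    """b(A) = tr(P_- A); finite only because the matrices are finite."""
    return trace(matmul(P_minus, a))

# Hypothetical one-particle operators.
A = [[1, 2, 1j, 0], [2, 0, 1, 1], [-1j, 1, 3, 0.5], [0, 1, 0.5, -1]]
B = [[0, 1j, 2, 1], [-1j, 1, 0, 1j], [2, 0, -2, 1], [1, -1j, 1, 0]]

s_ab = schwinger(A, B)
trivial = b_functional(commutator(A, B))
print(s_ab, trivial, schwinger(B, A))
```

In infinite dimensions the Hilbert-Schmidt condition [45] keeps $S(A, B)$ finite even when $\mathrm{tr}(P_-A)$ and $\mathrm{tr}(P_-B)$ diverge, so this reduction is no longer available.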

In a similar manner, one can construct the many-body transformations $\hat Q(U)$ of unitary operators $U$ on $\mathfrak{h}$ satisfying the very Hilbert-Schmidt condition in [45], and one obtains

$$\hat Q(U)\hat Q(V) = \chi(U, V)\hat Q(UV) \qquad [46]$$

with interesting phase-valued functions $\chi$.

More generally, for any one-particle Hilbert space $\mathfrak{h}$ and Dirac Hamiltonian $D$, the physical representation is given by the quasifree representation $\pi_{P_-}$ in [34] with $P_-$ the projection onto the negative energy subspace of $D$. The results about $\hat q$ and $\hat Q$ mentioned hold true in any such representation. 
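
For finitely many modes, the effect of passing from [38] to the normal-ordered Hamiltonian [40] can be made explicit by listing all occupation-number configurations. The sketch below assumes a hypothetical four-level spectrum with two negative eigenvalues; the naive energy is the sum of $E_n$ over occupied modes, and normal ordering subtracts the energy of the state in which all negative-energy modes are filled.

```python
from itertools import product

def naive_energy(energies, occ):
    """Eigenvalue of q(D) in [38] on the state with occupation numbers occ."""
    return sum(e * n for e, n in zip(energies, occ))

def normal_ordered_energy(energies, occ):
    """Eigenvalue of the normal-ordered Hamiltonian [40]: the naive energy
    minus the energy of the reference state with all negative modes filled."""
    sea = sum(e for e in energies if e < 0)
    return naive_energy(energies, occ) - sea

energies = [-2.0, -0.5, 0.4, 1.5]   # hypothetical eigenvalues of D
configs = list(product((0, 1), repeat=len(energies)))

naive_min = min(naive_energy(energies, c) for c in configs)
ordered_min = min(normal_ordered_energy(energies, c) for c in configs)
ground = min(configs, key=lambda c: normal_ordered_energy(energies, c))
print(naive_min, ordered_min, ground)
```

With finitely many modes the two Hamiltonians differ only by a constant. The naive minimum equals the sum of all negative eigenvalues, which decreases without bound as more negative-energy modes are included, whereas the normal-ordered minimum stays at zero.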

Thus the one-particle Hamiltonian $D$ determines which representation one has to use, and one therefore cannot construct the "physical" representation without specific information about $D$. However, not all these representations are truly different: if there is a unitary operator $\mathcal{U}$ on the Fock space $\mathcal{F}_-(\mathfrak{h})$ such that

$$\mathcal{U}\,\pi_{P_-^{(1)}}(\Psi^{(\dagger)}(f))\,\mathcal{U}^* = \pi_{P_-^{(2)}}(\Psi^{(\dagger)}(f)) \qquad [47]$$

for all $f \in \mathfrak{h}$, then the quasifree representations associated with the different projections $P_-^{(1)}$ and $P_-^{(2)}$ are physically equivalent: one could equally well formulate the second model using the representation of the first. Two such quasifree representations are called unitarily equivalent, and a fundamental theorem due to Shale and Stinespring states that two quasifree representations $\pi_{P_-^{(1)}}$ and $\pi_{P_-^{(2)}}$ are unitarily equivalent if and only if $P_-^{(1)} - P_-^{(2)}$ is a Hilbert-Schmidt operator (a similar result holds true in the boson case). 


Bosons 


One-particle formalism  As in the Dirac case, the solutions of the Klein-Gordon equation in [2] do not define a physically acceptable one-particle quantum theory with a ground state: the energy eigenvalues in [37] for $\mathbf{A} = \phi = 0$ are a consequence of relativistic invariance and thus equally true for the Klein-Gordon case. However, in this case there is a further problem. To find the one-particle Hamiltonian, one can rewrite the second-order equation in [2] as a system of first-order equations,

$$i\partial_t\Phi = K\Phi, \qquad \Phi = \begin{pmatrix}\psi \\ \pi\end{pmatrix}, \qquad K = \begin{pmatrix} C & i \\ -iB^2 & C\end{pmatrix} \qquad [48]$$

with

$$B^2 = (-i\nabla + e\mathbf{A})^2 + m^2, \qquad C = -e\phi \qquad [49]$$

(the first row of [48] defines the second component $\pi$ of $\Phi$). Thus, one sees that the natural one-particle Hilbert space for the Klein-Gordon equation is $\mathfrak{h} = L^2(\mathbb{R}^3) \otimes \mathbb{C}^2$; here, and in the following, we identify $\mathfrak{h}$ with $\mathfrak{h}_0 \oplus \mathfrak{h}_0$, $\mathfrak{h}_0 = L^2(\mathbb{R}^3)$, and use a convenient $2 \times 2$ matrix notation naturally associated with that splitting. However, the one-particle Hamiltonian is not self-adjoint but rather obeys

$$K^* = JKJ, \qquad J = \begin{pmatrix} 0 & i \\ -i & 0\end{pmatrix} \qquad [50]$$

with $*$ the Hilbert space adjoint. It is important to note that $J$ is a grading operator. Thus, we can define a sesquilinear form

$$(f, g)_J \equiv (f, Jg) \qquad \forall f, g \in \mathfrak{h} \qquad [51]$$

with $(\cdot, \cdot)$ the standard inner product, and [50] is equivalent to $K$ being self-adjoint with respect to this sesquilinear form; in this case, we say that $K$ is $J$-self-adjoint. Thus, in the Klein-Gordon case, this sesquilinear form takes the role of the Hilbert space inner product and, in particular, not $(\Phi, \Phi)$ but $(\Phi, \Phi)_J$ is preserved under time evolution. However, in contrast to $\Phi^*\Phi$, the density $\Phi^*J\Phi$ is not positive definite, and it is therefore not possible to interpret it as probability density as in conventional quantum mechanics. For consistency, one has to require that one-particle transformations $U$ are unitary with respect to $(\cdot, \cdot)_J$, that is, $U^{-1} = JU^*J$. We call such operators $J$-unitary.

To summarize: a (unphysical) one-particle description of relativistic bosons is given by a Hilbert space of the form $\mathfrak{h} = \mathfrak{h}_0 \oplus \mathfrak{h}_0$, the grading operator $J$ in [50], and a $J$-self-adjoint Hamiltonian $K$ of the form as in eqn [48], where $B > 0$ and $C$ are self-adjoint operators on $\mathfrak{h}_0$. Other observables and transformations are given by $J$-self-adjoint and $J$-unitary operators on $\mathfrak{h}$, respectively. 
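
The structure described in [48] to [51] can be checked on a single mode, where the operators $B$ and $C$ reduce to real numbers $b > 0$ and $c$. This reduction, the numerical values, and the specific form $J = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$ (the choice that makes $K^* = JKJ$ hold for the $K$ in [48]) are assumptions of the sketch.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for r1, r2 in zip(a, b) for x, y in zip(r1, r2))

b, c = 2.0, 0.5                      # hypothetical values of B and C on one mode
K = [[c, 1j], [-1j * b * b, c]]      # one-particle Hamiltonian as in [48]
J = [[0, 1j], [-1j, 0]]              # grading operator as in [50]

j_self_adjoint = close(dagger(K), matmul(J, matmul(K, J)))          # K* = J K J
grading = close(matmul(J, J), [[1, 0], [0, 1]]) and close(dagger(J), J)
not_self_adjoint = not close(dagger(K), K)

# The eigenvalues of K are c + b and c - b: compare trace and determinant.
tr = K[0][0] + K[1][1]
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]

def j_form(f, g):
    """Sesquilinear form (f, g)_J = (f, J g) as in [51]."""
    jg = [sum(J[i][k] * g[k] for k in range(2)) for i in range(2)]
    return sum(complex(x).conjugate() * y for x, y in zip(f, jg))

positive = j_form([1, -1j], [1, -1j])
negative = j_form([1, 1j], [1, 1j])
print(j_self_adjoint, grading, not_self_adjoint, tr, det, positive, negative)
```

The form $(f, f)_J$ takes both signs, which is why it cannot serve as a probability density, and the eigenvalue $c - b$ reproduces the negative energies of the one-particle theory.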


Many-body formalism  We first consider the quasifree representation $\pi_{P_-^{(0)}}$ of the boson field algebra $\mathcal{A}_+(\mathfrak{h})$ so that the grading operator in [35] is equal to $J$, that is, $P_-^{(0)} = (1 - J)/2$. Writing $\pi_{P_-^{(0)}}(\Psi^{(\dagger)}(f)) = \hat\psi^{(\dagger)}(f)$, one finds that

$$q(A)^* = q(JA^*J), \qquad Q(U)^* = Q(JU^*J) \qquad [52]$$

and thus $J$-self-adjoint operators and $J$-unitary operators are mapped to proper observables and transformations. In particular, $q(K)$ is a self-adjoint operator, which resolves one problem of the one-particle theory. However, $q(K)$ is not bounded from below, and thus $\pi_{P_-^{(0)}}$ is not yet the physical representation.

The physical representation can be constructed using the operators

$$T = \frac{1}{\sqrt{2}}\begin{pmatrix} B^{1/2} & iB^{-1/2} \\ B^{1/2} & -iB^{-1/2}\end{pmatrix}, \qquad F = \begin{pmatrix} 1 & 0 \\ 0 & -1\end{pmatrix} \qquad [53]$$

(for simplicity, we restrict ourselves to the case $C = 0$ and $B > 0$; we use the calculus of self-adjoint operators here) with the following remarkable properties:

$$T^*FT = J, \qquad TKT^{-1} = \begin{pmatrix} B & 0 \\ 0 & -B\end{pmatrix} \equiv \hat K \qquad [54]$$

One can check that 


VAST,  f)eswT-"f) [55] 


is a quasifree representation $\pi_{P_-}$ of $\mathcal{A}_+(\mathfrak{h})$ with $P_- = (1 - F)/2$. With that, the construction of $\hat q$ and $\hat Q$ is very similar to the fermion case described above (the crucial simplification is that $\hat K$ and $F$ now are diagonal). In particular, $\hat q(K)$ is a non-negative operator with the ground state $\Omega$, and $\hat q(A)$ and $\hat Q(U)$ are self-adjoint and unitary for every one-particle observable $A$ and transformation $U$, respectively. One also gets relations as in [43] and [46]. 
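
The two properties of $T$ stated in [54] can be verified on a single mode, where $B$ is a positive number $b$ and $C = 0$. The sketch below assumes this reduction, a hypothetical value of $b$, and the form $J = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$; to avoid computing an inverse it checks $TK = \hat K T$ instead of $TKT^{-1} = \hat K$.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for r1, r2 in zip(a, b) for x, y in zip(r1, r2))

b = 2.0                               # hypothetical value of B on one mode
r = math.sqrt(b)
s = 1 / math.sqrt(2)

T = [[s * r, s * 1j / r], [s * r, -s * 1j / r]]   # as in [53]
F = [[1, 0], [0, -1]]
J = [[0, 1j], [-1j, 0]]
K = [[0, 1j], [-1j * b * b, 0]]                   # [48] with C = 0
K_hat = [[b, 0], [0, -b]]

grading_relation = close(matmul(dagger(T), matmul(F, T)), J)   # T* F T = J
diagonalizes = close(matmul(T, K), matmul(K_hat, T))           # T K = K_hat T
print(grading_relation, diagonalizes)
```

The diagonal form $\hat K$ makes the positive and negative energy parts explicit, which is what allows the fermion construction to be carried over.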


Related Topics of Recent Interest 


The impossibility to construct relativistic quantum- 
mechanical models played an important role in the 
early history of quantum field theory, as beautifully 
discussed in chapter 1 of Weinberg (1995). 

The abstract formalism of quasifree representations 
of fermion and boson field algebras was developed in 
many papers (see, e.g., Ruijsenaars (1977), Grosse and 
Langmann (1992), and Langmann (1994) for explicit 
results on O and x). A nice textbook presentation 
with many references can be found in chapter 13 of 
Gracia-Bondía et al. (2001) (this chapter is rather self- 
contained but mainly restricted to the fermion case). 

Based on the Shale-Stinespring theorem, there has been a considerable amount of work to investigate whether the quasifree representations associated with different external electromagnetic fields $\phi_1$, $\mathbf{A}_1$ and $\phi_2$, $\mathbf{A}_2$ are unitarily equivalent, if and which time-dependent many-body Hamiltonians exist, etc. (see chapter 13 of Gracia-Bondía et al. (2001), and references therein). 

The Lie algebra $\mathfrak{g}_2$ of Hilbert space operators satisfying the condition in [45] is an interesting infinite-dimensional Lie algebra with a beautiful representation theory. This subject is closely related to conformal field theory (see, e.g., Kac and Raina (1987) for a textbook presentation and Carey and Ruijsenaars (1987) for a detailed mathematical account within the framework described by us). 

It turns out that the mathematical framework discussed in the previous section is sufficient for constructing fully interacting quantum field theories, in particular Yang-Mills gauge theories, in 1+1 but not in higher dimensions. The reason is that, in 3+1 dimensions, the one-particle observables $A$ of interest do not obey the Hilbert-Schmidt condition in [45] but only the weaker condition

$$\mathrm{tr}(a^*a)^n < \infty, \qquad a = P_\mp AP_\pm \qquad [56]$$

with $n = 2$, and the natural analog of $\mathfrak{g}_2$ in 3+1 dimensions thus seems to be the Lie algebra $\mathfrak{g}_{2n}$ of operators satisfying this condition with $n = 2$. Various results on the representation theory of such Lie algebras $\mathfrak{g}_{2n}$ with $n > 1$ have been developed (see Mickelsson (1989), where various interesting relations to infinite-dimensional geometry are also discussed).

As mentioned, the Schwinger term $S(A, B)$ in [44] is an example of an anomaly. Mathematically, it is a nontrivial 2-cocycle of the Lie algebra $\mathfrak{g}_2$, and analogs for $\mathfrak{g}_{2n}$ with $n > 1$ have been found. These cocycles provide a natural generalization of anomalies (in the meaning of particle physics) to operator algebras. They not only shed some interesting light on the latter, but also provide a link to notions and results from noncommutative geometry (see, e.g., Gracia-Bondía et al. (2001)). We believe that this link can provide a fruitful driving force and inspiration to find ways to deepen our understanding of quantum Yang-Mills theories in 3+1 dimensions (Langmann 1996). 


See also: Anomalies; C*-Algebras and Their 
Classification; Dirac Fields in Gravitation and Nonabelian 
Gauge Theory; Dirac Operator and Dirac Field; Gerbes in 
Quantum Field Theory; Quantum Field Theory in Curved 
Spacetime; Quantum n-Body Problem; Superfluids; 
Two-Dimensional Models. 


Further Reading 


Carey AL and Ruijsenaars SNM (1987) On fermion gauge 
groups, current algebras and Kac-Moody algebras. Acta 
Applicandae Mathematicae 10: 1-86. 

DeWitt B (2003) The Global Approach to Quantum Field 
Theory, International Series of Monographs on Physics, vols. 
1 and 2, p. 114. New York: Oxford University Press. 

Gracia-Bondía JM, Várilly JC, and Figueroa H (2001) Elements 
of Noncommutative Geometry, Birkhäuser Advanced Texts: 
Basel Textbooks. Boston: Birkhäuser. 

Grosse H and Langmann E (1992) A superversion of quasifree second 
quantization. Journal of Mathematical Physics 33: 1032-1046. 

Kac VG and Raina AK (1987) Bombay Lectures on Highest 
Weight Representations of Infinite-Dimensional Lie Algebras, 


326 Boundaries for Spacetimes 


Advanced Series in Mathematical Physics, vol. 2. Teaneck: 
World Scientific Publishing. 

Langmann E (1994) Cocycles for boson and fermion Bogoliubov 
transformations. Journal of Mathematical Physics 96-112. 
Langmann E (1996) Quantum gauge theories and noncommuta- 

tive geometry. Acta Physica Polonica B 27: 2477-2496. 
Mickelsson J (1989) Current Algebras and Groups, Plenum 
Monographs in Nonlinear Physics. New York: Plenum Press. 
Rafelski J, Fulcher LP, and Klein A (1978) Fermions and bosons 
interacting with arbitrary strong external fields. Physics 
Reports 38: 227-361. 


1 Boundaries for Spacetimes 
| S G Harris, St. Louis University, St. Louis, MO, USA 
* © 2006 Elsevier Ltd. All rights reserved. 


Introduction 


There is a common practice in mathematics of placing a 
boundary on an object which may not appear to come 
naturally equipped with one; this is often thought of as 
adding ideal points to the object. Perhaps the most 
famous example is the addition of a single *point at 
infinity" to the complex plane, resulting in the Riemann 
sphere: this is a boundary point in the sense of providing 
an ideal endpoint for lines and other endless curves in 
the plane. Often, there is more than one reasonable way 
to construct a boundary for a given object, depending 
on the intent; for instance, the plane is sometimes 
equipped, not with a single point at infinity, but with a 
circle at infinity, resulting in a space homeomorphic to a 
closed disk. Both these boundaries on the plane have 
useful but different things to tell us about the nature of 
the plane; the common feature is that, by bringing the 
infinite reach of the plane within the confines of a more 
finite object, we are better able to grasp the behavior of 
the original object. 

The general usefulness of the construction of 
boundaries for an object is to allow behavior of 
structures in the “completed” object to aid in 
visualization of behavior in the original object, 
such as by providing a degree of measurement or 
other classification of processes at infinity. This 
utility has not been overlooked for spacetimes. A 
variety of purposes may be served by various 
boundary construction methods: providing a locale 
for singularities (as the spacetime itself is modeled 
by a smooth manifold with a smooth metric, free of 
singular points); providing a platform from which to 
measure global properties such as total energy or 
angular momentum; displaying in finite form the 
causal structure at infinity; or providing a compact 
(or quasicompact) topological envelope for the 
spacetime while preserving the causal structure. 


Reed M and Simon B (1975) Metbods of Modern Matbematical 
Physics. II. Fourier Analysis, Self-Adjointness. New York: 
Academic Press. 

Ruijsenaars SNM (1977) On Bogoliubov transformations for 
systems of relativistic charged particles. Journal of Mathema- 
tical Physics 18: 517—526. 

Weinberg S (1995) The Quantum Theory of Fields, vol. 1 (English 
summary) Foundations. Cambridge: Cambridge University Press. 


This article will consider several of the methods 
that have been used or proposed for constructing 
boundaries for spacetimes, ranging from the ad hoc
(but practical) to the universal. Perhaps the 
simplest way to classify these methods is into 
those which employ or analyze embeddings of the 
spacetime in question and those that do not. 


Boundaries from Embeddings 
General 


The simplest and most common method of constructing a boundary
for a spacetime $M$ is to find a suitable manifold $N$ (of the same
dimension) and an appropriate map $\phi: M \to N$ which is a
topological embedding, that is, a homeomorphism onto its image
$\phi(M)$. We can consider $\bar M_\phi$, the closure of $\phi(M)$
in $N$, as the $\phi$-completion of $M$, and
$\partial_\phi(M) = \bar M_\phi - \phi(M)$ as the $\phi$-boundary.
Typically, this embedding is chosen in such a way that curves of
interest in $M$ (such as timelike or null geodesics or causal curves
of bounded acceleration) which have no endpoints in $M$ do have
endpoints in $\partial_\phi(M)$; in other words, if
$c: [0, \infty) \to M$ is such a curve of interest, then
$\lim_{t \to \infty} \phi(c(t))$ exists in $N$.

The common practice, initiated by Penrose in 1967, is to choose $N$
to be another spacetime (often called the unphysical spacetime,
while $M$ is considered the spacetime of physical interest) and to
require the embedding $\phi$ to be a conformal mapping, that is, one
that carries the spacetime metric in $M$ to a scalar multiple of the
spacetime metric in $N$. As conformal maps preserve the local causal
structure, leaving unchanged the notions of timelike curve or null
curve, this means that $\bar M_\phi$ inherits from $N$ a causal
structure which, locally, is an extension of that of $M$. This
allows us to speak of causal relationships within $\bar M_\phi$,
closely related to those in $M$.


Minkowski Space 


The prototypical example is the conformal embedding of Minkowski
space into the Einstein static spacetime. Let $R^n$ denote Euclidean
n-space, $S^n$ the unit n-sphere, and $L^n$ Minkowski n-space, that
is, $R^n$ with metric $ds^2 = dx_1^2 + \cdots + dx_{n-1}^2 - dt^2$
(so $L^n = R^{n-1} \times L^1$). The n-dimensional Einstein static
spacetime is the product spacetime $E^n = S^{n-1} \times L^1$.
Consider $S^{n-1}$ as embedded in $R^n = R^{n-1} \times R^1$. Then
the conformal embedding is $\phi: L^n \to E^n$, expressed as
$\phi: R^{n-1} \times L^1 \to R^{n-1} \times R^1 \times L^1$, given by
$\phi(x, t) = ((x/|x|) \sin\theta, \cos\theta, \tau)$, where
$\theta = \tan^{-1}(t + |x|) - \tan^{-1}(t - |x|)$ and
$\tau = \tan^{-1}(t + |x|) + \tan^{-1}(t - |x|)$. The boundary
$\partial_\phi(L^n)$ consists of the following: the points
$\{\theta + \tau = \pi;\ 0 < \tau \le \pi\}$, composed of an
$S^{n-2}$ of null lines coming together at the point
$i^+ = (0, 1, \pi)$; a similar cone of null lines
$\{\theta - \tau = \pi;\ -\pi \le \tau < 0\}$ with vertex at
$i^- = (0, 1, -\pi)$; and a single limit-point for both cones at
$i^0 = (0, -1, 0)$. The $\tau > 0$ null cone is called
$\mathscr{I}^+$ (the letter is read “scri” for “script-I”), its
counterpart $\mathscr{I}^-$ (Figures 1 and 2). As all future-directed
timelike geodesics in $L^n$ have $i^+$ as an endpoint in $E^n$,
$i^+$ is called future-timelike infinity; similarly, $i^-$ is
past-timelike infinity. Every future-directed null geodesic ends up
on $\mathscr{I}^+$, which is thus termed future-null infinity, and
$\mathscr{I}^-$ is past-null infinity. All spacelike geodesics come
to $i^0$, spacelike infinity.

For $n = 2$, this picture produces the familiar diamond
representation of $L^2$ (Figure 3): as $E^2$ is easily unrolled into
another copy of $L^2$ (metric $d\theta^2 - d\tau^2$), this means
that $\phi(L^2)$ is the region $|\theta| + |\tau| < \pi$ in $L^2$;
timelike curves and null geodesics in the original $L^2$ are the
same as in $\phi(L^2)$, and their endpoints in the boundary of the
diamond are evident. For higher dimensions, the picture is not as
visually obvious, since $E^n$ cannot be unrolled; but the principle
of reading the causal structure at infinity of $L^n$ via its
boundary points in $E^n$ remains the same.

Figure 1 $L^2$ conformally embedded in $E^2 = S^1 \times L^1$.

Figure 2 $L^3$ conformally embedded in $E^3 = S^2 \times L^1$.

Figure 3 $L^2$ conformally embedded in unrolled $E^2$, i.e.,
$R^1 \times L^1 = L^2$.
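
As a small numerical illustration of the embedding formulas above (a sketch that is not part of the original article; the function names `conformal_coords` and `in_diamond` are hypothetical), one can evaluate $\theta$ and $\tau$ along curves in $L^n$ and watch them approach the boundary of the diamond:

```python
import math

def conformal_coords(x_norm, t):
    """Return (theta, tau) for a point of L^n with |x| = x_norm >= 0 and time t."""
    u = math.atan(t + x_norm)
    v = math.atan(t - x_norm)
    return u - v, u + v

def in_diamond(theta, tau):
    """True if (theta, tau) lies in the open region |theta| + |tau| < pi."""
    return abs(theta) + abs(tau) < math.pi

# The origin of L^n maps to theta = tau = 0.
origin = conformal_coords(0.0, 0.0)

# Far along the outgoing null ray t = |x| + 1, theta + tau approaches pi
# (a point of future-null infinity), while tau tends to 3*pi/4.
theta_far, tau_far = conformal_coords(1.0e9, 1.0e9 + 1.0)

# Far along the timelike line |x| = 0.5, (theta, tau) approaches (0, pi),
# the coordinates of future-timelike infinity i^+.
theta_tl, tau_tl = conformal_coords(0.5, 1.0e9)
```

Every point of $L^n$ itself lands strictly inside the region $|\theta| + |\tau| < \pi$; only the limits along endless curves reach the boundary.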


Conformal Embeddings 


There have been various formulations designed to emulate the
conformal mapping of $L^n$ for spacetimes which are, in some sense,
asymptotically like Minkowski space, by conformally mapping them
into larger spacetimes. A spacetime $M$ with metric $g$ is called
asymptotically simple or (alternatively) asymptotically flat if
there is a spacetime $N$ with metric $h$, an embedding
$\phi: M \to N$, and a scalar function $\Omega$ defined on $N$ with
$\phi^* h = (\Omega \circ \phi)^2 g$ (i.e., $\phi$ is conformal with
$\Omega^2$ the conformal factor) and $\Omega = 0$ on
$\partial_\phi(M)$, $d\Omega \neq 0$ on $\partial_\phi(M)$, and
various other restrictions, depending on the intent. One can define
asymptotic symmetries of $M$ by means of motions within
$\partial_\phi(M)$, leading to notions of global energy and angular
momentum (see Hawking and Ellis (1973) and Wald (1984) for details).
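
For the prototype embedding of $L^n$ into $E^n$, the conformal factor can be written out explicitly. The following is a standard computation, offered as a worked example rather than quoted from the article; the abbreviations $p$, $q$ and the symbol $d\omega^2$ (the round metric on the unit $S^{n-2}$) are introduced only for this example.

```latex
% p, q are shorthand introduced for this example only
p = \tan^{-1}(t + |x|), \qquad q = \tan^{-1}(t - |x|), \qquad
\theta = p - q, \qquad \tau = p + q .

% Minkowski metric g expressed through the Einstein static metric h
g = -dt^2 + d|x|^2 + |x|^2\, d\omega^2
  = \frac{1}{4\cos^2 p\,\cos^2 q}
    \left( -d\tau^2 + d\theta^2 + \sin^2\theta\, d\omega^2 \right)
  = \Omega^{-2}\, h ,
\qquad \Omega = 2\cos p\,\cos q .

% Omega vanishes exactly where p = \pi/2 or q = -\pi/2, i.e. on the boundary;
% on the open null cone p = \pi/2, |q| < \pi/2 one has
d\Omega = -2\cos q\, dp \neq 0 .
```

So $\Omega$ vanishes on the boundary and has nonvanishing differential along the open null cones, which is the behavior the definition above abstracts.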


Classifications of Embeddings 


As a general rule, there is no uniqueness in the choice of an
embedding $\phi$ for a spacetime $M$ to construct a boundary, nor in
the topology of the resulting boundary $\partial_\phi(M)$, nor even
in which curves of interest end up having endpoints in the boundary.
In an attempt to categorize which embeddings yield equivalent
results and what sort of results there are in terms of endpoints of
curves, Scott and Szekeres (1994) formulated what they called the
abstract boundary of a spacetime. This depends on a choice of class
of “interesting” curves, each characterizable as having either
infinite or finite parameter length; typical choices for this class
would be timelike geodesics or causal geodesics or timelike curves
of bounded acceleration. For instance, a boundary point may be said
to represent a singularity with respect to the chosen class of
curves if it is the endpoint of one such curve with finite parameter
length; nonsingular points are points at infinity. These
classifications do not require conformal embeddings, nor even that
the target of the embeddings be spacetimes; they accommodate
boundaries of a far more general type than Penrose's notion stemming
from conformal embeddings.

A somewhat different study of boundaries from embeddings has been
formulated by García-Parrado and Senovilla (2003), classifying
points at infinity and singularities in $\partial_\phi(M)$ for
embeddings $\phi: M \to N$ in which $N$ is a spacetime, $\phi$
preserves the chronology relation $\ll$, and there is also a
diffeomorphism $\psi: \phi(M) \to N$ which again preserves $\ll$
(the chronology relation in a spacetime is defined thus: $x \ll y$
if and only if there is a future-directed timelike curve from $x$ to
$y$). This scheme applies more generally than to conformal
embeddings, but the requirement for chronology-preserving maps in
both directions guarantees a strong sensitivity to causality; it
amounts to a mild extension of Penrose's notion that is often much
easier to construct.


Universal Constructions 
B-Boundary 


Attempts have been made to formulate boundary 
concepts specifically for defining singularities as 
ideal endpoints for finite-length geodesics. The 
most complete venture in this direction is the 
b-boundary (“b” for “bundle”) of Schmidt (Hawking
and Ellis 1973, pp. 276-284). This is a formulation 
that takes note only of the connection in the linear 
frames bundle L(M) of a spacetime M (or of any 
manifold with a linear connection, metric or other- 
wise); in other words, it takes no particular note of 
the spacetime metric or even of the causal structure of 
the spacetime, but only of the notion of parallel 
translation of tangent vectors along curves. Parallel 
translation of a frame (a basis for the tangent space) 
along a curve is used to obtain an ad hoc length for 
the curve by treating the translated frame as positive- 
definite orthonormal at each point; whether this 
length is finite or infinite is independent of the choice 
of the original frame. The Schmidt construction 


defines a boundary on M which gives an endpoint for 
each curve, endless in M, which is finite in that sense: 
Select a positive-definite metric on L(M), give it a 
boundary by means of Cauchy completion, and then 
take the appropriate quotient by the bundle group. 
This has an appealing universality of application, but 
the problems of putting it into practice are quite 
formidable. Also, the fact that it takes no special note 
of the spacetime character of M suggests that it may 
not be of particular utility for physical insights. 


Causal Boundary: Basics 


In 1972 Geroch, Kronheimer, and Penrose (GKP) formulated a notion
of boundary, the causal boundary, that is specifically adapted to
the causal character of a spacetime $M$; indeed, it is defined in
such a way that one need know only the chronology relation $\ll$ on
$M$ without any further reference to the metric (another way of
saying this is that the causal boundary is conformally invariant).
Like
Schmidt's b-boundary, the causal boundary is a 
universal construction, not depending on any extra- 
neous choices; however, although it has an obvious 
clarity in its causal structure, there are subtleties in 
the choice of an appropriate topology which are 
perhaps not yet fully resolved. As this boundary 
construction appears to embody the best hopes for a 
practical universal construction, it is detailed here in 
some depth. 

The causal boundary construction applies only to 
strongly causal spacetimes; essentially, this means 
that the local causal structure at each point is 
exactly reflective of the global causal structure. 

The basic construction of the causal boundary of 
a spacetime M starts with two separate parts: the 
future and past (pre-)boundaries of M, intended as 
yielding endpoints for, respectively, future- and past- 
endless causal curves. Part of the difficulty of the 
causal boundary is knowing how best to meld these 
two into one; currently, there are several answers to 
this conundrum. 

The elements of the future causal boundary of $M$ are defined in
terms of the past-set operator $I^-$. For a point $x \in M$, the
past of $x$ is $I^-(x) = \{y \mid y \ll x\}$; for a set
$A \subset M$, $I^-[A] = \bigcup_{x \in A} I^-(x)$. A set
$P \subset M$ is called a past set if $I^-[P] = P$; anything of the
form $P = I^-[A]$ is a past set, and all past sets have this form. A
past set $P$ is an indecomposable past set (IP) if $P$ cannot be
written as $P_1 \cup P_2$ for past sets which are proper subsets
$P_i \subsetneq P$. IPs come in exactly two varieties: pointlike IPs
(PIPs), of the form $I^-(x)$ (Figure 4), and terminal IPs (TIPs), of
the form $I^-[c]$ for $c$ a future-endless causal curve (Figure 5).
(Of course, any $I^-(x)$ can also be expressed as $I^-[c]$ for $c$ a
causal curve ending at $x$.) The future causal boundary of $M$,
$\hat\partial(M)$, consists of all the TIPs of $M$; the future
causal completion of $M$ is $\hat M = \hat\partial(M) \cup M$. But
that is just a set; the causal structure of $M$ needs to be extended
to $\hat M$.

Figure 4 PIP $P = I^-(x)$.

For any $x \in M$ and $P \in \hat\partial(M)$, set $x \ll P$ if and
only if $x \in P$; set $P \ll x$ if and only if $P \subset I^-(y)$
for some $y \ll x$ ($y \in M$); and for $P$ and $Q$ in
$\hat\partial(M)$, set $P \ll Q$ if and only if $P \subset I^-(y)$
for some $y \in Q$. If we consider this an extension of the $\ll$
relation on $M$, then we end up with a relation which, like that on
$M$, is transitive and antireflexive. Furthermore, it has the
property that for all $\alpha, \beta \in \hat M$,
$\alpha \ll \beta$ if and only if for some $x \in M$,
$\alpha \ll x \ll \beta$. (One can also amend the chronology
relation within $M$ to be more like the definition in the extension;
that is not of major import.)

We can also extend the causality relation $\prec$ on $M$ to one on
$\hat M$ (in $M$, $x \prec y$ if there is a future-directed causal
curve from $x$ to $y$): for $x \in M$ and
$P, Q \in \hat\partial(M)$, $x \prec P$ for $I^-(x) \subset P$,
$P \prec x$ for $P \subset I^-(x)$, and $P \prec Q$ for
$P \subset Q$.

Figure 5 TIP $P = I^-[c]$.

The intent is to have the elements of $\hat\partial(M)$ provide
future endpoints for future-endless causal curves in $M$; in
particular, we want two such curves, $c_1$ and $c_2$, to be assigned
the same future endpoint precisely when $I^-[c_1] = I^-[c_2]$. This
is accomplished by the simple expedient of defining the future
endpoint of a future-endless causal curve $c$ to be $P = I^-[c]$. We
do not have a topology on $\hat M$ as yet, but it is worth noting
that if $P$ is the assigned future endpoint of $c$, then
$I^-(P) = I^-[c]$; this is at least the correct causal behavior for
a putative future endpoint of $c$.
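
The rule that two curves share a future endpoint exactly when their pasts coincide can be illustrated numerically in $L^2$. The sketch below is not from the article: the function names are hypothetical, points are written as pairs (x, t), and membership in $I^-[c]$ is only approximated by sampling the curve at finitely many parameter values. It compares the null ray $s \mapsto (s, s)$ with the timelike hyperbola $s \mapsto (\cosh s, \sinh s)$, which is asymptotic to that ray; both have the past $\{t < x\}$, so they define the same TIP.

```python
import math

def chron_precedes(p, q):
    """Return True if p << q in L^2, with points given as (x, t)."""
    (x0, t0), (x1, t1) = p, q
    return t1 - t0 > abs(x1 - x0)

def in_past_of_curve(p, curve, params):
    """Approximate membership of p in I^-[c] by sampling the curve c."""
    return any(chron_precedes(p, curve(s)) for s in params)

def null_ray(s):
    """Future-endless null curve t = x."""
    return (s, s)

def hyperbola(s):
    """Future-endless timelike curve x^2 - t^2 = 1, asymptotic to t = x."""
    return (math.cosh(s), math.sinh(s))

params = [0.25 * k for k in range(61)]  # sample parameters s in [0, 15]
samples = [(1.0, 0.0), (0.001, 0.0), (-5.0, -6.0), (0.0, 1.0), (2.0, 3.0)]

# For each sample point: (is it in the past of the ray?, of the hyperbola?)
memberships = [
    (in_past_of_curve(p, null_ray, params),
     in_past_of_curve(p, hyperbola, params))
    for p in samples
]
```

On these sample points the two membership tests agree, as expected for curves with the same past; a finite sample can of course only suggest, not prove, that the two sets are equal.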

We can perform all the operations above in the time-dual manner,
obtaining the past causal boundary $\check\partial(M)$, consisting
of terminal indecomposable future sets (TIFs), and the past causal
completion $\check M = \check\partial(M) \cup M$. The full causal
boundary of $M$ consists of the union of $\hat\partial(M)$ with
$\check\partial(M)$ with some sort of identifications to be made.

As an example of the need for identifications, consider $M$ to be
$L^2$ with a closed timelike line segment deleted, say
$M = L^2 - \{(0, t) \mid 0 \le t \le 1\}$. For $\hat\partial(M)$,
we have first the boundary elements at infinity: the TIP $i^+ = M$
(the past of the positive time axis) and the set of TIPs making up
$\mathscr{I}^+$ (the pasts of null lines going out to infinity in
$L^2$); and then, the boundary elements coming from the deleted
points: for each $t$ with $0 < t \le 1$, two IPs emanating from
$(0, t)$, that is, $P_t^+$, the past of the null line going
pastwards from $(0, t)$ toward $x > 0$, and $P_t^-$, the past of the
null line going pastwards from $(0, t)$ toward $x < 0$; and $P_0$,
emanating from $(0, 0)$, that is, the past of the negative time
axis. Similarly, $\check\partial(M)$ consists of $i^-$,
$\mathscr{I}^-$, TIFs $F_t^+$ and $F_t^-$ emanating from $(0, t)$
for $0 \le t < 1$, and the TIF $F_1$ emanating from $(0, 1)$. We
probably want to make at least the following identifications: for
each $t$ with $0 < t < 1$, $P_t^+ = F_t^+$ and $P_t^- = F_t^-$;
$P_1^+ = F_1 = P_1^-$; and $F_0^+ = P_0 = F_0^-$. This results in a
two-sided replacement for the deleted segment; for some purposes, it
might be deemed desirable to identify the two sides as one, but a
universal boundary is probably a good idea, leaving further
identifications as optional quotients of the universal object.

How best to define the appropriate identifications in general is a
matter of some controversy. GKP defined a somewhat complicated
topology on $\hat\partial(M) \cup \check\partial(M) \cup M$, then
used an identification intended to result in a Hausdorff space.
There are significant problems with this approach in some outré
spacetimes, as pointed out by Budic and Sachs (1974) and Szabados
(1989), both of whom recommended a different set of identifications.
But what is of more concern is that the topology prescribed by GKP
is not what might be expected in even the simplest of cases, for
example, Minkowski space: $L^n$ needs no identifications among
boundary points (no matter whose identification procedure is
followed). The GKP topology on the causal completion of $L^n$,
restricted to $\hat\partial(L^n)$, is not that of a cone
($S^{n-2} \times R^1$ with a point added), as is the case for
$\mathscr{I}^+$ in the conformal embedding into $E^n$; but, instead,
each null line in $\hat\partial(L^n)$ (not including $i^+$) is an
open set, and $i^+$ has no neighborhood in $\hat\partial(L^n)$ save
for the entire boundary. This is a topology bearing no relation at
all to that of any embedding.


Future Causal Boundary 


Construction An alternative approach, initiated by Harris (1998),
is to forego the full causal boundary and concentrate only on
$\hat M$ and $\check M$ separately. There is an advantage to this in
that the process of future causal completion (that is to say,
forming $\hat M$ from $M$) can be made functorial in an appropriate
category of “chronological sets”: a set $X$ with a relation $\ll$
which is transitive and antireflexive such that it possesses a
countable subset $S$ which is “chronologically dense,” that is, for
any $x, y \in X$ with $x \ll y$, there is some $s \in S$ with
$x \ll s \ll y$. Any strongly causal spacetime $M$ is a
chronological set, as is $\hat M$. The entire construction of the
future causal boundary works just as well for a chronological set.
The role of a timelike curve in a chronological set is taken by a
future chain: a sequence $c = \{x_n\}$ with $x_n \ll x_{n+1}$ for
all $n$. For any future chain $c$, $I^-[c]$ is an IP, and any IP can
be so expressed; but unlike in spacetimes, $I^-(x)$ may or may not
be an IP for $x \in X$. Then, $\hat X$ is always future complete in
the sense that for any future chain $c$ in $\hat X$, there is an
element $\alpha \in \hat X$ with $I^-(\alpha) = I^-[c]$: for
instance, if the chain $c$ lies in $X$ but there is no $x \in X$
with $I^-(x) = I^-[c]$, just let $\alpha = I^-[c]$, which is an
element of $\hat\partial(X)$. This yields a functor of future
completion from the category of chronological sets to the category
of future-complete chronological sets, and the embedding
$X \to \hat X$ is a universal object in the sense of category
theory; this implies that it is categorically unique and is the
minimal future-completion process.

However, it is crucial to have more than the chronology relation
operating in what is to be a boundary; topology of some sort is
needed. This is accomplished by defining what might be called the
future-chronological topology for any chronological set, including
for $\hat M$ when $M$ is a strongly causal spacetime. This topology
is defined by means of a limit-operator $\hat L$ on sequences: if
$X$ is the chronological set, then for any sequence of points
$\sigma = \{x_n\}$ in $X$, $\hat L(\sigma)$ denotes a subset of $X$
which is the set of limits of $\sigma$. It is explicitly recognized
that there may be more than one limit of a sequence, as the space
may not be Hausdorff; no attempt is made to remove any
non-Hausdorffness, as this is viewed as giving important information
on how, possibly, two points in the future causal boundary represent
very similar and yet not identical pieces of information about the
causal structure at infinity. Once the limit operator is in place,
the actual topology on $X$ is defined thus: a subset $A \subset X$
is said to be closed if and only if for any sequence
$\sigma \subset A$, $\hat L(\sigma) \subset A$ (and open sets are
complements of closed sets). This yields the elements of
$\hat L(\sigma)$ as topological limits of $\sigma$.

The definition of $\hat L$ is simplest when $X$ has the property
that $I^-(x)$ is an IP for any $x \in X$; as this is true for $X$
being either a spacetime $M$ or the future causal completion
$\hat M$ of a spacetime, the discussion here is restricted to this
situation. Let us also make the common assumption that $X$ is
past-distinguishing, that is, $I^-(x) = I^-(y)$ implies $x = y$.

Let $\sigma = \{x_n\}$ be a sequence of points in a
past-distinguishing chronological set $X$ in which the past of any
point is an IP. Then $\hat L(\sigma)$ consists of those points $x$
for which (see Figures 6 and 7)


1. for all $y \in I^-(x)$, for $n$ sufficiently large, $y \ll x_n$,
and

2. for any IP $P \supsetneq I^-(x)$, there is some $z \in P$ such
that for $n$ sufficiently large, $z \not\ll x_n$.


Then the future-chronological topology on $X$ has these features:


1. It is a $T_1$ topology, that is, points are closed.

2. If $I^-(x) = I^-[c]$ for a future chain $c = \{x_n\}$, then $x$
is a topological limit of the sequence $\{x_n\}$.

3. If $X = M$, a strongly causal spacetime, then the
future-chronological topology is precisely the manifold topology.

4. If $X = \hat M$, the future causal completion of a strongly
causal spacetime $M$, then the induced topology on $M$ is the
manifold topology, $\hat\partial(M)$ is a closed subset of
$\hat M$, and $M$ is dense in $\hat M$. As per property (2), for any
future-endless causal curve $c$ in $M$, the point $I^-[c]$ in
$\hat\partial(M)$ is the topological endpoint of $c$ in $\hat M$.

5. If $X = \hat{L^n}$, then $X$ is homeomorphic to the conformal
image of $L^n$ in $E^n$ together with $\mathscr{I}^+$ and $i^+$; in
particular, $\hat\partial(L^n)$ has the topology of a cone.

Figure 7 $x \notin \hat L(\{x_n\})$: there is some IP
$P \supsetneq I^-(x)$ such that for all $z \in P$, $z \ll x_n$ for
infinitely many $n$.


Examples The future causal boundary with the future-chronological
topology can be calculated with a fair degree of success. For
instance, if $M$ is conformal to a simple product spacetime
$Q \times L^1$ ($Q$ a Riemannian manifold), then $\hat\partial(M)$
is much like $\hat\partial(L^n)$ in that it consists of null or
timelike lines factored over a particular boundary construction
$\partial_{\mathcal B}(Q)$ on $Q$, coming together at a single point
$i^+$ (the IP which is all of $M$); if $Q$ is complete, then these
are all null lines, and together they may be called $\mathscr{I}^+$.

The elements of $\partial_{\mathcal B}(Q)$ are defined in terms of
the Lipschitz-1 functions on $Q$ known as Busemann functions: if
$c: [\alpha, \omega) \to Q$ is any endless unit-speed curve
(typically, $\omega = \infty$), then the Busemann function
$b_c: Q \to R$ is defined by
$b_c(q) = \lim_{s \to \omega} (s - d(c(s), q))$, where $d$ is the
distance function in $Q$; this function is either finite for all $q$
or infinite for all $q$. The set $\mathcal B(Q)$ of finite Busemann
functions has an $R$-action defined by
$a \cdot b_c = b_{a \cdot c}$, where $(a \cdot c)(s) = c(s + a)$.
Then $\partial_{\mathcal B}(Q) = \mathcal B(Q)/R$. For any
$P \in \hat\partial(M)$, the boundary of $P$, as a subset of
$Q \times L^1 = Q \times R$, is the graph of a Busemann function
(the function is $b_c$ for $P$ generated by a null curve projecting
to $c$); and a point $x = (q, t)$ in $M$ can be represented by
$\partial(I^-(x))$, which is the graph of the function
$t - d(\cdot, q)$. Thus, one could use the function-space topology
on $\mathcal B(Q)$ to topologize $\hat M$; in that function-space
topology $\hat\partial(M)$ is a cone on
$\partial_{\mathcal B}(Q)$, and $\hat M$, apart from $i^+$, is the
topological product of $R$ with
$Q \cup \partial_{\mathcal B}(Q)$. The future-chronological topology
is sometimes different from the function-space topology, allowing
more convergent sequences than the function-space topology does.
When this happens, the result is non-Hausdorff, revealing pairs of
points in $\hat\partial(M)$ which are more closely related to one
another than the function-space topology reveals; but it is still
the case that $\hat\partial(M)$, apart from $i^+$, is fibered by $R$
over $\partial_{\mathcal B}(Q)$.
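
As a worked example of the Busemann function (a standard Euclidean computation, not taken from the article), take $Q = R^m$ with its usual distance and let $c$ be a straight unit-speed ray from the origin; the unit vector $v$ is notation introduced only for this example.

```latex
% Straight unit-speed ray in Euclidean space, |v| = 1
c(s) = s\,v, \qquad 0 \le s < \infty .

% Distance from c(s) to a fixed point q, expanded for large s
d(c(s), q) = \sqrt{s^2 - 2s\,\langle q, v\rangle + |q|^2}
           = s - \langle q, v\rangle + O(1/s) .

% Hence the Busemann function is the linear function
b_c(q) = \lim_{s \to \infty} \bigl(s - d(c(s), q)\bigr) = \langle q, v\rangle .

% Effect of reparametrizing by a shift, (a \cdot c)(s) = c(s + a):
b_{a \cdot c}(q) = \lim_{s \to \infty} \bigl(s - d(c(s + a), q)\bigr)
                 = b_c(q) - a .
```

The shift changes $b_c$ only by an additive constant, so the class of $b_c$ in $\mathcal B(Q)/R$ depends only on the direction $v$. For straight rays, at least, these classes are labeled by the unit sphere $S^{m-1}$, in line with the sphere of null lines making up the cone $\hat\partial(L^n)$ when $m = n - 1$.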

If $Q$ is a warped product $Q = (a, b) \times K$ for a compact
manifold $K$ with metric $dr^2 + e^{2\phi(r)} h$ with $h$ a metric
on $K$, then one can calculate more precisely: if, for instance,
$\phi$ has a minimum in the interior of $(a, b)$ and has suitable
growth on either end, then $\partial_{\mathcal B}(Q)$ represents two
copies of $K$ (one for each end of $(a, b) \times K$), the
future-chronological topology is the same as the function-space
topology, and $\hat M$ (apart from $i^+$) is a simple product of $R$
with $Q \cup \partial_{\mathcal B}(Q)$: $\hat\partial(M)$ is
precisely a null cone over two copies of $K$. This applies, for
instance, to exterior Schwarzschild, where $K = S^2$; the boundary
at one end of exterior Schwarzschild is the usual $\mathscr{I}^+$,
and the boundary at the other end is the null cone $\{r = 2m\}$,
where exterior attaches to interior Schwarzschild.

Calculations for the future-chronological topology become much
easier when $\hat\partial(M)$ is purely spacelike, that is, no
$P \in \hat\partial(M)$ is contained in the past of any other
element of $\hat M$. For instance, if $M$ is conformal to a
multiwarped product, $Q_1 \times \cdots \times Q_m \times (a, b)$
with metric $f_1(t) h_1 + \cdots + f_m(t) h_m - dt^2$, where $h_i$
is a Riemannian metric on $Q_i$, then $\hat\partial(M)$ will be
purely spacelike if all the Riemannian factors are complete and for
each $i$, $\int^{b} f_i(t)^{-1/2}\, dt < \infty$; in that case,
$\hat\partial(M) \cong Q$, where
$Q = Q_1 \times \cdots \times Q_m$, and
$\hat M \cong Q \times (a, b]$. This applies, for instance, to
interior Schwarzschild, where $Q_1 = R^1$ and $Q_2 = S^2$, yielding
the topology of $R^1 \times S^2$ for the Schwarzschild singularity.

There is a categorical universality for spacelike boundaries and
the future-chronological topology. This means that any other
reasonable way of future-completing interior Schwarzschild must
yield $R^1 \times S^2$ or a topological quotient of that for the
singularity; and if the result is to be past-distinguishing,
$R^1 \times S^2$ is the only possibility.

Of course, all this can be done in the time-dual fashion, using the
past-chronological topology on $\check M$. It would be desirable to
combine the future and past causal boundaries with a suitable
topology as well as appropriate identifications. There has been some
work in that direction.


Causal Boundary: Revisited 


Marolf and Ross (2003) have proposed an identification of TIPs and
TIFs that relies on the equivalence relation defined by Szabados.
For an IP $P$ and IF $F$, call $(P, F)$ a Szabados pair if
$P \subset I^-(x)$ for all $x \in F$, $P$ is maximal among IPs for
that property, and dually for $F$ with respect to $P$. For instance,
for any $x \in M$, $(I^-(x), I^+(x))$ is a Szabados pair. The
Marolf-Ross version of the causal boundary, $\bar\partial(M)$,
consists of all Szabados pairs formed of TIPs and TIFs, plus any TIP
or TIF that cannot be paired; this produces an appropriate set of
identifications within $\hat\partial(M) \cup \check\partial(M)$. The
chronology relation on $M$ is extended to
$\bar M = \bar\partial(M) \cup M$ by treating each point $x$ in $M$
as the Szabados pair $(I^-(x), I^+(x))$ and each unpaired IP $P$ as
$(P, \emptyset)$ and unpaired IF $F$ as $(\emptyset, F)$, and then
defining $(P, F) \ll (P', F')$ whenever
$F \cap P' \neq \emptyset$.

The resulting chronological set is not necessarily either past- or
future-distinguishing, but it is (past and future)-distinguishing.
The topology they propose places endpoints in $\bar\partial(M)$ for
all causal curves which are endless in $M$, but there may be
multiple future endpoints for a single future-endless curve. The
topology need not be $T_1$: points can fail to be closed. For a
product spacetime $M = Q \times L^1$, the Marolf-Ross topology on
$\bar M$ is always the function-space topology.

As of this writing, there is active research by J L Flores to
institute a Marolf-Ross type of identification of $\hat\partial(M)$
with $\check\partial(M)$ using a topology that partakes more of the
future- and past-chronological topologies.


See also: Asymptotic Structure and Conformal Infinity;
Spacetime Topology, Causal Structure and Singularities.


Boundary conformal field theory (BCFT) is simply
the study of conformal field theory (CFT) in
domains with a boundary. It gains its significance
(1) because, in some ways, it is mathematically
simpler: the algebraic and geometric structures of
CFT appear in a more straightforward manner; and
(2) because it has important applications: in string
theory in the physics of open strings and D-branes,
and in condensed matter physics in boundary critical
behavior and quantum impurity models.

This article, however, describes the basic ideas 
from the point of view of quantum field theory, 
without regard to particular applications or to any 
deeper mathematical formulations. 


Review of CFT 
Stress Tensor and Ward Identities 


Two-dimensional CFTs are massless, local, relati- 
vistic renormalized quantum field theories. 
Usually they are considered in imaginary time, 
that is, on two-dimensional manifolds with 
Euclidean signature. In this article, the metric is 
also taken to be Euclidean, although the formula- 
tion of CFTs on general Riemann surfaces is also 
of great interest, especially for string theory. For 
the time being, the domain is the entire complex 
plane. 

Heuristically, the correlation functions of such a field theory may
be thought of as being given by the Euclidean path integral, that
is, as expectation values of products of local densities with
respect to a Gibbs measure $Z^{-1} e^{-S_E(\{\psi\})} [d\psi]$,
where the $\{\psi(x)\}$ are some set of fundamental local fields,
$S_E$ is the Euclidean action, and the normalization factor $Z$ is
the partition function. Of course, such an object is not in general
well defined, and this picture should be seen only as a guide to
formulating the basic principles of CFT which can then be developed
into a mathematically consistent theory.


In two dimensions, it is useful to use the so-called complex
coordinates $z = x^1 + i x^2$, $\bar z = x^1 - i x^2$. In CFT, there
are local densities $\phi_j(z, \bar z)$, called primary fields,
whose correlation functions transform covariantly under conformal
mappings $z \to z' = f(z)$:

$$\langle \phi_1(z_1, \bar z_1)\, \phi_2(z_2, \bar z_2) \cdots \rangle
= \prod_j f'(z_j)^{h_j}\, \overline{f'(z_j)}^{\,\bar h_j}\,
\langle \phi_1(z_1', \bar z_1')\, \phi_2(z_2', \bar z_2') \cdots \rangle \qquad [1]$$

where $(h_j, \bar h_j)$ (usually real numbers, not complex
conjugates of each other) are called the conformal weights of
$\phi_j$. These local fields can in general be normalized so that
their two-point functions have the form

$$\langle \phi_j(z_j, \bar z_j)\, \phi_k(z_k, \bar z_k) \rangle
= \frac{\delta_{jk}}{(z_j - z_k)^{2h_j}\, (\bar z_j - \bar z_k)^{2\bar h_j}} \qquad [2]$$

They satisfy an algebra known as the operator product expansion
(OPE)

$$\phi_i(z_1, \bar z_1) \cdot \phi_j(z_2, \bar z_2)
= \sum_k c_{ijk}\, (z_1 - z_2)^{-h_i - h_j + h_k}\,
(\bar z_1 - \bar z_2)^{-\bar h_i - \bar h_j + \bar h_k}\,
\phi_k(z_1, \bar z_1) + \cdots \qquad [3]$$

which is supposed to be valid when inserted into higher-order
correlation functions in the limit when $|z_1 - z_2|$ is much less
than the separations of all the other points. The ellipses denote
the contributions of other nonprimary scaling fields to be described
below. The structure constants $c_{ijk}$, along with the conformal
weights, characterize the particular CFT.
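
As a quick consistency check of the transformation law [1] against the normalized two-point function [2] (a worked example added here, not part of the original text), consider a dilatation $f(z) = \lambda z$ with $\lambda > 0$ real; $\lambda$ and the shorthand $z_{12}$ are introduced only for this example.

```latex
% Dilatation: f'(z) = \overline{f'(z)} = \lambda; shorthand z_{12} = z_1 - z_2
\langle \phi_j(z_1, \bar z_1)\, \phi_j(z_2, \bar z_2) \rangle
  = \lambda^{2h_j}\, \lambda^{2\bar h_j}\,
    \langle \phi_j(\lambda z_1, \lambda \bar z_1)\,
            \phi_j(\lambda z_2, \lambda \bar z_2) \rangle
  % insert the two-point function [2] on the right-hand side
  = \frac{\lambda^{2h_j + 2\bar h_j}}
         {(\lambda z_{12})^{2h_j}\, (\lambda \bar z_{12})^{2\bar h_j}}
  = \frac{1}{z_{12}^{2h_j}\, \bar z_{12}^{2\bar h_j}} .
```

The factors of $\lambda$ cancel, so the power-law form [2] is exactly what the covariance [1] requires under rescaling, with $h_j + \bar h_j$ playing the role of the scaling dimension.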

An essential role is played by the energy-momentum tensor, or, in
Euclidean field theory language, the stress tensor $T^{\mu\nu}$.
Heuristically, it is defined as the response of the partition
function to a local change in the metric:

$$T^{\mu\nu}(x) = -(2\pi)\, \delta \ln Z / \delta g_{\mu\nu}(x) \qquad [4]$$

(the factor of $2\pi$ is included so that similar factors disappear
in later equations).

The symmetry of the theory under translations and rotations implies
that $T^{\mu\nu}$ is conserved, $\partial_\mu T^{\mu\nu} = 0$, and
symmetric. Scale invariance implies that it is also traceless,
$\Theta \equiv T^\mu_\mu = 0$. It should be noted that the vanishing
of the trace of the stress tensor for a scale invariant classical
field theory does not usually survive when quantum corrections are
taken into account: indeed, $\Theta \propto \beta(g)$, the
renormalization group (RG) beta-function. A quantum field theory is
thus only a CFT when this vanishes, that is, at an RG fixed point.
In complex coordinates, the components
$T_{z\bar z} = T_{\bar z z} = 4\Theta$ vanish, while the
conservation equations read

$$\partial_{\bar z} T_{zz} = \partial_z T_{\bar z \bar z} = 0 \qquad [5]$$

Thus, correlators of $T(z) \equiv T_{zz}$ are locally analytic (in
fact, globally meromorphic) functions of $z$, while those of
$\bar T(\bar z) \equiv T_{\bar z \bar z}$ are antianalytic. It is
this property of analyticity which makes CFTs tractable in two
dimensions.

Since an infinitesimal conformal transformation
$z \to z + \alpha(z)$ induces a change in the metric, its effect on
a correlation function of primary fields, given by [1], may also be
expressed through an appropriate integral involving an insertion of
the stress tensor. This leads to the conformal Ward identity:

$$\Bigl\langle \oint_C T(z)\, \alpha(z)\, \frac{dz}{2\pi i}
\prod_j \phi_j(z_j, \bar z_j) \Bigr\rangle
= \sum_j \bigl( h_j\, \alpha'(z_j) + \alpha(z_j)\, (\partial/\partial z_j) \bigr)
\Bigl\langle \prod_j \phi_j(z_j, \bar z_j) \Bigr\rangle \qquad [6]$$

where $C$ is a contour encircling all the points $\{z_j\}$. (A
similar equation holds for the insertion of $\bar T$.) Using
Cauchy's theorem, this determines the first few terms in the OPE of
$T$ with any primary density:

$$T(z) \cdot \phi_j(z_j, \bar z_j)
= \frac{h_j}{(z - z_j)^2}\, \phi_j(z_j, \bar z_j)
+ \frac{1}{z - z_j}\, \partial_{z_j} \phi_j(z_j, \bar z_j) + O(1) \qquad [7]$$

The other, regular, terms in the OPE generate new scaling fields,
which are not in general primary, called descendants. One way of
defining a density to be primary is by the condition that the most
singular term in its OPE with $T$ is a double pole.

The OPE of T with itself has the form 


__ 4/2 - e 
T(z): T(z) = TERT Tues? T(z1) +--+ [8] 


The first term is present because (T(z)T(zi)) is 
nonvanishing, and must take the form shown, with c 
being some number (which cannot be scaled to 
unity, since the normalization of T is fixed by its 
definition) which is a property of the CFT. It is 
known as the conformal anomaly number or the 
central charge. This term implies that T is not itself 
primary. In fact, under a finite conformal transformation $z \to z' = f(z)$,

\[ T(z) \to f'(z)^2\, T(z') + \frac{c}{12}\{z', z\} \]  [9]

where $\{z', z\} = (f'''f' - \tfrac{3}{2}f''^2)/f'^2$ is the Schwarzian
derivative.
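As an illustration that is not part of the original text, the Schwarzian derivative can be estimated numerically. The Python sketch below uses central finite differences (the step size and the two sample maps are assumptions chosen for the example). It checks that a Möbius map has vanishing Schwarzian, so that $T$ transforms like a primary field under such maps, and that the logarithmic map to the cylinder gives $1/(2z^2)$, which is the origin of the $c$-dependent constant shift on the cylinder.

```python
import cmath

def schwarzian(f, z, h=1e-3):
    """Finite-difference estimate of (f''' f' - (3/2) f''^2) / f'^2 at z."""
    f1 = (f(z + h) - f(z - h)) / (2 * h)
    f2 = (f(z + h) - 2 * f(z) + f(z - h)) / h**2
    f3 = (f(z + 2 * h) - 2 * f(z + h) + 2 * f(z - h) - f(z - 2 * h)) / (2 * h**3)
    return (f3 * f1 - 1.5 * f2**2) / f1**2

def mobius(z):
    # An arbitrary (hypothetical) Moebius map with ad - bc != 0.
    return (2 * z + 1) / (z + 3)

def cylinder_map(z):
    # The plane-to-cylinder map z -> (1 / 2 pi) ln z.
    return cmath.log(z) / (2 * cmath.pi)

z0 = 1.0 + 0.5j                      # sample point away from the branch cut
s_mobius = schwarzian(mobius, z0)    # expected to vanish
s_log = schwarzian(cylinder_map, z0) # expected to equal 1 / (2 z0^2)
```

The finite-difference estimate is only accurate to a few digits; it is meant as a sanity check on formula [9], not as a replacement for the analytic computation.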


Virasoro Algebra 


As with any quantum field theory, the local fields 
can be realized as linear operators acting on a 
Hilbert space. In ordinary QFT, it is customary to 
quantize on a constant-time hypersurface. The 
generator of infinitesimal time translations is the 
Hamiltonian H, which itself is independent of 
which time slice is chosen, because of time 
translational symmetry. It is also given by the 
integral over the hypersurface of the time-time 
component of the stress tensor. In CFT, because of
scale invariance, one may instead quantize on a
circle of fixed radius. The analog of the
Hamiltonian is the dilatation operator $D$, which
generates scale transformations. Unlike $H$, the
spectrum of $D$ is usually discrete, even in an
infinite system. It may also be expressed as an
integral over the radial component of the stress
tensor:

\[ D = \frac{1}{2\pi}\int_0^{2\pi} r\, T_{rr}\, r\, d\theta = \frac{1}{2\pi i}\oint_C z\, T(z)\, dz - \frac{1}{2\pi i}\oint_C \bar z\, \bar T(\bar z)\, d\bar z \equiv L_0 + \bar L_0 \]  [10]


where, because of analyticity, C can be any contour 
encircling the origin. 
This suggests that one define other operators

\[ L_n \equiv \frac{1}{2\pi i}\oint_C z^{n+1}\, T(z)\, dz \]  [11]

and similarly the $\bar L_n$. From the OPE [8] then follows
the Virasoro algebra $V$:

\[ [L_n, L_m] = (n-m)\, L_{n+m} + \frac{c}{12}\, n(n^2-1)\, \delta_{n+m,0} \]  [12]

with an isomorphic algebra $\bar V$ generated by the $\bar L_n$.

In radial quantization, there is a vacuum state $|0\rangle$.
Acting on this with the operator corresponding to a
scaling field gives a state $|\phi_j\rangle \equiv \phi_j(0,0)|0\rangle$ which is
an eigenstate of $D$: in fact,

\[ L_0|\phi_j\rangle = h_j|\phi_j\rangle, \qquad \bar L_0|\phi_j\rangle = \bar h_j|\phi_j\rangle \]  [13]

From the OPE [7], one sees that $|L_n\phi_j\rangle \propto L_n|\phi_j\rangle$,
and, if $\phi_j$ is primary, $L_n|\phi_j\rangle = 0$ for all $n \ge 1$.

The states corresponding to a given primary field,
and those generated by acting on these with all the
$L_n$ with $n < 0$ an arbitrary number of times, form a
highest-weight representation of $V$. However, this is
not necessarily irreducible. There may be null
vectors, which are linear combinations of states at
a given level which are themselves annihilated by all
the $L_n$ with $n > 0$. They exist whenever $h$ takes a
value from the Kac table:

\[ h = h_{r,s} = \frac{(r(m+1) - sm)^2 - 1}{4m(m+1)} \]  [14]

with the central charge parametrized as $c = 1 - 6/(m(m+1))$,
and $r$, $s$ are positive integers. These
null states should be projected out, giving an
irreducible representation $V_h$.
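As a worked illustration of the Kac formula [14], the short Python sketch below evaluates $c$ and the distinct weights $h_{r,s}$ with exact rational arithmetic. The ranges $1 \le r \le m-1$, $1 \le s \le m$ used here are an assumption: they are the ranges relevant for integer $m$ (the unitary minimal models), and the function names are chosen for this example only.

```python
from fractions import Fraction

def central_charge(m):
    """c = 1 - 6 / (m (m + 1))."""
    return 1 - Fraction(6, m * (m + 1))

def kac_weight(m, r, s):
    """h_{r,s} = ((r (m + 1) - s m)^2 - 1) / (4 m (m + 1))."""
    return Fraction((r * (m + 1) - s * m) ** 2 - 1, 4 * m * (m + 1))

def kac_table(m):
    """Sorted list of distinct weights for 1 <= r <= m - 1, 1 <= s <= m."""
    return sorted({kac_weight(m, r, s)
                   for r in range(1, m)
                   for s in range(1, m + 1)})

ising = kac_table(3)        # weights of the m = 3 model
tricritical = kac_table(4)  # weights of the m = 4 model
```

For $m = 3$ this gives $c = \frac12$ and the three weights $0$, $\frac1{16}$, $\frac12$; each weight appears twice in the table because $h_{r,s} = h_{m-r,\,m+1-s}$.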

The full Hilbert space of the CFT is then 


\[ \mathcal{H} = \bigoplus_{h,\bar h} n_{h,\bar h}\, V_h \otimes \bar V_{\bar h} \]  [15]


where the non-negative integers $n_{h,\bar h}$ specify how
many distinct primary fields of weights $(h, \bar h)$ there
are in the CFT.

The consistency of the OPE [3] with the existence 
of null vectors leads to the fusion algebra of the 
CFT. This applies separately to the holomorphic and 
antiholomorphic sectors, and determines how many 
copies of $V_{h''}$ occur in the fusion of $V_h$ and $V_{h'}$:

\[ V_h \odot V_{h'} = \sum_{h''} N^{h''}_{hh'}\, V_{h''} \]  [16]

where the $N^{h''}_{hh'}$ are non-negative integers.

A particularly important subset of all CFTs
consists of the minimal models. These have rational
central charge $c = 1 - 6(p-q)^2/pq$, in which case
the fusion algebra closes with a finite number of
possible values $1 \le r < q$, $1 \le s < p$ in the Kac
formula [14]. For these models, the fusion algebra
takes the form

\[ V_{r_1,s_1} \odot V_{r_2,s_2} = {\sum_{r=|r_1-r_2|+1}^{r_1+r_2-1}}' \;\; {\sum_{s=|s_1-s_2|+1}^{s_1+s_2-1}}' \; V_{r,s} \]  [17]

where the prime on the sums indicates that they are
to be restricted to the allowed intervals of $r$ and $s$, with $r$ and $s$ increasing in steps of two.
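The fusion rule [17] can be made concrete with a short Python sketch. Two points are assumptions of this example rather than statements of the text: the summation indices advance in steps of two, and the primed sums are truncated at $\min(r_1+r_2-1,\, 2q-1-r_1-r_2)$ (and similarly for $s$ with $p$), which is the usual way of restricting to the allowed intervals. Fields are labelled by pairs $(r,s)$ and the result is reported as a list of conformal weights.

```python
from fractions import Fraction

def weight(p, q, r, s):
    """Kac weight written with (p, q); reduces to [14] for p = m + 1, q = m."""
    return Fraction((r * p - s * q) ** 2 - (p - q) ** 2, 4 * p * q)

def fuse(p, q, a, b):
    """Weights appearing in the fusion of fields a = (r1, s1) and b = (r2, s2)."""
    (r1, s1), (r2, s2) = a, b
    r_max = min(r1 + r2 - 1, 2 * q - 1 - r1 - r2)  # assumed truncation
    s_max = min(s1 + s2 - 1, 2 * p - 1 - s1 - s2)  # assumed truncation
    return sorted({weight(p, q, r, s)
                   for r in range(abs(r1 - r2) + 1, r_max + 1, 2)
                   for s in range(abs(s1 - s2) + 1, s_max + 1, 2)})

# The m = 3 model: p = 4, q = 3, with (1, 2) of weight 1/16 and (1, 3) of weight 1/2.
P, Q = 4, 3
SIGMA, EPSILON = (1, 2), (1, 3)
sigma_sigma = fuse(P, Q, SIGMA, SIGMA)
sigma_epsilon = fuse(P, Q, SIGMA, EPSILON)
epsilon_epsilon = fuse(P, Q, EPSILON, EPSILON)
```

For $p/q = 4/3$ the three products reproduce the familiar fusion algebra of the $c = \frac12$ theory: weights $\{0, \frac12\}$, $\{\frac1{16}\}$ and $\{0\}$, respectively.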

There is an important theorem which states that
the only unitary CFTs with $c < 1$ are the minimal
models with $p/q = (m+1)/m$, where $m$ is an
integer $\ge 3$.


Modular Invariance 


The fusion algebra limits which values of $(h, \bar h)$
might appear in a consistent CFT, but not which
ones actually occur, that is, the values of the $n_{h,\bar h}$.
This is answered by the requirement of modular
invariance on the torus. First consider the theory on
an infinitely long cylinder, of unit circumference.
This is related to the (punctured) plane by the
conformal mapping $z \to (1/2\pi)\ln z \equiv t + ix$. The
result is a QFT on the circle $0 \le x < 1$, in
imaginary time $t$. The generator of infinitesimal
time translations is related to that for dilatations in
the plane:

\[ H = 2\pi D - \frac{\pi c}{6} = 2\pi(L_0 + \bar L_0) - \frac{\pi c}{6} \]  [18]

where the last term comes from the Schwarzian
derivative in [9]. Similarly, the generator of translations
in $x$, the total momentum operator, is
$P = 2\pi(L_0 - \bar L_0)$.

A general torus is, up to a scale transformation, a
parallelogram with vertices $(0, 1, \tau, 1+\tau)$ in the
complex plane, with the opposite edges identified.
We can make this by taking a cylinder of unit
circumference and length $\mathrm{Im}\,\tau$, twisting the ends by
a relative amount $\mathrm{Re}\,\tau$, and sewing them together.
This means that the partition function of the CFT on
the torus can be written as

\[ Z(\tau,\bar\tau) = \mathrm{Tr}\, e^{-(\mathrm{Im}\,\tau)H + i(\mathrm{Re}\,\tau)P} = \mathrm{Tr}\, q^{L_0 - c/24}\, \bar q^{\bar L_0 - c/24} \]  [19]

using the above expressions for $H$ and $P$ and
introducing $q \equiv e^{2\pi i\tau}$.

Through the decomposition [15] of $\mathcal{H}$, the trace
sum can be written as

\[ Z(\tau,\bar\tau) = \sum_{h,\bar h} n_{h,\bar h}\, \chi_h(q)\, \chi_{\bar h}(\bar q) \]  [20]

where

\[ \chi_h(q) \equiv \mathrm{Tr}_{V_h}\, q^{L_0 - c/24} = \sum_N d_h(N)\, q^{h - c/24 + N} \]  [21]

is the character of the representation of highest weight
$h$, which counts the degeneracy $d_h(N)$ at level $N$. It is
purely an algebraic property of the Virasoro algebra,
and its explicit form is known in many cases.

Figure 1 Two equivalent parametrizations of the same torus.

All of this would be less interesting were it not
for the observation that the parametrization of the
torus through $\tau$ is not unique. In fact, the
transformations $S: \tau \to -1/\tau$ and $T: \tau \to \tau + 1$
give the same torus (see Figure 1). Together, these
operations generate the modular group SL(2, Z),
and the partition function $Z(\tau,\bar\tau)$ should be
invariant under them. T-invariance is simply implemented
by requiring that $h - \bar h$ is an integer, but
the S-invariance of the right-hand side of [20]
places highly nontrivial constraints on the $n_{h,\bar h}$.
That this can be satisfied at all relies on the
remarkable property of the characters that they
transform linearly under $S$:

\[ \chi_h(e^{-2\pi i/\tau}) = \sum_{h'} S_h^{h'}\, \chi_{h'}(e^{2\pi i\tau}) \]  [22]


This follows from applying the Poisson sum formula 
to the explicit expressions for the characters, which 
are related to Jacobi theta-functions. In many cases 
(e.g., the minimal models) this representation is 
finite dimensional, and the matrix S is symmetric 
and orthogonal. This means that one can immedi- 
ately obtain a modular invariant partition function 
by forming the diagonal sum 


\[ Z = \sum_h \chi_h(q)\, \chi_h(\bar q) \]  [23]

so that $n_{h,\bar h} = \delta_{h\bar h}$. However, because of various
symmetries of the characters, other modular invariants
are possible: for the minimal models (and some others)
these have been classified. Because of an analogy of the
results with the classification of semisimple Lie
algebras, the diagonal invariants are called the A-series.
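The linear transformation law [22] and the invariance of the diagonal sum [23] can be checked numerically in the simplest case. The Python sketch below is an illustration under stated assumptions: it uses the $c = \frac12$ theory, takes the standard infinite-product expressions for its three characters (these are not given in this article), restricts to purely imaginary $\tau = it$ so that $S$ maps $t \to 1/t$, and uses the S-matrix of this theory with rows and columns ordered as $h = 0, \frac12, \frac1{16}$.

```python
import math

R2 = 1 / math.sqrt(2)
# Modular S-matrix of the c = 1/2 theory, ordered as h = 0, 1/2, 1/16.
S = [[0.5, 0.5, R2],
     [0.5, 0.5, -R2],
     [R2, -R2, 0.0]]

def ising_characters(t, n_terms=200):
    """Characters (chi_0, chi_1/2, chi_1/16) at tau = i t, i.e. q = exp(-2 pi t)."""
    q = math.exp(-2 * math.pi * t)
    plus = minus = sigma = 1.0
    for n in range(n_terms):
        plus *= 1 + q ** (n + 0.5)    # chi_0 + chi_1/2 (up to the prefactor)
        minus *= 1 - q ** (n + 0.5)   # chi_0 - chi_1/2 (up to the prefactor)
        sigma *= 1 + q ** (n + 1)     # chi_1/16 (up to the prefactor)
    plus *= q ** (-1 / 48)
    minus *= q ** (-1 / 48)
    sigma *= q ** (1 / 24)
    return [(plus + minus) / 2, (plus - minus) / 2, sigma]

def s_transform_error(t):
    """Largest mismatch between chi_h(-1/tau) and sum_h' S[h][h'] chi_h'(tau)."""
    lhs = ising_characters(1 / t)
    chi = ising_characters(t)
    rhs = [sum(S[i][j] * chi[j] for j in range(3)) for i in range(3)]
    return max(abs(a - b) for a, b in zip(lhs, rhs))

def diagonal_partition_function(t):
    """Diagonal invariant [23] at tau = i t, where q is real."""
    return sum(c * c for c in ising_characters(t))
```

Because $S$ is symmetric and orthogonal, the diagonal sum of squares is unchanged under $t \to 1/t$, which is what `diagonal_partition_function` illustrates for real $q$.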


Boundary CFT 


In any field theory in a domain with a boundary, 
one needs to consider how to impose a set of 
consistent boundary conditions. Since CFT is for- 
mulated independently of a particular set of funda- 
mental fields and a Lagrangian, this must be done in 
a more general manner. A natural requirement is 
that the off-diagonal component $T_{\parallel\perp}$ of the stress
tensor parallel/perpendicular to the boundary should 
vanish. This is called the conformal boundary 
condition. If the boundary is parallel to the time 
axis, it implies that there is no momentum flow 
across the boundary. Moreover, it can be argued 
that, under the RG, any uniform boundary condi- 
tion will flow into a conformally invariant one. For 
a given bulk CFT, however, there may be many 
possible distinct such boundary conditions, and it is 
one task of BCFT to classify these. 

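To begin with, take the domain to be the upper
half plane, so that the boundary is the real axis. The
conformal boundary condition then implies that
$T(z) = \bar T(\bar z)$ when $z$ is on the real axis. This has the
immediate consequence that correlators of $\bar T$ are
those of $T$, analytically continued into the lower
half plane. The conformal Ward identity, cf. [7],
now reads

\[ \Big\langle T(z) \prod_j \phi_j(z_j,\bar z_j) \Big\rangle = \sum_j \Big( \frac{h_j}{(z-z_j)^2} + \frac{1}{z-z_j}\, \partial_{z_j} + \frac{\bar h_j}{(z-\bar z_j)^2} + \frac{1}{z-\bar z_j}\, \partial_{\bar z_j} \Big) \Big\langle \prod_j \phi_j(z_j,\bar z_j) \Big\rangle \]  [24]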


In radial quantization, in order that the Hilbert 
spaces defined on different hypersurfaces be equiva- 
lent, one must choose semicircles centered on some 
point on the boundary, conventionally the origin. 
The dilatation operator is now 


\[ D = \frac{1}{2\pi i}\int_S z\, T(z)\, dz - \frac{1}{2\pi i}\int_S \bar z\, \bar T(\bar z)\, d\bar z \]  [25]

where $S$ is a semicircle. Using the conformal
boundary condition, this can also be written as

\[ D = L_0 = \frac{1}{2\pi i}\oint_C z\, T(z)\, dz \]  [26]

where $C$ is a complete circle around the origin. As
before, one may similarly define the $L_n$, and they
satisfy a Virasoro algebra.

Note that there is now only one Virasoro algebra. 
This is related to the fact that conformal mappings 
which preserve the real axis correspond to real 
analytic functions. The eigenstates of $L_0$ correspond
to boundary operators $\phi_j(0)$ acting on the vacuum
state |0). It is well known that in a renormalizable 
QFT operators at the boundary require a different 
renormalization from those in the bulk, and this will 
in general lead to a different set of conformal 
weights. It is one of the tasks of BCFT to determine 
these, for a given allowed boundary condition. 

However, there is one feature unique to boundary 
CFT in two dimensions. Radial quantization also 
makes sense, leading to the same form [26] for the
dilatation operator, if the boundary conditions on the
negative and positive real axes are different. As far as
the structure of BCFT goes, correlation functions with
this mixed boundary condition behave as though a
local scaling field were inserted at the origin. This has
led to the term “boundary condition changing (bcc)
operator,” but it must be stressed that these are not
local operators in the conventional sense. 


The Annulus Partition Function 


Figure 2 The annulus, with boundary conditions a and b on
either boundary.

Just as consideration of the partition function on the
torus illuminates the bulk operator content $n_{h,\bar h}$, it
turns out that consistency on the annulus helps
classify both the allowed boundary conditions, and
the boundary operator content. To this end, consider
a CFT in an annulus formed of a rectangle of
unit width and height $\delta$, with the top and bottom
edges identified (see Figure 2). The boundary
conditions on the left and right edges, labeled by
$a, b, \ldots$, may be different. The partition function
with boundary conditions $a$ and $b$ on either edge is
denoted by $Z_{ab}(\delta)$.

One way to compute this is by first considering
the CFT on an infinitely long strip of unit width.
This is conformally related to the upper half plane
(with an insertion of bcc operators at 0 and $\infty$ if
$a \ne b$) by the mapping $z \to (1/\pi)\ln z$. The generator
of infinitesimal translations along the strip is

\[ H_{ab} = \pi D - \pi c/24 = \pi L_0 - \pi c/24 \]  [27]

Thus, for the annulus,

\[ Z_{ab}(\delta) = \mathrm{Tr}\, e^{-\delta H_{ab}} = \mathrm{Tr}\, q^{L_0 - c/24} \]  [28]

with $q \equiv e^{-\pi\delta}$. As before, this can be decomposed
into characters:

\[ Z_{ab}(\delta) = \sum_h n^h_{ab}\, \chi_h(q) \]  [29]

but note that now the expression is linear. The non-negative
integers $n^h_{ab}$ give the operator content with
the boundary conditions $(ab)$: the lowest value of $h$
with $n^h_{ab} > 0$ gives the conformal weight of the bcc
operator, and the others give conformal weights of
the other allowed primary fields which may also sit
at this point.

On the other hand, the annulus partition function
may be viewed, up to an overall rescaling, as the
path integral for a CFT on a circle of unit
circumference, being propagated for (imaginary)
time $\delta^{-1}$. From this point of view, the partition
function is no longer a trace, but rather the matrix
element of $e^{-H/\delta}$ between boundary states:

\[ Z_{ab}(\delta) = \langle a|\, e^{-H/\delta}\, |b\rangle \]  [30]

Note that $H$ is the same Hamiltonian that appears in
[18], and the boundary states lie in $\mathcal{H}$, [15].

How are these boundary states to be characterized?
Using the transformation law [9], the
conformal boundary condition applied to the
circle implies that $L_n = \bar L_{-n}$. This means that
any boundary state $|B\rangle$ lies in the subspace
satisfying

\[ L_n|B\rangle = \bar L_{-n}|B\rangle \]  [31]

Moreover, because of the decomposition [15] of
$\mathcal{H}$, $|B\rangle$ is also some linear superposition of states from
$V_h \otimes \bar V_{\bar h}$. This condition can therefore be applied in
each subspace. Taking $n = 0$ in [31] constrains $\bar h = h$.
For simplicity, consider only the diagonal CFTs with
$n_{h,\bar h} = \delta_{h\bar h}$. It can then be shown that the solution
of [31] is unique and has the following form.
The subspace at level $N$ of $V_h$ has dimension
$d_h(N)$. Denote an orthonormal basis by $|h,N;j\rangle$,
with $1 \le j \le d_h(N)$, and the same basis for $\bar V_h$ by
$\overline{|h,N;j\rangle}$. The solution to [31] in this subspace is
then

\[ |h\rangle\rangle \equiv \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} |h,N;j\rangle \otimes \overline{|h,N;j\rangle} \]  [32]

These are called Ishibashi states. Matrix elements of
the translation operator along the cylinder between
them are simple:

\[ \langle\langle h'|\, e^{-H/\delta}\, |h\rangle\rangle = \sum_{N'=0}^{\infty} \sum_{j'=1}^{d_{h'}(N')} \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} \langle h',N';j'| \otimes \overline{\langle h',N';j'|}\;\, e^{-H/\delta}\;\, |h,N;j\rangle \otimes \overline{|h,N;j\rangle} \]  [33]

\[ = \delta_{h'h} \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} e^{-(4\pi/\delta)(h + N - c/24)} \]  [34]

\[ = \delta_{h'h}\, \chi_h(e^{-4\pi/\delta}) \]  [35]

Note that the characters which appear are
related to those in [29] by the modular transformation $S$.

The physical boundary states satisfying [29],
sometimes called the Cardy states, are linear
combinations of the Ishibashi states:

\[ |a\rangle = \sum_h \langle\langle h|a\rangle\, |h\rangle\rangle \]  [36]

Equating the two different expressions [29] and [30]
for $Z_{ab}$, and using the modular transformation law
[22] and the linear independence of the characters
gives the (equivalent) conditions:

\[ n^h_{ab} = \sum_{h'} S^h_{h'}\, \langle a|h'\rangle\rangle \langle\langle h'|b\rangle \]  [37]

\[ \langle a|h'\rangle\rangle \langle\langle h'|b\rangle = \sum_h S^{h'}_h\, n^h_{ab} \]  [38]

These are called the Cardy conditions. The requirements
that the right-hand side of [37] should give a
non-negative integer, and that the right-hand side of
[38] should factorize in $a$ and $b$, give highly
nontrivial constraints on the allowed boundary
states and their operator content.

For the diagonal CFTs considered here (and for
the nondiagonal minimal models) a complete solution
is possible. It can be shown that the elements $S_0^h$
of $S$ are all non-negative, so one may choose
$\langle\langle h|\tilde 0\rangle = (S_0^h)^{1/2}$. This defines a boundary state

\[ |\tilde 0\rangle \equiv \sum_h (S_0^h)^{1/2}\, |h\rangle\rangle \]  [39]

and a corresponding boundary condition such that
$n^h_{00} = \delta_{h0}$. Then, for each $h' \ne 0$, one may define a
boundary state

\[ \langle\langle h|\tilde h'\rangle \equiv S^h_{h'} / (S^h_0)^{1/2} \]  [40]

From [37], this gives $n^h_{h'0} = \delta_{h'h}$. For each allowed $h'$
in the torus partition function, there is therefore a
boundary state $|\tilde h'\rangle$ satisfying the Cardy conditions.
However, there is a further requirement:

\[ n^h_{h'h''} = \sum_{\ell} \frac{S^h_\ell\, S^\ell_{h'}\, S^\ell_{h''}}{S^\ell_0} \]  [41]

should be a non-negative integer. Remarkably, this
combination of elements of $S$ occurs in the Verlinde
formula, which follows from considering consistency
of the CFT on the torus. This states that the
right-hand side of [41] is equal to the fusion algebra
coefficient $N^h_{h'h''}$. Since these are non-negative
integers, the above ansatz for the
boundary states is consistent.

We conclude that, at least for the diagonal models, 
there is a bijection between the allowed primary fields 
in the bulk CFT and the allowed conformally invariant 
boundary conditions. For the minimal models, with a 
finite number of such primary fields, this correspon- 
dence has been followed through explicitly. 
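The Verlinde combination of S-matrix elements is easy to evaluate for a small example. The Python sketch below is illustrative only: it takes the S-matrix of the $c = \frac12$ theory (ordered as $h = 0, \frac12, \frac1{16}$) as input, and the function and variable names are chosen for this example.

```python
import math

R2 = 1 / math.sqrt(2)
# S-matrix of the c = 1/2 theory, ordered as h = 0, 1/2, 1/16.
S = [[0.5, 0.5, R2],
     [0.5, 0.5, -R2],
     [R2, -R2, 0.0]]

def verlinde(S):
    """N[a][b][h] = sum_l S[h][l] * S[l][a] * S[l][b] / S[l][0]."""
    n = len(S)
    return [[[sum(S[h][l] * S[l][a] * S[l][b] / S[l][0] for l in range(n))
              for h in range(n)]
             for b in range(n)]
            for a in range(n)]

N = verlinde(S)
# Round to integers for display; the raw values are integers up to rounding error.
fusion = [[[round(x) for x in row] for row in plane] for plane in N]
```

With the index order above, `fusion[2][2]` lists the multiplicities in the product of two weight-$\frac1{16}$ fields, `fusion[2][1]` those for weights $\frac1{16}$ and $\frac12$, and `fusion[1][1]` those for two weight-$\frac12$ fields.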


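Example The simplest example is the diagonal $c = \frac12$
unitary CFT corresponding to $m = 3$. The allowed
values of the conformal weights are $h = 0, \frac12, \frac1{16}$, and

\[ S = \begin{pmatrix} \frac12 & \frac12 & \frac{1}{\sqrt2} \\ \frac12 & \frac12 & -\frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & 0 \end{pmatrix} \]  [42]

from which one finds the allowed boundary states

\[ |\tilde 0\rangle = 2^{-1/2}\, |0\rangle\rangle + 2^{-1/2}\, |\tfrac12\rangle\rangle + 2^{-1/4}\, |\tfrac1{16}\rangle\rangle \]  [43]

\[ |\tilde{\tfrac12}\rangle = 2^{-1/2}\, |0\rangle\rangle + 2^{-1/2}\, |\tfrac12\rangle\rangle - 2^{-1/4}\, |\tfrac1{16}\rangle\rangle \]  [44]

\[ |\tilde{\tfrac1{16}}\rangle = |0\rangle\rangle - |\tfrac12\rangle\rangle \]  [45]

The nontrivial part of the fusion algebra of this
CFT is

\[ V_{1/16} \odot V_{1/16} = V_0 + V_{1/2} \]  [46]

\[ V_{1/16} \odot V_{1/2} = V_{1/16} \]  [47]

\[ V_{1/2} \odot V_{1/2} = V_0 \]  [48]

from which can be read off the boundary operator
content

\[ n^{0}_{\frac1{16}\frac1{16}} = n^{1/2}_{\frac1{16}\frac1{16}} = 1, \qquad n^{1/16}_{\frac1{16}\frac12} = 1, \qquad n^{0}_{\frac12\frac12} = 1 \]  [49]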

The $c = \frac12$ CFT is known to describe the continuum limit
of the critical Ising model, in which spins $s = \pm1$ are
localized on the sites of a regular lattice. The above
boundary conditions $\tilde 0$, $\tilde{\frac12}$ and $\tilde{\frac1{16}}$ may be interpreted
as the continuum limit of the lattice boundary conditions $s = 1$,
$s = -1$ and free, respectively. Note there is a symmetry
of the fusion rules which means that one could
equally well have interchanged $s = 1$ and $s = -1$ in this
correspondence.
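As a check on the example, the coefficients of the Cardy states in the Ishibashi basis follow directly from the S-matrix via [39] and [40]. The Python sketch below is illustrative; the index order $h = 0, \frac12, \frac1{16}$ and the function name are choices made for this example.

```python
import math

R2 = 1 / math.sqrt(2)
# S-matrix of the c = 1/2 theory, ordered as h = 0, 1/2, 1/16.
S = [[0.5, 0.5, R2],
     [0.5, 0.5, -R2],
     [R2, -R2, 0.0]]

def cardy_coefficients(S, a):
    """Coefficients <<h|a~> = S[h][a] / sqrt(S[h][0]) for each Ishibashi label h."""
    return [S[h][a] / math.sqrt(S[h][0]) for h in range(len(S))]

state_0 = cardy_coefficients(S, 0)      # boundary state labelled by h = 0
state_half = cardy_coefficients(S, 1)   # boundary state labelled by h = 1/2
state_16th = cardy_coefficients(S, 2)   # boundary state labelled by h = 1/16
```

The three lists are $(2^{-1/2}, 2^{-1/2}, 2^{-1/4})$, $(2^{-1/2}, 2^{-1/2}, -2^{-1/4})$ and $(1, -1, 0)$, so the last state has no component along the weight-$\frac1{16}$ Ishibashi state.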


Other Topics 
Boundary Entropy 


The partition function on an annulus of length $L$ and
circumference $\beta$ can be thought of as the quantum
statistical mechanics partition function for a one-dimensional
QFT in an interval of length $L$, at
temperature $\beta^{-1}$. It is interesting to consider this
in the thermodynamic limit when $\delta^{-1} \equiv L/\beta$ is large. In
that case, only the ground state of $H$ contributes in
[30], giving

\[ Z_{ab}(L,\beta) \sim \langle a|0\rangle \langle 0|b\rangle\, e^{\pi c L/6\beta} \]  [50]

from which the free energy $F_{ab} = -\beta^{-1}\ln Z_{ab}$ and
the entropy $S_{ab} = \beta^2(\partial F_{ab}/\partial\beta)$ can be obtained.
The result is

\[ S_{ab} = (\pi c/3\beta)L + s_a + s_b + o(1) \]  [51]

where the first term is the usual extensive contribution.
The other two pieces $s_a \equiv \ln(\langle a|0\rangle)$ and $s_b \equiv
\ln(\langle b|0\rangle)$ may be identified as the boundary entropy
associated with the corresponding boundary states. 
A similar definition may be made in massive QFTs. 
It is an unproven but well-verified conjecture that 
the boundary entropy is a nonincreasing function 
along boundary RG flows, and is stationary only for 
conformal boundary states. 
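The passage from the partition function to the entropy can be checked numerically. The Python sketch below is an illustration under stated assumptions: it keeps only the leading large-$L/\beta$ form of $\ln Z_{ab}$, uses the thermodynamic definition $S = -\partial F/\partial T$ with $T = \beta^{-1}$, and takes the ground-state overlaps $2^{-1/2}$ and $1$ (the values that follow from the $c = \frac12$ S-matrix for the boundary states labelled by $h = 0$ and $h = \frac1{16}$) as sample inputs.

```python
import math

def log_partition(L, beta, c, g_a, g_b):
    """Leading form ln Z = ln(g_a g_b) + pi c L / (6 beta), with g = <a|0>."""
    return math.log(g_a * g_b) + math.pi * c * L / (6 * beta)

def entropy(L, beta, c, g_a, g_b, dT=1e-6):
    """S = -dF/dT with F = -T ln Z, T = 1/beta, by a central difference."""
    def free_energy(T):
        return -T * log_partition(L, 1 / T, c, g_a, g_b)
    T = 1 / beta
    return -(free_energy(T + dT) - free_energy(T - dT)) / (2 * dT)

# Sample values (assumptions for the example).
L, beta, c = 50.0, 1.0, 0.5
g_fixed, g_free = 2 ** -0.5, 1.0
s_numeric = entropy(L, beta, c, g_fixed, g_free)
s_formula = math.pi * c * L / (3 * beta) + math.log(g_fixed) + math.log(g_free)
```

The two numbers agree to within the finite-difference error, which confirms that the $L$-independent part of the entropy is the sum of the logarithms of the two overlaps.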


Bulk-Boundary OPE 


The boundary Ward identity [24] has the implication
that, from the point of view of the dependence
of its correlators on $z_j$ and $\bar z_j$, a primary field
$\phi_j(z_j,\bar z_j)$ may be thought of as the product of two
local fields which are holomorphic functions of $z_j$
and $\bar z_j$, respectively. These will satisfy OPEs as
$|z_j - \bar z_j| \to 0$, with the appearance of primary fields on the
right-hand side being governed by the fusion rules.
These fields are localized on the real axis: they are
the boundary operators. There is therefore a kind of
bulk-boundary OPE:

\[ \phi_j(z_j,\bar z_j) = \sum_k d_{jk}\, (\mathrm{Im}\, z_j)^{-h_j - \bar h_j + h_k}\, \phi_k(\mathrm{Re}\, z_j) \]  [52]

where the sum on the right-hand side is, in principle,
over all the boundary fields consistent with the
boundary condition, and the coefficients $d_{jk}$ are
analogous to the OPE coefficients in the bulk. As
before, they are nonvanishing only if allowed by the
fusion algebra: a boundary field of conformal weight
$h_k$ is allowed only if $N^{h_k}_{h_j \bar h_j} > 0$.

For example, in the $c = \frac12$ CFT, the bulk operator
with $h = \bar h = \frac1{16}$ goes over into the boundary operator
with $h = 0$, or that with $h = \frac12$, depending on the
boundary condition. The bulk operator with
$h = \bar h = \frac12$, however, can only go over into the
identity boundary operator with $h = 0$ (or a descendant
thereof).

The fusion rules also apply to the boundary 
operators themselves. The consistency of these with 
bulk-boundary and bulk-bulk fusion rules, as well 
as the modular properties of partition functions, was 
examined by Lewellen. 


Extended Algebras 


CFTs may contain other conserved currents apart 
from the stress tensor, which generate algebras 
(Kac-Moody, superconformal, W-algebras) which 
extend the Virasoro algebra. In BCFT, in addition to 
the conformal boundary condition, it is possible (but 
not necessary) to impose further boundary condi- 
tions relating the holomorphic and antiholomorphic 
parts of the other currents on the boundary. It is 
believed that all rational CFTs can be obtained from 
Kac-Moody algebras via the coset construction. The
classification of boundary conditions from this point 
of view is fruitful and also important for applica- 
tions, but is beyond the scope of this article. 


Stochastic Loewner Evolution 


In recent years, there has emerged a deep connection 
between BCFT and conformally invariant measures 
on curves in the plane which start at a boundary of a 
domain. These arise naturally in the continuum limit
of certain statistical mechanics models. The measure 
is constructed dynamically as the curve is extended, 
using a sequence of random conformal mappings 
called stochastic Loewner evolution (SLE). In CFT, 
the point where the curve begins can be viewed as 
the insertion of a boundary operator. The require- 
ment that certain quantities should be conserved in 
mean under the stochastic process is then equivalent 
to this operator having a null state at level two. 
Many of the standard results of CFT correspond to 
an equivalent property of SLE. 
Introduction


Inverse problems are generally positioned as the
problems of determination of a system (its structure,
parameters, etc.) from its “input → output”
correspondence.

The boundary-value inverse problems deal with 
systems which describe processes (wave, heat, electro- 
magnetic ones, etc.) occurring in media occupying a 
spatial domain. The process is initiated by a boundary 
source (input) and is described by a solution of a certain 
partial differential equation in the domain. Certain 
additional information about the solution, which can be 
extracted from measurements on the boundary, plays 
the role of the output. The objective is to determine the 
parameters of the medium - in particular, the coeffi- 
cients in the equation — from this information. 

The boundary control (BC) method (Belishev
1986) is an approach to the boundary-value inverse
problems based on their links with control
theory and system theory. The present article describes a
version of the BC method which solves the problem
of reconstruction of a Riemannian manifold from its
boundary spectral or dynamical data.


Forward Problems 
Manifold 


Let $(\Omega, d)$ be a smooth compact Riemannian manifold
with the boundary $\Gamma$, $\dim \Omega \ge 2$; $d$ is the distance
determined by the metric tensor $g$. For $A \subset \Omega$ denote

\[ \langle A\rangle^r := \{x \in \Omega \mid d(x, A) < r\}, \quad r > 0; \]

the hypersurfaces $\Gamma^\tau := \{x \in \Omega \mid d(x,\Gamma) = \tau\}$, $\tau > 0$,
are equidistant to $\Gamma$. In terms of the dynamics of
the system, the value

\[ T_* := \min\{T > 0 \mid \langle\Gamma\rangle^T = \Omega\} = \max_{\Omega}\, d(\cdot, \Gamma) \]

means the time needed for waves, moving from $\Gamma$
with the unit speed, to fill $\Omega$.
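For a concrete feeling for the filling time, consider the simplest case of a flat rectangle, which is an assumption made only for this illustration (the article treats a general Riemannian manifold). There the distance to the boundary is explicit, and the maximum of $d(\cdot, \Gamma)$ can be estimated on a grid, as in the following Python sketch.

```python
def distance_to_boundary(x, y, a, b):
    """Euclidean distance from (x, y) to the boundary of [0, a] x [0, b]."""
    return min(x, a - x, y, b - y)

def filling_time(a, b, n=200):
    """Grid estimate of the maximum over the rectangle of d(., Gamma)."""
    return max(distance_to_boundary(a * i / n, b * j / n, a, b)
               for i in range(n + 1)
               for j in range(n + 1))

t_star = filling_time(2.0, 1.0)  # half of the shorter side
```

For the $2 \times 1$ rectangle the estimate is one half of the shorter side: waves launched from the whole boundary with unit speed meet along the middle segment, which is also where the separation set of this example lies.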




i Boundary Control Method and Inverse Problems of Wave 


A point $x \in \Omega$ is said to belong to the set $c_0 \subset \Omega$ if
$x$ is connected with $\Gamma$ via more than one shortest
geodesic. The set $c := \overline{c_0}$ is called the separation set
(cut locus) of $\Omega$ with respect to $\Gamma$. It is a closed set of
zero volume. Let $\tau_*(\gamma)$ be the length of the geodesic
emanating from $\gamma \in \Gamma$ orthogonally to $\Gamma$ and
connecting $\gamma$ with $c$. The function $\tau_*(\cdot)$ is continuous
on $\Gamma$.

For $x \in \Omega \setminus c$ the pair $(\gamma, \tau)$, such that
$\tau = d(x,\Gamma) = d(x,\gamma)$, constitutes the semigeodesic
coordinates of $x$. The set of these coordinates

\[ \Theta := \{(\gamma,\tau) \mid \gamma \in \Gamma,\; 0 \le \tau < \tau_*(\gamma)\} \subset \Gamma \times [0, T_*] \]

is called the pattern of $\Omega$. Pictorially, to get the
pattern, one needs to slit $\Omega$ along $c$ and then pull it
on the cylinder $\Gamma \times [0, T_*]$. The part $\Theta^T := \Theta \cap (\Gamma \times
[0, T])$ of the pattern consists of the semigeodesic
coordinates of the points $x \in \langle\Gamma\rangle^T \setminus c$ (Figure 1).


Dynamical System 


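Propagation of waves in the manifold is described by
a dynamical system $\alpha^T$ of the form

\[ u_{tt} - \Delta_g u = h \quad \text{in } \Omega \times (0,T) \]  [1]

\[ u|_{t=0} = u_t|_{t=0} = 0 \quad \text{in } \Omega \]  [2]

\[ u = f \quad \text{on } \Gamma \times [0,T] \]  [3]

where $\Delta_g$ is the Beltrami-Laplace operator, $0 < T \le \infty$,
$f$ and $h$ are the boundary and volume sources
(controls), $u = u^{f,h}(x,t)$ is the solution (wave).

Set $\mathcal{H} := L_2(\Omega)$; the spaces of the controls are

\[ \mathcal{F}^T := L_2(\Gamma \times [0,T]), \qquad \mathcal{G}^T := L_2([0,T]; \mathcal{H}) \]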


Figure 1 Manifold and pattern. (Data from Belishev (1997).) 
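The “input → state” map of the system $\alpha^T$ is
realized by the control operator $W^T: \mathcal{F}^T \times \mathcal{G}^T \to \mathcal{H}$,
$W^T\{f, h\} := u^{f,h}(\cdot, T)$, and its parts

\[ W^T_{\rm bd}: \mathcal{F}^T \to \mathcal{H}, \qquad W^T_{\rm bd} f := u^{f,0}(\cdot, T) \]

\[ W^T_{\rm vol}: \mathcal{G}^T \to \mathcal{H}, \qquad W^T_{\rm vol} h := u^{0,h}(\cdot, T) \]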


In the case $f = 0$ the evolution of the system is
governed by the operator $L := -\Delta_g$ defined on the
Sobolev class $H^2(\Omega) \cap H^1_0(\Omega)$ of functions vanishing
on $\Gamma$, and the semigroup representation

\[ u^{0,h}(\cdot, r) = W^r_{\rm vol} h = \int_0^r L^{-1/2} \sin\big[(r-t)L^{1/2}\big]\, h(t)\, dt \]  [4]

holds for all $r > 0$.
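Representation [4] can be tested on a single eigenmode. In the Python sketch below, which is an illustration under stated assumptions, the manifold is the interval $(0, \pi)$ with $L = -d^2/dx^2$ and Dirichlet conditions, so that the $k$-th eigenvalue is $k^2$. For a volume source $h(x,t) = g(t)\,\varphi_k(x)$ the formula reduces to a scalar Duhamel integral for the coefficient of $\varphi_k$, which is evaluated by Simpson's rule and can be compared with closed forms.

```python
import math

def duhamel_coefficient(k, g, r, n=2000):
    """Simpson rule for the integral over [0, r] of sin(k (r - t)) g(t) / k dt."""
    h = r / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        t = i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * math.sin(k * (r - t)) * g(t) / k
    return total * h / 3

r = 2.0
c_constant = duhamel_coefficient(1, lambda t: 1.0, r)  # source g = 1, mode k = 1
c_linear = duhamel_coefficient(2, lambda t: t, r)      # source g = t, mode k = 2
```

For $g = 1$, $k = 1$ the coefficient is $1 - \cos r$, and for $g(t) = t$, $k = 2$ it is $r/4 - \sin(2r)/8$; both solve $\ddot c + k^2 c = g$ with zero initial data, as the wave equation [1] requires mode by mode.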
The “input → output” map is implemented by the
response operator $R^T: \mathcal{F}^T \to \mathcal{F}^T$,

\[ R^T f := \partial_\nu u^{f,0} \quad \text{on } \Gamma \times [0,T] \]

defined on controls $f \in H^1(\Gamma \times [0,T])$ vanishing on
$\Gamma \times \{t = 0\}$; here $\nu = \nu(\gamma)$ is the outward normal to $\Gamma$.
The normal derivative $\partial_\nu u^{f,0}$ describes the forces
appearing on $\Gamma$ as a result of interaction of the wave
with the boundary.

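The map $C^T: \mathcal{F}^T \to \mathcal{F}^T$, $C^T := (W^T_{\rm bd})^* W^T_{\rm bd}$, which
is called the connecting operator, can be represented
via the response operator of the system $\alpha^{2T}$:

\[ C^T = \tfrac12\, (S^T)^* R^{2T} J^{2T} S^T \]  [5]

$S^T: \mathcal{F}^T \to \mathcal{F}^{2T}$ being the extension of controls from
$\Gamma \times [0,T]$ onto $\Gamma \times [0,2T]$ as odd functions of $t$
with respect to $t = T$, and $J^{2T}: \mathcal{F}^{2T} \to \mathcal{F}^{2T}$ being the
integration

\[ (J^{2T} f)(\cdot, t) = \int_0^t f(\cdot, s)\, ds \]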

Controllability 


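Open subsets $\sigma \subset \Gamma$ and $\omega \subset \Omega$ determine the
subspaces

\[ \mathcal{F}^T_\sigma := \{f \in \mathcal{F}^T \mid \mathrm{supp}\, f \subset \bar\sigma \times [0,T]\} \]

\[ \mathcal{G}^T_\omega := \{h \in \mathcal{G}^T \mid \mathrm{supp}\, h \subset \bar\omega \times [0,T]\} \]

of controls acting from $\sigma$ and $\omega$, respectively. In view
of hyperbolicity of the problem [1]-[3], the relation

\[ \mathrm{supp}\, u^{f,h}(\cdot, t) \subset \overline{\langle\sigma\rangle^t} \cup \overline{\langle\omega\rangle^t}, \quad t > 0 \]  [6]

holds for $f \in \mathcal{F}^T_\sigma$ and $h \in \mathcal{G}^T_\omega$. This means that the
waves propagate in $\Omega$ with unit speed.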


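The sets of waves

\[ \mathcal{U}^T_\sigma := W^T_{\rm bd}\mathcal{F}^T_\sigma, \qquad \mathcal{U}^T_\omega := W^T_{\rm vol}\mathcal{G}^T_\omega \]

are said to be reachable at time $t = T$ from $\sigma$ and $\omega$,
respectively. Denoting

\[ \mathcal{H}\langle A\rangle := \{y \in \mathcal{H} \mid \mathrm{supp}\, y \subset \bar A\} \]

(and writing $\mathcal{H}\langle A\rangle^r$ for the subspace of functions supported in $\overline{\langle A\rangle^r}$),
by virtue of [6] one has the embeddings $\mathcal{U}^T_\sigma \subset \mathcal{H}\langle\sigma\rangle^T$
and $\mathcal{U}^T_\omega \subset \mathcal{H}\langle\omega\rangle^T$. The property of the system $\alpha^T$
that plays the key role in inverse problems is that
these embeddings are dense:

\[ \mathrm{cl}\, \mathcal{U}^T_\sigma = \mathcal{H}\langle\sigma\rangle^T, \qquad \mathrm{cl}\, \mathcal{U}^T_\omega = \mathcal{H}\langle\omega\rangle^T \]  [7]

for any $T > 0$ (cl denotes the closure in $\mathcal{H}$).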

In control theory, relations [7] are interpreted as
an approximate controllability of the system in
subdomains filled with waves; the name “BC
method” is derived from the first one (boundary
controllability). This property means that the sets
of waves are rich enough: any function supported
in the subdomain $\langle\sigma\rangle^T$ reachable for waves excited
on $\sigma$ can be approximated with any precision in
the $\mathcal{H}$-norm by the wave $u^{f,0}(\cdot, T)$ through an appropriate
choice of the control $f$ acting from $\sigma$. The proof of
[7] relies on the fundamental Holmgren-John-Tataru
unique continuation theorem for the wave equation
(Tataru 1993).


Laplacian on Waves 


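If $h = 0$, so that the system is governed only by
boundary controls, its trajectory $\{u^{f,0}(\cdot, t) \mid 0 \le t \le T\}$
does not leave the reachable set $\mathcal{U}^T_\Gamma$. In this case, the
system possesses one more intrinsic operator $L^T$
which acts in the subspace $\mathrm{cl}\,\mathcal{U}^T_\Gamma$ and is introduced
through its graph

\[ \mathrm{gr}\, L^T := \mathrm{cl}\, \big\{ \{W^T_{\rm bd} f,\; -W^T_{\rm bd} f_{tt}\} \;\big|\; f \in C_0^\infty(\Gamma \times (0,T)) \big\} \]  [8]

(closure in $\mathcal{H} \times \mathcal{H}$). By virtue of the relation
$-W^T_{\rm bd} f_{tt} = -\Delta_g W^T_{\rm bd} f$ following from the wave
equation [1] and [6], the operator $L^T$ is interpreted
as Laplacian on waves filling the subdomain $\langle\Gamma\rangle^T$.
In the case $T > T_*$, one has $\langle\Gamma\rangle^T = \Omega$, $\mathrm{cl}\,\mathcal{U}^T_\Gamma = \mathcal{H}$
and $L^T$ is a densely defined operator in $\mathcal{H}$, satisfying
$L^T \subset L$. Using [7], one proves the equality $L^T = L$.
This equality and representation [4] imply that

\[ W^r_{\rm vol} h = \int_0^r (L^T)^{-1/2} \sin\big[(r-t)(L^T)^{1/2}\big]\, h(t)\, dt \]  [9]

for all $r > 0$ and any fixed $T > T_*$.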
Spectral Problem


The Dirichlet homogeneous boundary-value problem
is to find nontrivial solutions of the system

\[ -\Delta_g \varphi = \lambda\varphi \quad \text{in } \Omega \]  [10]

\[ \varphi = 0 \quad \text{on } \Gamma \]  [11]

This problem is equivalent to the spectral analysis
of the operator $L$; it has the discrete spectrum
$\{\lambda_k\}_{k=1}^\infty$, $0 < \lambda_1 \le \lambda_2 \le \cdots$, $\lambda_k \to \infty$; the eigenfunctions
$\{\varphi_k\}_{k=1}^\infty$, $L\varphi_k = \lambda_k\varphi_k$, form an orthonormal basis
in $\mathcal{H}$.

Expanding the solutions of the problem [1]-[3]
over the eigenfunctions of the problem [10], [11]
one derives the spectral representation of waves:

\[ u^{f,0}(\cdot, T) = W^T_{\rm bd} f = \sum_{k=1}^\infty (f, s_k^T)_{\mathcal{F}^T}\, \varphi_k(\cdot) \]  [12]

where

\[ s_k^T(\gamma, t) := -\lambda_k^{-1/2} \sin\big[(T-t)\lambda_k^{1/2}\big]\, \partial_\nu\varphi_k(\gamma) \]

Thus, for a given control $f$, the Fourier coefficients
of the wave $u^{f,0}$ are determined by the spectrum
$\{\lambda_k\}_{k=1}^\infty$ and the derivatives $\{\partial_\nu\varphi_k\}_{k=1}^\infty$.
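The spectral representation can be tried out in one dimension, where the wave is known in closed form. The Python sketch below is an illustration under several assumptions that are not part of the article: the manifold is the interval $(0, \pi)$, the control $f$ acts only at the endpoint $x = 0$ (the control at $x = \pi$ is zero), $T < \pi$, and the sign convention is that of an outward normal with $s_k^T = -\lambda_k^{-1/2}\sin[(T-t)\lambda_k^{1/2}]\,\partial_\nu\varphi_k$. Under these assumptions the wave at time $T$ is $f(T - x)$ for $x < T$ and zero beyond, which the truncated series should reproduce.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def wave_from_spectral_data(f, T, x, n_modes=200):
    """Truncated sum over k of (f, s_k^T) phi_k(x) on (0, pi), control at x = 0."""
    norm = math.sqrt(2 / math.pi)        # normalization of phi_k = norm * sin(k x)
    u = 0.0
    for k in range(1, n_modes + 1):
        root = float(k)                  # square root of the eigenvalue k^2
        dnu_phi = -norm * k              # outward normal derivative of phi_k at x = 0
        coeff = simpson(
            lambda t: f(t) * (-math.sin((T - t) * root) / root * dnu_phi), 0.0, T)
        u += coeff * norm * math.sin(k * x)
    return u

control = lambda t: t * t                # sample control vanishing at t = 0
T = 1.5
u_inside = wave_from_spectral_data(control, T, 0.7)   # exact value (T - 0.7)^2
u_outside = wave_from_spectral_data(control, T, 2.5)  # exact value 0
```

The series converges slowly (the wave does not vanish at $x = 0$ while every eigenfunction does), so only agreement to about two decimal places should be expected with 200 modes. The point of the example is that the spectrum and the boundary derivatives of the eigenfunctions alone determine the wave, including the fact that it has not yet reached points with $x > T$.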


Inverse problems 
General Setup 


The set of pairs $\Sigma := \{\lambda_k;\, \partial_\nu\varphi_k\}_{k=1}^\infty$ associated with
the problem [10], [11] is said to be the Dirichlet
spectral data of the manifold $(\Omega, d)$. The spectral
(frequency domain) inverse problem is to recover the
manifold from its spectral data.

Since the speed of wave propagation is unity, the
response operator $R^T$ contains the information not
about the entire manifold but only about its part
$\langle\Gamma\rangle^{T/2}$. This fact is taken into account in the
dynamical (time domain) inverse problem which
aims to recover the manifold from the operator $R^{2T}$
given for a fixed $T > T_*$.

If the manifolds $(\Omega', d')$ and $(\Omega'', d'')$ are isometric
via an isometry $i: \Omega' \to \Omega''$, then, identifying the
boundaries by $i(\gamma) \equiv \gamma$, one gets two manifolds with
the common boundary $\Gamma = \partial\Omega' = \partial\Omega''$ which possess
identical inverse data: $\Sigma' = \Sigma''$, $R'^{2T} = R''^{2T}$. Such
manifolds are called equivalent: they are indistinguishable
for the external observer extracting $\Sigma$ or
$R^{2T}$ from the boundary measurements. Therefore,
these data do not determine the manifold uniquely
and both of the inverse problems need to be
clarified. The precise formulation is given in the
form of two questions: 


1. Does the coincidence of the inverse data imply 
the equivalence of the manifolds? 

2. Given the inverse data of an unknown manifold, 
how to construct a manifold possessing these 
data? 


The BC method gives an affirmative answer to the 
first question and provides a procedure producing a 
representative of the class of equivalent manifolds 
from its inverse data. The method is based on the 
concepts of model and “coordinatization.”


Model 


A pair consisting of an auxiliary Hilbert space $\tilde{\mathcal{H}}$
and an operator $\tilde W^T_{\rm bd}: \mathcal{F}^T \to \tilde{\mathcal{H}}$ is said to be a model
of the system $\alpha^T$, if $\tilde W^T_{\rm bd}$ is determined by inverse
data, and the map $U: W^T_{\rm bd} f \mapsto \tilde W^T_{\rm bd} f$ is an isometry
from $\mathrm{Ran}\, W^T_{\rm bd} \subset \mathcal{H}$ onto $\mathrm{Ran}\, \tilde W^T_{\rm bd} \subset \tilde{\mathcal{H}}$. The model is
an intermediate object in solving inverse problems. It
plays the role of an auxiliary copy of the original
dynamical system which an external observer can build
from measurements on the boundary. While the
genuine wave process inside $\Omega$, initiated by a boundary
control, remains inaccessible to direct measurements,
its $\tilde{\mathcal{H}}$-representation can be visualized by means of the
model control operator $\tilde W^T_{\rm bd}$. This is illustrated by the
diagram in Figure 2, where the upper part is invisible
to an external observer, whereas the lower part can be
extracted from inverse data.

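Each type of data determines a corresponding
model. The spectral model is the pair

\[ \tilde{\mathcal{H}} := l_2, \qquad \tilde W^T_{\rm bd} f := \{(f, s_k^T)_{\mathcal{F}^T}\}_{k=1}^\infty \]  [13]

(see [12]); the role of isometry $U$ is played by the
Fourier transform $F: \mathcal{H} \to \tilde{\mathcal{H}}$, $Fy := \{(y, \varphi_k)_{\mathcal{H}}\}_{k=1}^\infty$. By
virtue of [4], the data $\Sigma$ also determine the operator
$\tilde W^r_{\rm vol}: L_2([0,r]; \tilde{\mathcal{H}}) \to \tilde{\mathcal{H}}$,

\[ \tilde W^r_{\rm vol}\, \tilde h := \int_0^r \tilde L^{-1/2} \sin\big[(r-t)\tilde L^{1/2}\big]\, \tilde h(t)\, dt, \quad r > 0 \]  [14]

where $\tilde L := ULU^* = \mathrm{diag}\{\lambda_k\}_{k=1}^\infty$. Thus, the spectral
model allows one to see the Fourier images of
invisible waves.

Figure 2 Model of a system. (Data from Belishev (1997).)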

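According to [5], the response operator $R^{2T}$
determines the modulus of the control operator

\[ |W^T_{\rm bd}| := \big((W^T_{\rm bd})^* W^T_{\rm bd}\big)^{1/2} = (C^T)^{1/2} \]

which enters in the polar decomposition
$W^T_{\rm bd} = \Phi\,|W^T_{\rm bd}|$. Along with it, the response operator
determines the dynamical model

\[ \tilde{\mathcal{H}} := \mathrm{cl}\,\mathrm{Ran}\,(C^T)^{1/2}, \qquad \tilde W^T_{\rm bd} := (C^T)^{1/2} \]  [15]

The correspondence “system → model” is realized
by the isometry $U = \Phi^*: W^T_{\rm bd} f \mapsto |W^T_{\rm bd}| f$. The operator
$\tilde L^T := U L^T U^*$, dual to the Laplacian on waves, is
determined by its graph

\[ \mathrm{gr}\, \tilde L^T = \mathrm{cl}\, \big\{ \{\tilde W^T_{\rm bd} f,\; -\tilde W^T_{\rm bd} f_{tt}\} \;\big|\; f \in C_0^\infty(\Gamma \times (0,T)) \big\} \]  [16]

(see [8]) and, therefore, is also determined
by $R^{2T}$. In the case $T > T_*$, the operator
$\tilde W^r_{\rm vol}: L_2([0,r]; \tilde{\mathcal{H}}) \to \tilde{\mathcal{H}}$ dual to $W^r_{\rm vol}$ is represented
in the form

\[ \tilde W^r_{\rm vol}\, \tilde h = \int_0^r (\tilde L^T)^{-1/2} \sin\big[(r-t)(\tilde L^T)^{1/2}\big]\, \tilde h(t)\, dt, \quad r \ge 0 \]  [17]

in accordance with [9]. Thus, the dynamical model
visualizes the $\Phi^*$-images of the waves propagating
inside $\Omega$.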


Wave Coordinatization 


In a general sense, a coordinatization is a correspondence 
between points $x$ of the studied set $A$ and 
elements $\tilde{x}$ of another set $\tilde{A}$ such that: (i) the 
elements of $\tilde{A}$ are accessible and distinguishable; (ii) 
the map $x \mapsto \tilde{x}$ is a bijection; and (iii) relations 
between elements of $\tilde{A}$ determine those between 
points of $A$ which are studied (H Weyl). Coordinatization 
enables one to study $A$ via operations with 
coordinates $\tilde{x} \in \tilde{A}$. 

The external observer investigating the manifold 
probes $\Omega$ with waves initiated by sources on 
$\Gamma$. The relevant coordinatization of $\Omega$ described 
below uses such waves and is implemented in 
three steps. 

Step 1 (subdomains) Let x(y, 7) be the end point of 
the geodesic of the length 7 > 0 emanating from y ET 
in the direction —v(y), and let o? CT be a small 
neighborhood shrinking to y as £ — 0. If 7 € 7, (9), 
then the family of subdomains 


Figure 3 The subdomains. 


ur (y,7) = (D NT). IN (o5) 


(shaded domain on Figure 3) shrinks to x(», 7); if 
T > 7,(y), then the family terminates: wf (y, 7) — ( as 
€ < €9(y) (the case y=7 in Figure 3). Such behavior 
of subdomains implies that 


lim (((P)"\ D) JN (y Y 
- l (ry, TS rla) 
0, T > (y) [18] 


Step 2 (wave subspaces) Pass from the subdomains 


to the corresponding subspaces H(TY,H(Y, 


H(w*(y,7))’, and represent them via reachable sets 
by [7]: 
HIT) =A WF, (o =cl Why F 5. 
Hf (y,7))" = d WeaLa([0, r]; tf (y, 7)) 
= cl WaLa (I0. 7]; [HIY 
OHITY]n H(05)") 
=c W Lo Q r|; [cl WEF" 
od Wn d Wis, ) 
Define 
Wi 


(4.7 


© cel Wia F] Nel Wafa ) [19] 


) := lim cl Wt iL» (Io. r); [cl Waf” 


(4,0) := Wig, +0)» r 2 O (the limits in the sense of the 
strong operator convergence of the projections in H 
on the corresponding subspaces). By the definitions, 
one has Wi = lim~o H(w*(7,7))’, whereas [18] 
leads to the equality 


get d eee 


bn 119) rana) "4 
for all $\gamma \in \Gamma$, $\tau > 0$, $r > 0$. As a result, since any $x \in \Omega$ 
can be represented as $x = x(\gamma, \tau)$, one attaches to every 
point of the manifold a family of expanding subspaces 
DW, .,|r = 0) built out of waves. As is seen from [20], 
the family is determined by the point x (not dependent 
on the representation x = x(»,7)); the subspaces which 
it consists of coincide with H(x)’. 
Expressing the distance as 


d(x',x") =2 inf (r » Ol n ("y x 10)) 
in accordance with [20], one can represent 
d(x’, x" 


=2 inf {r > 0|Wt, NW n  (0)) [21] 


where x'—x(»y, T), x" —x(»y', T"), and hence find 
the distance via the above families. 

Step 3 (wave copy) By varying 4 €ID,r > 0, 
gather all nonzero families (W; „|r > 0] —: X in the 
set () = [x). Redenoting W% :— W. , € €, endow the 
set with the distance 


d(x', x") :— 2 inf{r>0|Wy OW z (0)) 22] 


In view of [21], one has $\tilde{d}(\tilde{x}', \tilde{x}'') = d(x', x'')$, so 
that the metric space $(\tilde{\Omega}, \tilde{d})$ is an isometric copy 
of $(\Omega, d)$ by construction. Thus, the correspondence 
$x \mapsto \tilde{x}$ ("point ↦ family") is an isometry and 
satisfies the general principles (i)-(iii) of 
coordinatization. 

The manifold $(\tilde{\Omega}, \tilde{d})$ is the end product of the 
wave coordinatization. It represents the original 
manifold as a collection of infinitesimal sources 
interacting with each other via the waves which they 
produce. 


Solving Inverse Problems 


The motivation for the above coordinatization is 
that the wave copy can be reproduced via any 
model. Namely, the external observer with the 
knowledge of X or R?'(T > T,) can recover (Q, d) 
up to isometry by the following procedure: 


1. Construct the model corresponding to the given 
inverse data and determine the operators W,,, 
O<r<T by [13] [15]; then determine 
L,L , and W, by [14] or [16], [17]. 

2. Replace on the right-hand side of [19] all 
operators W without tildes by the ones with 
tildes, and get the subspaces Winn = UW" 
4€,T20,r20. 

VH Gather all nonzero families (Y... „lr = 0} =:X inthe 
set = {x} and redenote the subspaces as 
W; Wa € X; endow the set with the metric 
ala’ s e") us inf(r > 0| Wa N War Æ (0)) (see [22]), 

and ^ a ves (Â, d) of the wave copy (€), d). 


(757)? 


This sample is isometric to the original $(\Omega, d)$ by 
construction. Identifying properly the boundaries $\partial\tilde{\Omega}$ 
and $\Gamma$, one turns $(\tilde{\Omega}, \tilde{d})$ into a canonical representative 
of the class of equivalent manifolds possessing 
the given inverse data. 

If the response operator $R^{2T}$ is given for a fixed 
$T < T_*$, the above procedure produces the wave 
copy of the submanifold $(\Omega^T, d)$. This locality in 
time is an intrinsic feature and advantage of the BC 
method: longer time of observation on $\Gamma$ increases 
the depth of penetration into $\Omega$. 


Amplitude Formula 


Another variant of the BC method is based on 
geometrical optics formulas describing the propaga- 
tion of singularities of the waves. 

Let y € H, and let 9 be the density of the volume 
in semigeodesic coordinates: dx— dI dr; the 
function 


ziy, r) = 18" (a7) y(x(7,7)), ($7) ee 
| 0, otherwise 


defined on T x [0, T,] is called the image of y. The 
amplitude formula represents the images of waves 
initiated by boundary controls in the form 


fo( T) (yr) = lim (Waa) (1 — P") Wea l(t) 


(F< T 


where I is the identity operator and P" is the 
projection in H onto clW/,F". The formula is 
derived by the ray method going back to 
J Hadamard; the derivation uses the controllability 
[7]. 

Any model determines the right-hand side of the 
ank relation _by_ the. isometry: PL AH (I — y 
the fite operator, and P' = UP" U* is the projec- 
tion in H onto clW,,77. This leads to the 
representation 
w^. TT) = lim (Wi) ( P") WE 

OÜO«T«T [23] 


and makes the amplitude formula a useful tool for 
solving the inverse problems. The external observer 
can construct a model via inverse data and then 
visualize by [23] the wave images on the part O" of 
the pattern (see Figure 1). The collection of images 
ul? corresponding to all possible controls f is rich 
enough for recovering the tensor g on O^ (i.e., the 
metric tensor in semigeodesic coordinates) and 
turning the pattern into an isometric copy of the 
submanifold $(\Omega^T, d)$. This variant of the method is 
more appropriate if one needs to recover unknown 
coefficients of the wave equation in $\Omega$; it can be 
realized in terms of numerical algorithms. 


Extensions of the Method 


Electromagnetic waves are also well suited for 
coordinatization and for constructing the wave copy 
(Q,d). An appropriate version of the amplitude 
formula also exists for the system governed by the 
Maxwell equations (see Further Reading). At present 
(2004), the applicability of the BC method to three- 
dimensional inverse problems of elasticity theory is 
still an open question. The following hypothesis 
concerns the Lamé system: the wave coordinatization 
procedure (steps 1-3), using the elastic waves instead 
of the above $u^f$, gives rise to the copy of $\Omega \subset \mathbb{R}^3$ 
endowed with the metric $|dx|^2 / c_p^2$, where 
$c_p = \sqrt{(\lambda + 2\mu)/\rho}$ is the speed of the pressure waves. 

The concept of model is used for solving inverse 
problems for the heat and Schrödinger equations 
(Avdonin and Belishev, 1995-2004), as well as for 
the problem of boundary data continuation 
(Belishev 2001, Kurylev and Lassas 2002). A variant 
of the BC method allows one to recover not only the 
manifold but also the Schrödinger type operators on 
it and/or the dissipative term in the scalar wave 
equation (Kurylev and Lassas 1993-2003). 

An appropriate version of the amplitude formula 
solves the inverse problem for one-dimensional two- 
velocity dynamical system which describes the waves 
consisting of two modes propagating with different 
speeds and interacting with each other (Belishev, 
Blagoveschenskii, Ivanov, 1997-2000). 

One more variant of coordinatization, going back 
to the first paper on the BC method, associates with 
points $x \in \Omega$ the Dirac measures $\delta_x$; then, their 
images $\tilde{\delta}_x$ are identified via suitable models. This 
variant solves inverse problems on graphs and the 
two-dimensional elliptic Calderón problem. The 
reader is referred to articles by the present author 
listed in Further Reading. 

Within the scope of the method, one derives some 
natural analogs of the classical Gelfand-Levitan- 
Krein-Marchenko equations (Belishev, 1987—2001). 
Also, an appropriate analog solves the kinematic 
inverse problem for a class of two-dimensional 
manifolds (Pestov 2004). 

There exists an abstract version of the 
approach, embedding the BC method into the 


framework of linear system theory (Belishev 
2001). The method is also related to the problem 
of triangular factorization of operators (Belishev 
and Pushnitski 1996). 

Numerical algorithms for solving two-dimensional 
spectral and dynamical inverse problems for the wave 
equation $\rho u_{tt} - \Delta u = 0$, which recover the variable 
density $\rho$, have been developed and tested (Filippov, 
Gotlib, Ivanov, 1994-1999). 


See also: Dynamical Systems and Thermodynamics; 
Geophysical Dynamics; Inverse Problem in Classical 
Mechanics. 
Introduction 


Integrable equations are a special class of nonlinear 
equations arising in the modeling of a wide variety 
of physical phenomena. It has been argued that 
integrable PDEs are in a certain, specific sense 
“universal” models for physical phenomena invol- 
ving weak nonlinearity. Indeed, integrable equations 
are obtained by a procedure involving rescaling and 
an asymptotic expansion from very large classes of 
nonlinear evolution equations, which preserves 
integrability while retaining in the limit weakly 
nonlinear effects. For this reason, integrable equa- 
tions are a very important class of PDEs. Important 
examples are the nonlinear Schrödinger (NLS) 
equation 


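$$i q_t + q_{xx} - 2\lambda |q|^2 q = 0, \qquad \lambda = \pm 1 \qquad [1]$$

the Korteweg-de Vries (KdV) equation 

$$q_t + q_x + q_{xxx} + 6 q q_x = 0 \qquad [2]$$

the modified KdV (mKdV) equation 

$$q_t + q_{xxx} - 6\lambda q^2 q_x = 0, \qquad \lambda = \pm 1 \qquad [3]$$

and the sine-Gordon (sG) equation in light-cone or 
laboratory coordinates 

$$q_{xt} + \sin q = 0 \quad \text{or} \quad q_{tt} - q_{xx} + \sin q = 0 \qquad [4]$$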
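
A quick numerical sanity check can make eqn [2] concrete. The sketch below is not part of the original article: it assumes the travelling-wave profile $q = \tfrac{c}{2}\,\mathrm{sech}^2\big(\tfrac{\sqrt{c}}{2}(x - (1+c)t)\big)$, which is the classical KdV soliton with its speed shifted by the extra $q_x$ term, and estimates the left-hand side of [2] (with the + sign) by central differences. The function names and the step size `h` are illustrative choices.

```python
import math

def soliton(x, t, c=1.0):
    """Assumed travelling wave q = (c/2) sech^2( sqrt(c)/2 * (x - (1 + c) t) )."""
    s = 0.5 * math.sqrt(c) * (x - (1.0 + c) * t)
    return 0.5 * c / math.cosh(s) ** 2

def kdv_residual(x, t, c=1.0, h=1e-3):
    """Central-difference estimate of q_t + q_x + q_xxx + 6 q q_x at (x, t)."""
    def q(dx=0.0, dt=0.0):
        return soliton(x + dx, t + dt, c)

    q_t = (q(dt=h) - q(dt=-h)) / (2 * h)
    q_x = (q(dx=h) - q(dx=-h)) / (2 * h)
    # Standard second-order stencil for the third derivative.
    q_xxx = (q(dx=2 * h) - 2 * q(dx=h) + 2 * q(dx=-h) - q(dx=-2 * h)) / (2 * h ** 3)
    return q_t + q_x + q_xxx + 6 * q() * q_x

if __name__ == "__main__":
    for point in [(0.0, 0.0), (0.7, 0.3), (-1.5, 1.0)]:
        print(point, kdv_residual(*point))
```

The residual is limited by the finite-difference error, so it is small rather than exactly zero.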

A general method for solving the initial-value 
problem for integrable equations in one space 
dimension was discovered in 1967, when in a 
pioneering and much celebrated work (Gardner 
et al. 1967), the initial-value problem for KdV 
with decaying initial condition was completely 
solved. Soon afterwards, it was understood that 
this method, now known as the “inverse scattering 
transform," is of more general applicability. Indeed, 
it can be applied to those nonlinear equations that 
can be written as the compatibility condition of a 
pair of linear eigenvalue equations. The method of 
solution for the Cauchy problem essentially relies on 
the possibility of expressing the equation through 
this pair, now called a Lax pair after the work of 
Lax (1968), who first clarified the connection. 
Zakharov and Shabat (1972) constructed such a 
pair for the NLS equation, and in subsequent years 
the Lax pairs associated with all important integr- 
able equations in one and two spatial variables were 
constructed. These include the NLS, sG, mKdV, 
Davey-Stewartson I and II, and Kadomtsev-Petviashvili 
I and II equations. 

There is no universally accepted definition of an 
integrable PDE, but on account of the above results, 
the existence of a Lax pair can be taken as the 
defining property of such equations. In the course of 
the 1970s, the inverse scattering transform was 
applied to solve the initial-value (Cauchy) problem 
for many integrable equations. In principle, there is 
no obstruction to solving analytically the initial-value 
problem by the inverse scattering transform as soon 
as a Lax pair is constructed for the equation, and 
appropriate decaying initial conditions are pre- 
scribed. The solution is then characterized in 
terms of a certain integral equation. This approach 
is equivalent to associating with the initial-value 
problem a classical problem in complex analysis, 
namely a matrix Riemann-Hilbert problem, 
defined in the complex spectral space. This point 
of view is currently taken by many authors as it 
provides a unifying and very flexible framework for 
the analysis. 

After the success of the inverse scattering trans- 
form in solving the Cauchy problem, it was natural 
to attempt to generalize the approach to boundary- 
value problems. To describe the difficulties involved 
in this generalization, consider the case of evolution 
equations in one space and one time dimensions. 
The independent variables can be denoted by $(x,t)$, 
with $t > 0$ representing time. While the initial-value 
problem is posed on the full real line, hence for 
$x \in (-\infty, \infty)$, the simplest boundary-value problem 
is posed on a half-line, for $x \in (0, \infty)$. In addition 
to initial conditions at the initial time $t = 0$, it is 
necessary to prescribe conditions at the boundary 
$x = 0$. The number of conditions that must be 
prescribed to obtain a problem which admits a 
unique solution depends on the particular equation, 
but for evolution equations it is roughly equal to 
half the number of $x$-derivatives involved in the 
equation. For example, for the NLS equation, a 
well-posed problem is defined as soon as one 
boundary condition at $x = 0$ is prescribed; hence a 
typical boundary-value problem for this equation is 
obtained, for example, when $q(x,0) = q_0(x)$ and 
$q(0,t) = g_0(t)$ are prescribed and compatible, so that 
$q_0(0) = g_0(0)$. It follows that, while $q_{xx}(0,t)$ can be 
computed from the equation, $q_x(0,t)$ is not immediately 
known. An even more difficult situation 
arises for the KdV equation [2] (with the + sign), 
for which a well-posed problem is again defined as 
soon as one boundary condition is prescribed, so 
that there are two unknown boundary values. 
Because of this simple fact, a straightforward 
application of the ideas of the inverse scattering 
transform immediately encounters one crucial difficulty. 
This transform method yields an integral 
representation of the solution which involves not 
only the given boundary conditions $f(t)$, but also the 
other “unknown” boundary values: in our example 
for the NLS equation, the function $q_x(0,t)$. The 
problem of characterizing these unknown boundary 
values has impeded progress in this direction for over 
thirty years. 

On account of their physical significance, various 
boundary-value problems for the KdV equation have 
been considered, and classical PDE techniques (not 
specific to integrable models) have been used to 
establish existence and uniqueness results (Bona 
et al. 2001, Colin and Ghidaglia 2001, Colliander 
and Kenig 2001). These approaches, and in parti- 
cular the approach of Colliander and Kenig, are 
quite general and possibly of wide applicability, and 
give global existence results in wide functional 
classes. However, they do not rely on integrability 
properties. Indeed, none of these results use the 
integrable structure of the equation in any funda- 
mental or systematic way. However, the fact that 
these equations are integrable on the full line implies 
very special properties that should be exploited in 
the analysis and it is natural to try to generalize the 
inverse scattering transform approach. 

Such a generalization is sometimes directly possi- 
ble. For example, it has been used for studying the 
problem on the half-line for the hyperbolic version 
of the sG equation [4a] which does not involve 
unknown boundary values (Fokas 2000, Pelloni). It 
has also been used to study some specific boundary- 
value problems for the NLS equation, for example, 
for homogeneous Dirichlet or Neumann conditions, 
when it is possible to use even or odd extensions of 
the problem to the full line (Ablowitz and Segur 
1974), or more recently in Degasperis et al. (2001). 
In the latter case, however, the unknown boundary 
values are characterized through an integral Fred- 
holm equation, which does not admit a unique 
solution. Some special cases of boundary-value 
problems for the KdV equation (Adler et al. 1997, 
Habibullin 1999) and elliptic sG (Sklyanin 1987) 
have also been studied via the inverse scattering 
transform. However all the examples considered are 
nongeneric, and it has recently been shown (Fokas, 
in press) that the boundary conditions chosen fall in 
the special class of the so-called “linearizable” 
boundary conditions, for which the problem can be 
solved as if it were posed on the full line. One 
cannot hope to use similar methods to solve the 
problem with generic boundary conditions. 


Recently, Fokas (2000) introduced a general 
methodology to extend the ideas of the inverse 
scattering transform to boundary-value problems. 
This methodology provides the tools to analyze 
boundary-value problems for integrable equations to 
a considerable degree of generality. We note as a 
side remark that linear PDEs are trivially integrable, 
in the sense of admitting a Lax pair (in this case the 
Lax pair can be found algorithmically, while the 
construction of the Lax pair associated with a 
nonlinear equation is by no means trivial). As a 
consequence of this remark, the extension of the 
inverse scattering transform also provides a method 
for solving boundary-value problems for a large 
variety of linear PDEs of mathematical physics. 

What follows is a general description of the 
approach of Fokas, considering, for the sake of 
concreteness, the case of an integrable PDE in the 
two variables $(x,t)$ which vary in the domain $D$ 
(typically, for an evolution problem, $D = (0,\infty) \times (0,T)$). 
We assume that $q(x,t)$ denotes the unique 
solution of a boundary-value problem posed for 
such an equation. 


The method consists of the following steps. 


1. Write the PDE as the compatibility condition of a 
Lax pair. This is a pair of linear ODEs for the 
function $\mu = \mu(x,t,k)$ involving the solution 
$q(x,t)$ of the PDE, the derivatives of this solution, 
and a complex parameter $k$, called the spectral 
parameter. This can be done algorithmically for 
linear PDEs, and in this case $\mu(x,t,k)$ is a scalar 
function. For nonlinear integrable PDEs, $\mu(x,t,k)$ 
is in general a matrix-valued function. 

The equivalence of the PDE with a Lax pair 
can be reformulated in the language of differential 
forms, and in this language it is easier to 
describe the methodology in general. Assume 
then that $\Omega(x,t,k)$ is a differential 1-form 
expressed in terms of a function $q(x,t)$ and its 
derivatives, and of a complex variable $k$, and one 
which is characterized by the property that 
$d\Omega = 0$ if and only if $q(x,t)$ satisfies the given 
PDE. The closure of the form $\Omega$ yields the two 
important consequences 2(a) and 2(b) below. 

2. (a) Since the domain $D$ under consideration is 
simply connected, the closed form $\Omega$ is also exact; 
hence, it is possible to find a particular 0-form 
$\mu(x,t,k)$ solving $d\mu = \Omega$. In particular, $\mu(x,t,k)$ 
can be chosen to be sectionally bounded with 
respect to $k$ by solving either a Riemann-Hilbert 
problem or a d-bar problem in the complex 
spectral $k$ plane, and the solution $\mu(x,t,k)$ is 
then expressed in terms of certain “spectral 
functions” depending on all the boundary values 
of the solution $q(x,t)$ of the PDE. The function 
$q(x,t)$ can then be expressed in terms of 
$\mu(x,t,k)$. (b) The integral of $\Omega$ along the 
boundary of the domain $D$ vanishes. This yields 
an integral constraint between all boundary 
values of the solution of the PDE, which 
becomes an algebraic constraint for the spectral 
functions. The resulting algebraic identity is 
called the “global relation.” 

3. The last step is the analysis of the $k$-invariance 
properties of the global relation. This analysis 
yields the characterization of the spectral functions 
in terms only of the given boundary 
conditions. 
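
The steps above can be illustrated on a linear example, where everything is explicit. The following sketch is an illustration under stated assumptions rather than the construction used in the article: it takes the linear Schrödinger equation $iq_t + q_{xx} = 0$, for which one admissible closed 1-form is $e^{ikx + ik^2 t}\,[\,q\,dx + (iq_x + kq)\,dt\,]$ (closure is equivalent to $q_t = iq_{xx}$), picks the exact solution $q = e^{ipx - ip^2 t}$ on a hypothetical rectangle $[0,L] \times [0,T]$, and checks numerically that the integral of the form around the boundary vanishes for several values of $k$, which is the content of step 2(b). All names and parameter values are illustrative.

```python
import cmath

P = 1.3  # hypothetical wavenumber of the exact solution

def q(x, t):
    """Exact solution of i q_t + q_xx = 0 (assumed test case)."""
    return cmath.exp(1j * P * x - 1j * P * P * t)

def q_x(x, t):
    return 1j * P * q(x, t)

def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3

def boundary_integral(k, L=1.0, T=1.0):
    """Integral of the closed 1-form around the rectangle, counterclockwise."""
    def e(x, t):
        return cmath.exp(1j * k * x + 1j * k * k * t)

    def dx_part(x, t):          # coefficient of dx
        return e(x, t) * q(x, t)

    def dt_part(x, t):          # coefficient of dt
        return e(x, t) * (1j * q_x(x, t) + k * q(x, t))

    bottom = simpson(lambda x: dx_part(x, 0.0), 0.0, L)   # t = 0, x from 0 to L
    right = simpson(lambda t: dt_part(L, t), 0.0, T)      # x = L, t from 0 to T
    top = -simpson(lambda x: dx_part(x, T), 0.0, L)       # t = T, x from L to 0
    left = -simpson(lambda t: dt_part(0.0, t), 0.0, T)    # x = 0, t from T to 0
    return bottom + right + top + left

if __name__ == "__main__":
    for k in (0.7 + 0.4j, -1.2, 2.0j):
        print(k, abs(boundary_integral(k)))
```

For a half-line problem the side $x = L$ is sent to infinity and, for decaying data and suitable $k$, drops out, leaving a relation between initial and boundary values only.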


The crucial and most difficult step in the solution 
process is the characterization described above. The 
analysis required depends on the type of problem 
under consideration. For nonlinear integrable evolu- 
tion PDEs posed on the half-line x > 0, in general 
the characterization mentioned in step (3) involves 
solving a system of nonlinear Volterra integral 
equations. This is an important difference from the 
case of the Cauchy problem, where the solution is 
given by a single integral equation where all the 
terms are explicitly known. 

The method outlined above has been applied 
successfully to solve a variety of boundary-value 
problems for linear and integrable nonlinear PDEs. 
For concreteness, here the focus is on the important 
case of integrable evolution PDEs in one space dimension, which 
illustrates clearly the generalities of this method. 


Integrable Evolution Equations in One 
Space Dimension 


The crucial property of integrable PDEs which is 
used in the inverse scattering transform approach to 
solve the initial-value problem is the fact that they 
can be written as the compatibility of a Lax pair. 
Many integrable evolution equations of physical 
significance (such as NLS, KdV, sG, and mKdV) 
admit a Lax pair of the form 


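$$\mu_x + i f_1(k)\,\sigma_3 \mu = Q(x,t,k)\,\mu$$
$$\mu_t + i f_2(k)\,\sigma_3 \mu = \tilde{Q}(x,t,k)\,\mu \qquad [5]$$

where $\mu(x,t,k)$ is a $2 \times 2$ matrix, $\sigma_3 = \mathrm{diag}(1,-1)$, 
$f_i(k)$, $i = 1, 2$, are analytic functions of the complex 
parameter $k$, and $Q$, $\tilde{Q}$ are analytic functions of $k$, 
of the function $q(x,t)$ (and of its complex conjugate 
$\bar{q}(x,t)$ for complex-valued problems) and of its 
derivatives. For example, the NLS equation [1] is 
equivalent to the compatibility condition of the pair 

$$\mu_x + ik\sigma_3\mu = Q\mu, \qquad Q = \begin{pmatrix} 0 & q \\ \lambda\bar{q} & 0 \end{pmatrix}$$
$$\mu_t + 2ik^2\sigma_3\mu = \left(2kQ - iQ_x\sigma_3 - i\lambda|q|^2\sigma_3\right)\mu \qquad [6]$$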
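
The equivalence between [1] and the compatibility of the pair [6] can be checked numerically on an explicit solution. The sketch below assumes the pair is read as $\mu_x = U\mu$, $\mu_t = V\mu$ with $U = -ik\sigma_3 + Q$ and $V = -2ik^2\sigma_3 + 2kQ - iQ_x\sigma_3 - i\lambda|q|^2\sigma_3$, so that compatibility is $U_t - V_x + [U,V] = 0$, and it uses the plane wave $q = A e^{i(px - \omega t)}$ with $\omega = p^2 + 2\lambda A^2$, which solves [1]. The amplitude, wavenumber and helper names are illustrative and not taken from the article.

```python
import cmath

SIGMA3 = ((1, 0), (0, -1))

def mul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2))
                 for i in range(2))

def add(*ms):
    """Entrywise sum of several 2x2 matrices."""
    return tuple(tuple(sum(m[i][j] for m in ms) for j in range(2)) for i in range(2))

def scale(c, a):
    return tuple(tuple(c * a[i][j] for j in range(2)) for i in range(2))

def offdiag(f, lam):
    """Matrix with entries f and lam * conj(f) off the diagonal, as in Q."""
    return ((0, f), (lam * f.conjugate(), 0))

def zero_curvature_residual(x, t, k, lam=1, amp=0.8, p=0.5):
    """Largest entry of U_t - V_x + [U, V] on the assumed plane-wave solution."""
    omega = p * p + 2 * lam * amp * amp
    q = amp * cmath.exp(1j * (p * x - omega * t))
    # q_x, q_xx and q_t are known in closed form for the plane wave.
    Q, Q_x, Q_xx, Q_t = (offdiag(f, lam)
                         for f in (q, 1j * p * q, -p * p * q, -1j * omega * q))
    U = add(scale(-1j * k, SIGMA3), Q)
    V = add(scale(-2j * k * k, SIGMA3), scale(2 * k, Q),
            scale(-1j, mul(Q_x, SIGMA3)), scale(-1j * lam * abs(q) ** 2, SIGMA3))
    U_t = Q_t
    # |q|^2 is constant for a plane wave, so its x-derivative does not appear.
    V_x = add(scale(2 * k, Q_x), scale(-1j, mul(Q_xx, SIGMA3)))
    residual = add(U_t, scale(-1, V_x), mul(U, V), scale(-1, mul(V, U)))
    return max(abs(entry) for row in residual for entry in row)

if __name__ == "__main__":
    print(zero_curvature_residual(0.3, 0.7, 1.1 + 0.4j))
```

The residual vanishes to rounding error for every $k$, which reflects the fact that the spectral parameter drops out of the compatibility condition.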


The first step towards a systematic new approach to 
solving boundary-value problems was the work of 
Fokas and Its, who associated the boundary-value 
problem for NLS on the half-line to a single 
Riemann-Hilbert problem determined by both 
equations in the Lax pair. The jump determining 
this Riemann-Hilbert problem has an explicit 
exponential dependence on both x and t. This differs 
from the classical inverse scattering approach, in 
which the x-part of the Lax pair is used to determine 
an x-transform with t-dependent scattering data, 
and the $t$-part of the Lax pair is then exploited to 
find the time evolution of these data. The work of 
Fokas and Its led to the understanding that both 
equations in the Lax pair [6] must be considered in 
order to construct a spectral transform appropriate 
to solve boundary-value problems. Fokas (2000) 
reviews his systematic way to solve these problems 
by performing the simultaneous spectral analysis of 
both equations in the Lax pair. The transform thus 
obtained, which is a nonlinearization of the Fourier 
transform, precisely generalizes the inverse scatter- 
ing transform. 

This simultaneous analysis also leads naturally to 
the identification of the "global relation" which 
holds between initial and boundary data, and which 
plays an essential role in deriving an expression for 
the solution of the problem which does not involve 
unknown boundary values. 

The Riemann-Hilbert problem with explicit (x, t) 
dependence, the global relation, and the invariance 
properties of the latter with respect to the spectral 
parameter are the fundamental ingredients of this 
systematic approach to solve boundary-value pro- 
blems for integrable equations. 

The steps involved in this method are summar- 
ized in the introduction. While steps (1) and (2) 
can be described generally, and, once the Lax pair 
is identified, can be performed algorithmically (at 
least under the assumption that the solution of the 
PDE exists), the last step is the most difficult part 
of the analysis, and it needs to be considered 
separately for each given problem. However, it is 
this step that yields the effective characterization 
of the solution. 

The results obtained for the particular case of eqn 
[1] are reviewed in detail in the next section, as they 
provide an important example, which can be 
generalized without any conceptual difficulty to 
eqns [2]-[4]. 
The NLS Equation 


As already mentioned, the initial-value problem for 
NLS was solved, for decaying initial condition, by 
Zakharov and Shabat, and studied in depth by many 
others. However, by the mid-1990s only a handful 
of papers had been written on the solution of the 
boundary-value problem posed on the half-line, all 
on a specific example or aspect of the problem, or 
attempts at solving the problem using general PDE 
techniques. 

For this equation, the approach of Fokas yields 
the following results. Let the complex-valued 
function $q(x,t)$ satisfy the NLS equation [1], for 
$x > 0$ and $t > 0$, with one prescribed initial condition and one 
prescribed boundary condition. For the sake of concreteness, 
we select the specific initial and boundary 
conditions 

$$q(x,0) = q_0(x) \in S(\mathbb{R}^+), \qquad q(0,t) = g_0(t) \in S(\mathbb{R}^+), \qquad q_0(0) = g_0(0) \qquad [7]$$

where $S$ denotes the space of Schwartz functions 
(similar results hold for different choices of boundary 
conditions, and less restrictive function classes). 
The solution of this initial boundary-value (IBV) 
problem can be constructed as follows (Fokas 2000, 
2002; in press): 

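• Given $q_0(x)$, construct the spectral functions 
$\{a(k), b(k)\}$. These functions are defined by 

$$a(k) = \phi_2(0,k), \qquad b(k) = \phi_1(0,k)$$

where the vector $\phi(x,k)$ with components $\phi_1(x,k)$ 
and $\phi_2(x,k)$ is the following solution of the 
$x$-problem of the associated Lax pair evaluated 
at $t = 0$: 

$$\phi_x + ik\sigma_3\phi = Q(x,0,k)\phi, \qquad 0 < x < \infty, \quad \mathrm{Im}\,k > 0$$
$$\phi(x,k) = e^{ikx}\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix} + o(1)\right) \quad \text{as } x \to \infty$$
$$Q(x,0,k) = \begin{pmatrix} 0 & q_0(x) \\ \lambda\bar{q}_0(x) & 0 \end{pmatrix}$$

($\sigma_3$ and $Q(x,t,k)$ are defined after eqns [5] and [6], 
respectively). 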

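• Given $q_0(x)$ and $g_0(t)$, characterize $g_1(t)$ by the 
requirement that the spectral functions 
$\{A(t,k), B(t,k)\}$ satisfy the global relation 

$$B(t,k) - R(k)A(t,k) = e^{4ik^2 t}\,c(t,k), \qquad R(k) = \frac{b(k)}{a(k)}, \qquad t \in [0,T], \quad k \in \bar{D}$$

where $D$ denotes the first quadrant of the 
complex $k$-plane: 

$$D = \{k \mid \mathrm{Re}\,k > 0, \ \mathrm{Im}\,k > 0\}$$

$\bar{D}$ denotes the closure of $D$, and $c(t,k)$ is a 
function of $k$ analytic in $D$ and of order $O(1/k)$ 
as $k \to \infty$. The spectral functions are defined by 

$$A(t,k) = e^{2ik^2 t}\,\overline{\Phi_2(t,\bar{k})}, \qquad B(t,k) = -e^{2ik^2 t}\,\Phi_1(t,k)$$

where the vector $\Phi(t,k)$ with components $\Phi_1$ and 
$\Phi_2$ is the following solution of the $t$-problem of 
the associated Lax pair evaluated at $x = 0$: 

$$\Phi_t + 2ik^2\sigma_3\Phi = \tilde{Q}(0,t,k)\Phi, \qquad 0 < t < T, \quad k \in \mathbb{C}$$
$$\Phi(0,k) = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad [10]$$
$$\tilde{Q}(0,t,k) = \begin{pmatrix} -i\lambda|g_0(t)|^2 & 2kg_0(t) + ig_1(t) \\ \lambda\left(2k\bar{g}_0(t) - i\bar{g}_1(t)\right) & i\lambda|g_0(t)|^2 \end{pmatrix}$$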

• Given $a(k), b(k)$ and $A(k), B(k)$, define a $2 \times 2$ 
matrix Riemann-Hilbert problem. This problem 
has the distinctive feature that its jump has 
explicit $(x,t)$ dependence in the exponential 
form of $\exp\{ikx + 2ik^2 t\}$. Determine $q(x,t)$ in 
terms of the solution of this Riemann-Hilbert 
problem by using the fact that these functions 
are related by the Lax pair. Then the function 
$q(x,t)$ solves the IBV problem [1]-[7] with 
$q(x,0) = q_0(x)$, $q(0,t) = g_0(t)$, and $q_x(0,t) = g_1(t)$. 
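
The first step of the construction, computing $a(k)$ and $b(k)$ from $q_0$, amounts to solving a linear ODE from large $x$ back to $x = 0$. The following is a minimal sketch, assuming the $x$-problem is $\phi_x + ik\sigma_3\phi = Q(x,0,k)\phi$ with $\phi \sim e^{ikx}(0,1)^T$ as $x \to \infty$, a hypothetical cutoff `x_max` beyond which $q_0$ is negligible, and a hand-written fourth-order Runge-Kutta step (any ODE solver would do). For real $k$ the quantity $|a|^2 - \lambda|b|^2$ is conserved along $x$ and should equal 1, which gives a convenient check. The Gaussian initial datum is an arbitrary test case.

```python
import math

def spectral_functions(q0, k, lam=1, x_max=12.0, n=2400):
    """Return (a(k), b(k)) = (phi_2(0, k), phi_1(0, k)) for the assumed x-problem.

    The code integrates psi = exp(-i k x) * phi, which satisfies
    psi_x = [[-2ik, q0], [lam * conj(q0), 0]] psi with psi(x_max) = (0, 1),
    from x = x_max down to x = 0.
    """
    def rhs(x, psi):
        q = complex(q0(x))
        return (-2j * k * psi[0] + q * psi[1], lam * q.conjugate() * psi[0])

    def shifted(psi, slope, step):
        return tuple(p + step * s for p, s in zip(psi, slope))

    h = -x_max / n                      # negative step: integrate backwards in x
    x, psi = x_max, (0j, 1 + 0j)
    for _ in range(n):
        k1 = rhs(x, psi)
        k2 = rhs(x + h / 2, shifted(psi, k1, h / 2))
        k3 = rhs(x + h / 2, shifted(psi, k2, h / 2))
        k4 = rhs(x + h, shifted(psi, k3, h))
        psi = tuple(p + h / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
                    for p, s1, s2, s3, s4 in zip(psi, k1, k2, k3, k4))
        x += h
    return psi[1], psi[0]

def gaussian(x):
    """Hypothetical initial datum q0(x) = 0.5 exp(-x^2)."""
    return 0.5 * math.exp(-x * x)

if __name__ == "__main__":
    a, b = spectral_functions(gaussian, 0.8)
    print(a, b, abs(a) ** 2 - abs(b) ** 2)
```

Working with $e^{-ikx}\phi$ instead of $\phi$ keeps the integrated quantity bounded for $\mathrm{Im}\,k \geq 0$; for vanishing $q_0$ the routine returns $a = 1$, $b = 0$.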


The above construction can be summarized in the 
following theorem (Fokas 2002): 


Theorem 1 Consider the boundary-value problem 
for the NLS equation [1] determined by the conditions 
[7]. Let $a(k)$, $b(k)$ be given by [8], and suppose that 
there exists a function $g_1(t)$ such that if $A(k)$, $B(k)$ are 
defined by [9], then the global relation [8] holds. 

Let $M(x,t,k)$ be the solution of the $2 \times 2$ 
Riemann-Hilbert problem with jump on the real 
and imaginary axes given by 

• $M_-(x,t,k) = M_+(x,t,k)J(x,t,k)$, with $M = M_-$ in 
the second and fourth quadrants of $\mathbb{C}$, $M = M_+$ in the 
first and third quadrants of $\mathbb{C}$, and $J(x,t,k)$ defined 
in terms of $a$, $b$, $A$, $B$ and the exponential $\exp\{ikx + 2ik^2 t\}$. 

• $M = I + O(1/k)$ as $k \to \infty$, and $M$ has appropriate 
residue conditions if there are poles. 

Then $M(x,t,k)$ exists and is unique, and 

$$q(x,t) = 2i \lim_{k \to \infty} \left(kM(x,t,k)\right)_{12}$$
The result above relies on characterizing the 
unknown boundary value $g_1(t)$ a priori by requiring 
that the global relation hold. Recently, substantial 
progress has been made in this direction in the case of 
integrable nonlinear evolution equations, in particular 
of NLS. Namely, Fokas (in press) contains an 
effective description of the map assigning to each 
given $q(x,0) = q_0(x)$ and $g_0(t) = q(0,t)$ a unique value 
for $q_x(0,t)$ (called the Dirichlet to Neumann map) for 
the NLS, as well as for a version of the Korteweg-de Vries 
and sG equations. We state below the 
relevant theorem for the case of the NLS equation. 


Theorem 2 Let $q(x,t)$ satisfy the NLS equation on 
the half-line $0 < x < \infty$, $t > 0$ with the initial and 
boundary conditions [7]. Then $g_1(t) := q_x(0,t)$ is 
given by 


gp) 20) / eE (tk) —a(t,-)) dh 


T 


42 f e PtkR(k) Bz k)dk 


7T Jap 


T 


with $\Phi = (\Phi_1, \Phi_2)$ given by the solution of [10]. The 
Neumann datum $g_1(t)$ is unique and exists globally 
in $t$. 


This result yields a rigorous proof of the global 
existence of the solution of boundary-value pro- 
blems on the half-line for the NLS equation. There- 
fore, the assumption in Theorem 1 that a suitable 
function g1(t) exists can be dropped. 


Generalizations and Summary of Results 


Results analogous to the ones presented in the 
previous section can be phrased exclusively in terms 
of integral equations rather than in terms of 
Riemann-Hilbert problems, as done for example in 
Khruslov and Kotlyarov (2003). This is the point of 
view of the school of Gelfand and Marchenko, and in 
this setting the functions $\Phi$ are given in the so-called 
Gelfand-Levitan-Marchenko representation. Results 
on boundary-value problems for the NLS equation 
using this representation have been obtained only 
under additional assumptions on the unknown part 
of the boundary values. It was only after the idea emerged that 
the $x$- and $t$-parts of the spectral equations should be 
treated simultaneously that this approach yielded 
Marchenko representation yields a crucial simplifica- 
tion for deriving the explicit form of the Dirichlet to 
Neumann map and proving Theorem 2. This 


jx e 2E (ko, (t; k) — 4 (t, —k)] + igo(£))dk 
OD 


representation has now been derived for all equations 
[1]-[3], see Fokas (in press). 

The analysis of the invariance properties of the 
global relation with respect to k also yields the 
characterization of all the boundary conditions for 
which the transform obtained to represent the solution 
linearizes. For these boundary conditions, called 
linearizable, the solution can be represented as 
effectively as for the Cauchy problem. For example, 
the linearizable boundary conditions for the NLS 
equation are given by any boundary values that satisfy 

$$g_0(t)\bar{g}_1(t) - \bar{g}_0(t)g_1(t) = 0$$

An example of a boundary condition satisfying 
this constraint, encompassing also Dirichlet and 
Neumann homogeneous conditions, is $q(0,t) - \chi q_x(0,t) = 0$, 
with $\chi$ a non-negative constant. 
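
As a short worked check, written here under the assumption that $g_0(t) = q(0,t)$ and $g_1(t) = q_x(0,t)$ as in [7] and Theorem 2, and that $\chi$ is real, the example condition $g_0 = \chi g_1$ satisfies the constraint $g_0\bar{g}_1 - \bar{g}_0 g_1 = 0$ identically:

```latex
g_0 = \chi g_1, \quad \chi \in \mathbb{R}
\;\Longrightarrow\;
g_0 \bar{g}_1 - \bar{g}_0 g_1
  = \chi g_1 \bar{g}_1 - \chi \bar{g}_1 g_1
  = \chi |g_1|^2 - \chi |g_1|^2
  = 0 .
```

The case $\chi = 0$ is the homogeneous Dirichlet condition $q(0,t) = 0$; the computation only uses that $\chi$ is real.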

As mentioned at the beginning of the previous 
section, the approach described in general can be 
used to obtain results similar to those given for the 
NLS equation for many other integrable evolution 
equations, in particular, mKdV (Boutet de Monvel 
et al. 2004), sG, and KdV (Fokas 2002). The results 
obtained are essentially the same as for NLS, 
starting from the general form [5] of the Lax pair, 
and include the derivation of the solution representa- 
tion, the complete characterization of linearizable 
boundary conditions, and the analysis of the Dirichlet 
to Neumann map. 

The approach above can also be used for studying 
boundary-value problems posed on finite domains, 
for $x \in [0, 1]$. This has been done for a model for 
transient stimulated Raman scattering (Fokas and 
Menyuk 1999), for the sG equation in light-cone 
coordinates (Pelloni, in press), and for the NLS 
equation (Fokas and Its 2004). In this case also the 
method yields a representation of the solution which 
is suitable for asymptotic analysis. In this respect, 
the question of soliton generation from boundary 
data is of some importance, and has been recently 
considered by various authors (Fokas and Menyuk 
1999, Boutet de Monvel and Kotlyarov 2003, 
Pelloni in press, Boutet de Monvel et al. 2004). 
The results are, however, still considered case by case, 
and no general framework for this problem 
has been identified yet. For problems on the half-line, solitons 
may be generated, but not necessarily in correspondence 
to the singularities that generate solitons for 
the full-line problem, even when the same singularities 
are present. For problems posed on finite 
domains, in some specific cases, at least for the 
stimulated Raman scattering and the sG equations, 
it appears that the dominant asymptotic behavior is 
given by a similarity solution. 
In conclusion, the extension of the inverse scattering
transform given by Fokas provides the tool for analyzing 
boundary-value problems specific to nonlinear integr- 
able equations. This tool relies, in an essential way, on 
the integrability structure of the problem, and yields a 
full characterization of the solution as well as uniqueness 
and existence results. The solution representation thus 
obtained is not always fully explicit, but it is always 
suitable for asymptotic analysis using standard techni- 
ques such as the recent nonlinearization of the classical 
steepest descent method. 


See also: $\bar\partial$ Approach to Integrable Systems; Integrable
Discrete Systems; Integrable Systems and the Inverse 
Scattering Method; Integrable Systems: Overview; 
Nonlinear Schrödinger Equations; Riemann-Hilbert 
Methods in Integrable Systems; Separation of Variables 
for Differential Equations; Sine-Gordon Equation. 
Introduction


Tensor or monoidal categories are encountered in
various branches of modern mathematical physics.
The first examples arose, without the name of a
monoidal category being mentioned, as categories of
modules over a group or a Lie algebra. The monoidal
product in this case is the usual tensor product
$X \otimes_{\mathbb C} Y$ of modules (representations) $X$ and $Y$.
These categories are symmetric: the modules $X \otimes Y$
and $Y \otimes X$ are isomorphic; moreover, the permutation
isomorphism (the twist) $c: X \otimes Y \to Y \otimes X$,
$x \otimes y \mapsto y \otimes x$, is involutive, $c^2 = \mathrm{id}_{X \otimes Y}$.
The next examples of monoidal categories were given by
categories of representations of supergroups or Lie
superalgebras. They are also symmetric: now the symmetry
(Koszul's rule) $c: X \otimes Y \to Y \otimes X$,
$x \otimes y \mapsto (-1)^{\deg x \cdot \deg y}\, y \otimes x$, is the
twist with a sign, which depends on the degree (or
parity) $\deg x$ of elements $x \in X$.

The development of the theory of exactly solvable
models in statistical mechanics led Drinfeld (1987)
to the notion of quantum groups: Hopf algebras $H$
with additional structures (quasitriangular Hopf
algebras). $H$-modules also form a monoidal category;
however, it is not symmetric, but only braided.
This means that a canonical braiding isomorphism
$c: X \otimes Y \to Y \otimes X$ still exists, but it is no longer
involutive, $c^2 \neq \mathrm{id}$. The braiding $c$ satisfies the
Yang-Baxter equation

$(c \otimes 1)(1 \otimes c)(c \otimes 1) = (1 \otimes c)(c \otimes 1)(1 \otimes c) : X \otimes Y \otimes Z \to Z \otimes Y \otimes X$

for any three $H$-modules $X$, $Y$, $Z$.
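
The difference between a symmetric and a merely braided structure can
be seen numerically on the two-dimensional space V = C^2. The sketch
below is an illustration only: the helper names are hypothetical, and
the one-parameter matrix `quantum_braiding(q)` is a commonly quoted
braiding matrix assumed here rather than taken from the text. It checks
that the plain twist and the Koszul twist are involutive and satisfy
the Yang-Baxter equation, while the q-dependent matrix satisfies the
Yang-Baxter equation without being involutive.

```python
# Illustrative check on V = C^2; basis order of V (x) V: e0e0, e0e1, e1e0, e1e1.
# All names below are hypothetical helpers, not notation from the text.

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product, i.e. the matrix of a (x) b."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-9):
    """Entrywise comparison of two matrices of the same shape."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def is_involutive(c):
    """True if c o c = id on V (x) V."""
    return close(matmul(c, c), identity(4))

def yang_baxter_holds(c):
    """True if (c x 1)(1 x c)(c x 1) = (1 x c)(c x 1)(1 x c) on V (x) V (x) V."""
    c12, c23 = kron(c, identity(2)), kron(identity(2), c)
    lhs = matmul(c12, matmul(c23, c12))
    rhs = matmul(c23, matmul(c12, c23))
    return close(lhs, rhs)

# The twist x (x) y -> y (x) x.
FLIP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
# Koszul rule with e0 even and e1 odd: only e1 (x) e1 picks up a sign.
KOSZUL_FLIP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, -1]]

def quantum_braiding(q):
    """Assumed one-parameter braiding matrix; q = 1 reduces to FLIP."""
    t = q - 1.0 / q
    return [[q, 0, 0, 0], [0, t, 1, 0], [0, 1, 0, 0], [0, 0, 0, q]]

if __name__ == "__main__":
    for name, c in [("flip", FLIP), ("Koszul flip", KOSZUL_FLIP),
                    ("quantum, q = 2", quantum_braiding(2.0))]:
        print(name, "| involutive:", is_involutive(c),
              "| Yang-Baxter:", yang_baxter_holds(c))
```

Plain nested lists are used instead of a numerical library so that the
sketch has no dependencies; the matrices are small enough that this
costs nothing.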

In the above examples, we also have an obvious
associativity isomorphism $a: X \otimes (Y \otimes Z) \to
(X \otimes Y) \otimes Z$ of the iterated tensor product.
There are, however, monoidal categories of
modules where such an isomorphism is nontrivial,
namely, modules over quasi-Hopf algebras.
These were introduced by Drinfeld (1989a, b) in
connection with the Knizhnik-Zamolodchikov equations.
These nontrivial associativity isomorphisms
$a: X \otimes (Y \otimes Z) \to (X \otimes Y) \otimes Z$ are required to
satisfy the pentagon equation of Mac Lane and
Stasheff.

Braided monoidal categories also arise in rational
conformal field theories (RCFTs), integrable models
of statistical mechanics, and topological quantum
field theories (TQFTs). The common feature of
these categories is that they are semisimple abelian
with a finite number of simple modules. In other
words, such a category $\mathcal C$ is equivalent to the category
of finite-dimensional $\mathbb C^n = \mathbb C \times \dots \times \mathbb C$-modules for
some $n$. This equivalence, however, is not monoidal, and
the monoidal structure can be rather involved. For
instance, from the Ising model one can obtain the
monoidal category with two simple objects $\mathbf 1$ and $X$,
which obey the monoidal law $\mathbf 1 \otimes \mathbf 1 \cong \mathbf 1$,
$\mathbf 1 \otimes X \cong X \otimes \mathbf 1 \cong X$, $X \otimes X \cong \mathbf 1 \oplus X$.
Clearly, such relations cannot be satisfied by
finite-dimensional $\mathbb C$-vector spaces $\mathbf 1$ and $X$
if $\otimes$ meant the usual tensor product $\otimes_{\mathbb C}$
of $\mathbb C$-vector spaces. Here, however, $\otimes$ means simply a
functor $\otimes : \mathcal C \times \mathcal C \to \mathcal C$ with certain properties.
Categories which come from RCFT, integrable models, or
TQFT often enjoy additional properties. They are
rigid: for each object $X$, there exists a dual object
$X^\vee$. They are ribbon (balanced): there is a canonical
endomorphism $\nu_X : X \to X$ for each object $X$, which
is related to the braiding. They are modular, which is
defined as nondegeneracy of a certain matrix. The
meaning of modularity is that the ribbon category is
suitable for producing a TQFT out of it.
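
The obstruction mentioned above can be made explicit by counting
dimensions. The following short computation is an illustration added
here; the symbol $d$ is introduced only for this purpose. If $\mathbf 1$ and
$X$ were nonzero vector spaces with the usual tensor product, then
$\mathbf 1 \otimes X \cong X$ would force $\dim \mathbf 1 = 1$, and $d = \dim X$ would
have to satisfy

```latex
d^2 = \dim (X \otimes_{\mathbb C} X) = \dim (\mathbf 1 \oplus X) = 1 + d
\quad\Longrightarrow\quad
d = \frac{1 \pm \sqrt{5}}{2},
```

and neither root is a non-negative integer.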

For categories equivalent to the category of
$\mathbb C \times \dots \times \mathbb C$-modules, the ribbon (braided) monoidal
structure can be specified by a finite number of complex
matrices. For instance, 6j-symbols or q-6j-symbols
encode the associativity isomorphism. In this form,
modular categories appeared in the work of Moore and
Seiberg (1989) on RCFTs. Such categories can be
realized as categories of modules over weak Hopf
algebras, but we stress again that the monoidal product
for such modules does not coincide with the tensor
product of vector spaces. So, general features are better
seen at the level of category theory, and we now start
with precise definitions.


Rigid Monoidal Categories 


We recall here the basic definitions of monoidal 
categories, monoidal functors, and dual objects. 


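Definition 1 A monoidal category $(\mathcal C, \otimes, a, \mathbf 1, l, r)$ is
a category $\mathcal C$, a functor $\otimes : \mathcal C \times \mathcal C \to \mathcal C$ (called the
tensor product), a functorial isomorphism $a : X \otimes
(Y \otimes Z) \to (X \otimes Y) \otimes Z$, the associativity isomorphism,
a unit object $\mathbf 1$, and two functorial isomorphisms
$l : \mathbf 1 \otimes X \to X$, $r : X \otimes \mathbf 1 \to X$ such that the diagram

$\big( X \otimes (Y \otimes (Z \otimes W)) \xrightarrow{a} (X \otimes Y) \otimes (Z \otimes W) \xrightarrow{a} ((X \otimes Y) \otimes Z) \otimes W \big)$
$= \big( X \otimes (Y \otimes (Z \otimes W)) \xrightarrow{X \otimes a} X \otimes ((Y \otimes Z) \otimes W) \xrightarrow{a} (X \otimes (Y \otimes Z)) \otimes W \xrightarrow{a \otimes W} ((X \otimes Y) \otimes Z) \otimes W \big)$

commutes (the pentagon equation) and

$\big( X \otimes (\mathbf 1 \otimes Y) \xrightarrow{X \otimes l_Y} X \otimes Y \big) = \big( X \otimes (\mathbf 1 \otimes Y) \xrightarrow{a} (X \otimes \mathbf 1) \otimes Y \xrightarrow{r_X \otimes Y} X \otimes Y \big)$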

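Definition 2 A monoidal functor $(F, \phi, f) : (\mathcal C, \otimes) \to
(\mathcal D, \otimes)$ is a functor $F : \mathcal C \to \mathcal D$, a functorial isomorphism
$\phi = \phi_{X,Y} : F(X) \otimes F(Y) \to F(X \otimes Y) \in \mathcal D$, and an
isomorphism $f : \mathbf 1 \to F\mathbf 1 \in \mathcal D$ such that the diagrams
expressed by the following three equations commute:

$F(a) \circ \phi_{X, Y \otimes Z} \circ (FX \otimes \phi_{Y,Z}) = \phi_{X \otimes Y, Z} \circ (\phi_{X,Y} \otimes FZ) \circ a : FX \otimes (FY \otimes FZ) \to F((X \otimes Y) \otimes Z)$

$F(l_X) \circ \phi_{\mathbf 1, X} \circ (f \otimes FX) = l_{FX} : \mathbf 1 \otimes FX \to FX$

$F(r_X) \circ \phi_{X, \mathbf 1} \circ (FX \otimes f) = r_{FX} : FX \otimes \mathbf 1 \to FX$

A morphism of monoidal functors
$\lambda : (F, \phi, f) \to (G, \psi, g)$ is a functorial morphism
$\lambda : F \to G$ such that

$\lambda_{X \otimes Y} \circ \phi_{X,Y} = \psi_{X,Y} \circ (\lambda_X \otimes \lambda_Y) : FX \otimes FY \to G(X \otimes Y)$,
$g = \lambda_{\mathbf 1} \circ f$

The $f$ datum of a monoidal functor $(F, \phi, f)$ is
uniquely determined by the $(F, \phi)$ data, so we can
denote a monoidal functor as $(F, \phi)$ or even $F$.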


The coherence theorem of Mac Lane (1963) states
that any monoidal category $\mathcal C$ is equivalent to a
strict monoidal category, in which $X \otimes (Y \otimes Z) =
(X \otimes Y) \otimes Z$, $\mathbf 1 \otimes X = X = X \otimes \mathbf 1$, and the
isomorphisms $a$, $l$, $r$ are identity isomorphisms. Thus, in
theoretical constructions, one may ignore the
associativity isomorphism. In practice this is not always
convenient. For instance, when working with quasi-Hopf
algebras related to the Knizhnik-Zamolodchikov equation,
one prefers to keep the original category, which is (a
deformation of) the category of modules over a Lie
algebra, rather than to replace it with a strict monoidal
category that is no longer a category of modules.


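Definition 3 A rigid category $\mathcal C$ is a monoidal
category in which, to every object $X \in \mathcal C$, dual
objects $X^\vee$ and ${}^\vee X \in \mathcal C$ are assigned together with
morphisms of evaluation and coevaluation

$\mathrm{ev}_X : X \otimes X^\vee \to \mathbf 1$,   $\mathrm{coev}_X : \mathbf 1 \to X^\vee \otimes X$,
$\mathrm{ev}_{{}^\vee X} : {}^\vee X \otimes X \to \mathbf 1$,   $\mathrm{coev}_{{}^\vee X} : \mathbf 1 \to X \otimes {}^\vee X$

The evaluations and coevaluations are chosen such
that the compositions

$X \xrightarrow{r^{-1}} X \otimes \mathbf 1 \xrightarrow{1 \otimes \mathrm{coev}} X \otimes (X^\vee \otimes X) \xrightarrow{a} (X \otimes X^\vee) \otimes X \xrightarrow{\mathrm{ev} \otimes 1} \mathbf 1 \otimes X \xrightarrow{l} X$

$X \xrightarrow{l^{-1}} \mathbf 1 \otimes X \xrightarrow{\mathrm{coev} \otimes 1} (X \otimes {}^\vee X) \otimes X \xrightarrow{a^{-1}} X \otimes ({}^\vee X \otimes X) \xrightarrow{1 \otimes \mathrm{ev}} X \otimes \mathbf 1 \xrightarrow{r} X$

$X^\vee \xrightarrow{l^{-1}} \mathbf 1 \otimes X^\vee \xrightarrow{\mathrm{coev} \otimes 1} (X^\vee \otimes X) \otimes X^\vee \xrightarrow{a^{-1}} X^\vee \otimes (X \otimes X^\vee) \xrightarrow{1 \otimes \mathrm{ev}} X^\vee \otimes \mathbf 1 \xrightarrow{r} X^\vee$

${}^\vee X \xrightarrow{r^{-1}} {}^\vee X \otimes \mathbf 1 \xrightarrow{1 \otimes \mathrm{coev}} {}^\vee X \otimes (X \otimes {}^\vee X) \xrightarrow{a} ({}^\vee X \otimes X) \otimes {}^\vee X \xrightarrow{\mathrm{ev} \otimes 1} \mathbf 1 \otimes {}^\vee X \xrightarrow{l} {}^\vee X$

are all identity morphisms.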
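In a rigid monoidal category $\mathcal C$, there is a pairing

$(X \otimes Y) \otimes (Y^\vee \otimes X^\vee) \cong (X \otimes (Y \otimes Y^\vee)) \otimes X^\vee \xrightarrow{X \otimes \mathrm{ev}_Y \otimes X^\vee} (X \otimes \mathbf 1) \otimes X^\vee \xrightarrow{r \otimes X^\vee} X \otimes X^\vee \xrightarrow{\mathrm{ev}_X} \mathbf 1$

which induces an isomorphism $j_{+X,Y} : Y^\vee \otimes X^\vee \to (X \otimes
Y)^\vee$, such that the above pairing coincides with

$(X \otimes Y) \otimes (Y^\vee \otimes X^\vee) \xrightarrow{1 \otimes j_+} (X \otimes Y) \otimes (X \otimes Y)^\vee \xrightarrow{\mathrm{ev}_{X \otimes Y}} \mathbf 1$

The equation

$\mathrm{coev}_{X \otimes Y} = \big( \mathbf 1 \xrightarrow{\mathrm{coev}_Y} Y^\vee \otimes Y \cong Y^\vee \otimes \mathbf 1 \otimes Y \xrightarrow{1 \otimes \mathrm{coev}_X \otimes 1} Y^\vee \otimes X^\vee \otimes X \otimes Y \xrightarrow{j_+ \otimes 1} (X \otimes Y)^\vee \otimes (X \otimes Y) \big)$

also holds. Similarly, there is an isomorphism
$j_{-X,Y} : {}^\vee Y \otimes {}^\vee X \to {}^\vee (X \otimes Y)$.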
Morphisms constructed from braidings and (co)evaluations
are often described by tangles. The conventions are
listed in Figure 1.

Figure 1 Conventions for notation of morphisms from
tangles. The figure assigns pictures to a morphism
$f : X \to Y$, the braiding $c_{X,Y} : X \otimes Y \to Y \otimes X$,
the inverse braiding $X \otimes Y \to Y \otimes X$, the evaluation
$\mathrm{ev}_X : X \otimes X^\vee \to \mathbf 1$, and the coevaluation
$\mathrm{coev}_X : \mathbf 1 \to X^\vee \otimes X$.

The suggested assignment of morphisms in $\mathcal C$ to
elementary pictures extends to a unique functor from
the category of $\mathcal C$-colored tangles to the category $\mathcal C$
itself. With the above interpretation, these tangles need
not be oriented. We shall use the same notation for framed
tangles, and the framing will be within the plane.
The maps $\operatorname{Ob} \mathcal C \to \operatorname{Ob} \mathcal C$, $X \mapsto X^\vee$, and $X \mapsto {}^\vee X$
extend to contravariant self-equivalences $\mathcal C \to \mathcal C$,
$f \mapsto f^t$, and $f \mapsto {}^t f$. For given $f$, the morphisms $f^t$
and ${}^t f$ can be defined, respectively, by pictures
using the assignment from Figure 1.

[Tangle diagrams defining $f^t$ and ${}^t f$.]


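We have a monoidal self-equivalence of $\mathcal C$,

$(-^{\vee\vee}, j_2) : (\mathcal C, \otimes, \mathbf 1) \to (\mathcal C, \otimes, \mathbf 1)$,   $X \mapsto X^{\vee\vee}$,   $f \mapsto f^{tt}$

$j_{2X,Y} = \big( X^{\vee\vee} \otimes Y^{\vee\vee} \xrightarrow{j_+} (Y^\vee \otimes X^\vee)^\vee \xrightarrow{(j_+^{\,t})^{-1}} (X \otimes Y)^{\vee\vee} \big)$   [1]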


It is not always true that the two duals $X^\vee$ and ${}^\vee X$
are isomorphic. However, there are canonical
isomorphisms

${}^\vee (X^\vee) \cong X \cong ({}^\vee X)^\vee$

We may replace the category $\mathcal C$ with an equivalent one,
such that the above isomorphisms become identity
morphisms, and the functors $-^\vee$ and ${}^\vee -$ are inverse to
each other. We shall assume this to simplify notations.
Finally, we denote the iterated duals by $X^{(n\vee)} = X^{\vee \cdots \vee}$
($n$ times) and $X^{(-n\vee)} = {}^{\vee \cdots \vee} X$ ($n$ times) for $n > 0$.


Braided Categories 


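Here we review the definitions of the braiding
isomorphism and further derived isomorphisms. Several
basic relations between them are listed. Two
important classes of examples of braided categories
are given by the categories of modules over
quasitriangular Hopf algebras and the categories of tangles.


Definition 4 A braided category $(\mathcal C, c)$ is a monoidal
category $\mathcal C$ equipped with a functorial isomorphism
$c = c_{X,Y} : X \otimes Y \to Y \otimes X$, the braiding, or the
commutativity isomorphism, such that the two
hexagons commute,

$\big( X \otimes (Y \otimes Z) \xrightarrow{1 \otimes c^{\pm 1}} X \otimes (Z \otimes Y) \xrightarrow{a} (X \otimes Z) \otimes Y \xrightarrow{c^{\pm 1} \otimes 1} (Z \otimes X) \otimes Y \big)$
$= \big( X \otimes (Y \otimes Z) \xrightarrow{a} (X \otimes Y) \otimes Z \xrightarrow{c^{\pm 1}} Z \otimes (X \otimes Y) \xrightarrow{a} (Z \otimes X) \otimes Y \big)$

(one for $c$ and one for $c^{-1}$).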


The graphical notation for the braiding and its
inverse is

[Crossing diagrams for $c = (c_{X,Y} : X \otimes Y \to Y \otimes X)$ and for its inverse $c^{-1}$.]


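In a rigid braided category, we can define
functorial isomorphisms using again the conventions
from Figure 1:

[Tangle diagrams defining the isomorphisms $u_1^2, u_{-1}^2 : X \to X^{\vee\vee}$ and their inverses.]

These are isomorphisms of monoidal functors
(see [1])

$u_1^2 : (\mathrm{Id}, c^2) \to (-^{\vee\vee}, j_2)$

$u_{-1}^2 : (\mathrm{Id}, c^{-2}) \to (-^{\vee\vee}, j_2)$

In particular, this implies the commutativity of the
diagram

$u_1^2 \circ c^2 = j_2 \circ (u_1^2 \otimes u_1^2) : X \otimes Y \to (X \otimes Y)^{\vee\vee}$

The square of the monoidal functor $(-^{\vee\vee}, j_2)$ is

$(-^{\vee\vee\vee\vee}, j_4) : (\mathcal C, \otimes, \mathbf 1) \to (\mathcal C, \otimes, \mathbf 1)$,   $X \mapsto X^{\vee\vee\vee\vee}$,   $f \mapsto f^{tttt}$

where

$j_{4X,Y} = \big( X^{\vee\vee\vee\vee} \otimes Y^{\vee\vee\vee\vee} \xrightarrow{j_2} (X^{\vee\vee} \otimes Y^{\vee\vee})^{\vee\vee} \xrightarrow{j_2^{tt}} (X \otimes Y)^{\vee\vee\vee\vee} \big)$

The natural isomorphism $u_0^4 = u_1^2 \circ u_{-1}^2$ is, in fact, an
isomorphism of monoidal functors
$u_0^4 : (\mathrm{Id}, \mathrm{id}) \to (-^{\vee\vee\vee\vee}, j_4)$.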


Ribbon Categories 


Now we define balancing and recall some properties
of balanced (ribbon) categories.


Definition 5 Let $\mathcal C$ be a rigid braided category.
A balancing $\beta_X : X \to X^{\vee\vee}$ is an isomorphism of
monoidal functors $\beta : (\mathrm{Id}, \mathrm{id}, \mathrm{id}) \to (-^{\vee\vee}, j_2, d_2)$ such
that $\beta^2 = u_0^4$ and $\beta_X^{\vee\vee} = \beta_{X^{\vee\vee}} : X^{\vee\vee} \to X^{\vee\vee\vee\vee}$.
The category $\mathcal C$ equipped with a balancing is called
balanced.


We also use the notation $u_0^2 = \beta$. In any balanced
category, there exists a canonical ribbon twist $\nu$.
A ribbon twist $\nu = \nu_X : X \to X$, $\nu : \mathrm{Id} \to \mathrm{Id}$, is a
self-adjoint ($\nu_{X^\vee} = \nu_X^t$) automorphism of the identity
functor such that $c^2 = (\nu_X^{-1} \otimes \nu_Y^{-1}) \circ \nu_{X \otimes Y}$. It can be
determined from the equations

$u_1^2 = u_0^2 \circ \nu^{-1} = \nu^{-1} \circ u_0^2 : X \to X^{\vee\vee}$

$u_{-1}^2 = u_0^2 \circ \nu = \nu \circ u_0^2 : X \to X^{\vee\vee}$

In particular, its square is given by the canonical
isomorphism $\nu^2 = u_1^{-2} \circ u_{-1}^2$. Conversely, in any
rigid braided category with a ribbon twist (called a
ribbon category) there exists a canonical balancing
$u_0^2$ given by the above formulas. Thus, ribbon
categories and balanced categories are synonyms.

In the case of $X = \mathbf 1$, we have $\nu_{\mathbf 1} = \mathrm{id}_{\mathbf 1}$.


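The following result can be used to simplify
notations:


Proposition 1 For any ribbon category $\mathcal C$ there exists
a ribbon category $\mathcal D$ equivalent to $\mathcal C$ such that in it

(i) $\mathbf 1^\vee = \mathbf 1$;
(ii) for any object $X$ we have ${}^\vee X = X^\vee$, $X^{\vee\vee} = X$,
and $\beta_X = \mathrm{id}_X : X \to X^{\vee\vee} = X$;
(iii) for any object $X$ we have $\mathrm{ev}_X = \widetilde{\mathrm{ev}}_{X^\vee} : X \otimes
X^\vee \to \mathbf 1$, and $\mathrm{coev}_X = \widetilde{\mathrm{coev}}_{X^\vee} : \mathbf 1 \to X^\vee \otimes X$.


In the category $\mathcal C = H$-mod, where $H$ is a ribbon
Hopf algebra, the equation $X^\vee = {}^\vee X$ is not necessarily
satisfied. Nevertheless, $X^\vee$ is canonically
isomorphic to ${}^\vee X$. The same holds in any ribbon
category. We identify these objects via $\beta = u_0^2 :
{}^\vee X \to X^\vee$. This allows us to use the right dual
objects in place of the left ones. In that role, the
right duals are equipped with the left evaluation
and coevaluation, called flipped evaluation and
coevaluation, respectively:

$\widetilde{\mathrm{ev}} : X^\vee \otimes X \xrightarrow{X^\vee \otimes \beta} X^\vee \otimes X^{\vee\vee} \xrightarrow{\mathrm{ev}_{X^\vee}} \mathbf 1$

$\widetilde{\mathrm{coev}} : \mathbf 1 \xrightarrow{\mathrm{coev}_{X^\vee}} X^{\vee\vee} \otimes X^\vee \xrightarrow{\beta^{-1} \otimes X^\vee} X \otimes X^\vee$

They are often denoted simply ev and coev and
should be replaced by $\widetilde{\mathrm{ev}}$ and $\widetilde{\mathrm{coev}}$ in applications. In
the context of Hopf algebras, $\beta$ is given by the action
of a group-like element introduced by Drinfeld.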


Hopf Algebras in Braided Categories 


Let $\mathcal C$ be a braided monoidal category. A Hopf
algebra $H$ in $\mathcal C$ is an object $H \in \operatorname{Ob} \mathcal C$ together with
an associative multiplication $m : H \otimes H \to H$ and a
coassociative comultiplication $\Delta : H \to H \otimes H$, obeying
the bialgebra axiom

$\big( H \otimes H \xrightarrow{m} H \xrightarrow{\Delta} H \otimes H \big)$
$= \big( H \otimes H \xrightarrow{\Delta \otimes \Delta} H \otimes H \otimes H \otimes H \xrightarrow{H \otimes c \otimes H} H \otimes H \otimes H \otimes H \xrightarrow{m \otimes m} H \otimes H \big)$

Moreover, $H$ has a unit $\eta : \mathbf 1 \to H$, a counit $\varepsilon : H \to \mathbf 1$,
an antipode $\gamma : H \to H$, and the inverse antipode
$\gamma^{-1} : H \to H$. The defining relations for these are the
same as in the classical case. Notice, in particular,
that the unit is also a morphism. Associativity of
multiplication, as well as coassociativity of
comultiplication, is formulated with the use of the
associativity isomorphism (in the nonstrict case).

Hopf algebras in braided categories have also
been called braided groups. Their basic properties
are very similar to those of usual Hopf algebras; for
example, the antipode is antimultiplicative with
respect to the braiding (see, e.g., Majid (1993)).
For Hopf algebras in rigid braided categories, there
exist integrals in a sense very similar to the
case of ordinary finite-dimensional Hopf algebras,
as shown by Bespalov et al. (2000).


Modular Categories 


Assume that a braided rigid monoidal category $\mathcal C$ is
equivalent as a category (with monoidal structure
ignored) to the category of finite-dimensional modules
over a finite-dimensional algebra. In particular,
$\mathcal C$ is abelian. Then there exists an object $F$ in $\mathcal C$,
equipped with a morphism $i_X : X \otimes X^\vee \to F$ for each
$X \in \operatorname{Ob} \mathcal C$, such that the diagram

$i_X \circ (X \otimes f^t) = i_Y \circ (f \otimes Y^\vee) : X \otimes Y^\vee \to F$

is commutative for all morphisms $f : X \to Y$ of $\mathcal C$, and,
moreover, $F$ is universal between objects with such
properties. Here $f^t : Y^\vee \to X^\vee$ is the transpose of a
morphism $f : X \to Y$. In other words, $F$ is a direct limit,
called the coend and denoted as $F = \int^{Z \in \mathcal C} Z \otimes Z^\vee$. It
can also be defined via an exact sequence

$\bigoplus_{f : X \to Y \in \mathcal C} X \otimes Y^\vee \xrightarrow{f \otimes Y^\vee - X \otimes f^t} \bigoplus_{Z \in \mathcal C} Z \otimes Z^\vee \to F \to 0$

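It turns out that the coend $F$ is a Hopf algebra in
the braided category $\mathcal C$, when it is equipped with the
following operations. The comultiplication in $F$ is
uniquely determined by the equation

$\big( X \otimes X^\vee \xrightarrow{i_X} F \xrightarrow{\Delta} F \otimes F \big)$
$= \big( X \otimes X^\vee = X \otimes \mathbf 1 \otimes X^\vee \xrightarrow{X \otimes \mathrm{coev}_X \otimes X^\vee} X \otimes X^\vee \otimes X \otimes X^\vee \xrightarrow{i_X \otimes i_X} F \otimes F \big)$

The counit in $F$ is determined by the equation

$\big( X \otimes X^\vee \xrightarrow{i_X} F \xrightarrow{\varepsilon} \mathbf 1 \big) = \big( X \otimes X^\vee \xrightarrow{\mathrm{ev}_X} \mathbf 1 \big)$

The multiplication $m : F \otimes F \to F$ is defined by the
following diagram:

$m \circ (i_X \otimes i_Y) = i_{X \otimes Y} \circ (X \otimes Y \otimes j_+) \circ (X \otimes c) : X \otimes X^\vee \otimes Y \otimes Y^\vee \to F$,
where $c = c_{X^\vee,\, Y \otimes Y^\vee}$.

The unit is given by the morphism

$\eta : \mathbf 1 \cong \mathbf 1 \otimes \mathbf 1^\vee \xrightarrow{i_{\mathbf 1}} F$

The diagram corresponding to the antipode
$\gamma_F : F \to F$ is given by

[Tangle diagram for $\gamma_F$.]

The structure of the coend $F$ as a Hopf algebra can
also be found directly from its universal property, as
in Majid (1993).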

There is a pairing of Hopf algebras $\omega : F \otimes F \to \mathbf 1$ in $\mathcal C$:

[Tangle diagram defining $\omega$.]

It induces a homomorphism of Hopf algebras $F \to F^\vee$.


Definition 6 A ribbon category $\mathcal C$, equivalent as
a category to the category of finite-dimensional
modules over a finite-dimensional algebra, is called
modular if the pairing $\omega$ is nondegenerate, that is,
the induced morphism $F \to F^\vee$ is invertible.


Examples of nonsemisimple modular categories
include $\mathcal C = H$-mod, where $H = u_q(\mathfrak g)$ is a
finite-dimensional algebra, a quotient of the quantum
universal enveloping algebra $U_q(\mathfrak g)$, and $q$ is a root
of unity of odd degree. In these examples, the
coalgebra $F$ is identified with the dual Hopf algebra
$H^*$, but the multiplication in $F$ differs from that of
$H^*$. An explicit formula for the multiplication in $F$ uses
the R-matrix for $H$ (see, e.g., Majid (1993)).
A definition of modularity for another type of
categories (not necessarily abelian) was given by
Turaev (1994).

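When the category $\mathcal C$ is modular, the integrals for
the Hopf algebra $F$ have especially simple properties.
The integral element in $F$ is two sided. It is a
morphism $\mu : \mathbf 1 \to F$ such that

$\big( F = F \otimes \mathbf 1 \xrightarrow{F \otimes \mu} F \otimes F \xrightarrow{m} F \big)$
$= \big( F \xrightarrow{\varepsilon} \mathbf 1 \xrightarrow{\mu} F \big)$
$= \big( F = \mathbf 1 \otimes F \xrightarrow{\mu \otimes F} F \otimes F \xrightarrow{m} F \big)$

and $\mu$ is universal between morphisms with such
property. By duality, the integral functional $\lambda : F \to \mathbf 1$
is also two sided. It satisfies

$\big( F \xrightarrow{\Delta} F \otimes F \xrightarrow{F \otimes \lambda} F \otimes \mathbf 1 = F \big)$
$= \big( F \xrightarrow{\lambda} \mathbf 1 \xrightarrow{\eta} F \big)$
$= \big( F \xrightarrow{\Delta} F \otimes F \xrightarrow{\lambda \otimes F} \mathbf 1 \otimes F = F \big)$

and is universal between morphisms with such property.
The integral element and the integral functional are
unique up to multiplication by an element of $\operatorname{Aut}_{\mathcal C} \mathbf 1$.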


Semisimple Abelian Modular Categories 


Reshetikhin and Turaev proposed to construct invariants
of 3-manifolds via quantum groups. More
precisely, they use certain abelian semisimple ribbon
categories obtained from quantum groups at roots of
unity as trace quotients. One can forget about the origin
of these categories and work simply with semisimple
modular categories. We shall describe them as input
data for the modular functor construction.

Let $\mathcal C$ be a $\mathbb C$-linear abelian semisimple modular
ribbon category. Assume that the number of
isomorphism classes of simple objects is finite.
Assume also that $\mathbf 1$ is simple and for each simple
object $X$ the endomorphism algebra $\operatorname{End} X = \mathbb C$. We
denote by $S = \{X_i\}_i$ the list of (representatives of
isomorphism classes of) all simple objects.

Under these assumptions, many formulas simplify.
The coend $F \in \mathcal C$ takes the form

$F = \bigoplus_{X \in S} X \otimes X^\vee \in \mathcal C$

Any morphism $\mathbf 1 \to F$ is a $\mathbb C$-linear combination of the
standard morphisms for $X \in S$,

$\phi_X = \big( \mathbf 1 \xrightarrow{\mathrm{coev}} X \otimes {}^\vee X \xrightarrow{X \otimes u_0^2} X \otimes X^\vee \xrightarrow{i_X} F \big)$

The morphisms $\phi_X$ form a basis of the commutative
algebra $\operatorname{Inv} F = \operatorname{Hom}_{\mathcal C}(\mathbf 1, F)$. The Grothendieck
ring of the category $\mathcal C$ determines the
multiplication law in $\operatorname{Inv} F$ via the algebra
isomorphism $\mathbb C \otimes_{\mathbb Z} K_0(\mathcal C) \to \operatorname{Inv} F$, $[X] \mapsto \phi_X$.

Any morphism $F \to \mathbf 1$ can be represented as a
linear combination of the morphisms

$\psi_X : F \to X \otimes X^\vee \xrightarrow{\mathrm{ev}} \mathbf 1$

where $X \in S$. The functional $\psi_{\mathbf 1} : F \to \mathbf 1$ satisfies the
properties of a two-sided integral $\lambda$ of the braided
Hopf algebra $F$.


The Verlinde Formula 


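The number

$\dim_q(X) = \big( \mathbf 1 \xrightarrow{\mathrm{coev}} X^\vee \otimes X \xrightarrow{1 \otimes u_0^2} X^\vee \otimes X^{\vee\vee} \xrightarrow{\mathrm{ev}} \mathbf 1 \big)$

is called the dimension of an object $X \in \operatorname{Ob} \mathcal C$. (The
index $q$ reminds us that this number coincides with
the q-dimension in the case $\mathcal C = U_q(\mathfrak g)$-mod.) We
have $\dim_q(X^\vee) = \dim_q(X)$.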


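Definition 7 Introduce a biadditive function of two
variables $s : \operatorname{Ob} \mathcal C \times \operatorname{Ob} \mathcal C \to \mathbb C$ on the class of objects of $\mathcal C$:

[Tangle diagram defining $s_{XY}$.]

In particular, its restriction to $S$ is a matrix $s|_S : S \times
S \to \mathbb C$, denoted again by $s = (s_{XY})_{X,Y \in S}$ by abuse of
notation; here $X$ and $Y$ run over simple objects.


Notice that $s_{XY} = s_{YX}$, so the matrix $s$ is symmetric.
Let us consider the $\mathbb C$-algebra $\operatorname{Inv} F = \operatorname{Hom}_{\mathcal C}(\mathbf 1, F)$. It has
the basis $\phi_X$, $X \in S$; hence, it is $n$-dimensional, where
$n = \operatorname{Card} S$. The form $\omega$ on $F$ induces a bilinear form

$\omega' : \operatorname{Inv} F \times \operatorname{Inv} F \to \operatorname{Hom}(\mathbf 1, F \otimes F) \xrightarrow{\operatorname{Hom}(\mathbf 1, \omega)} \operatorname{End} \mathbf 1 = \mathbb C$

The matrix $(s_{XY})$ is the matrix of the form $\omega'$ in the
basis $(\phi_X)$.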


Lemma 1 (The Verlinde formula) For any simple
$X \in S$ and any objects $Y$ and $Z$ of $\mathcal C$, we have

$s_{X\mathbf 1} = \dim_q(X)$,   $s_{X\mathbf 1}\, s_{X, Y \otimes Z} = s_{XY}\, s_{XZ}$   [2]


Proof The first formula is straightforward. Since
an endomorphism of the simple object $X$, being an element
of $\operatorname{End} X = \mathbb C$, is a number, we can move it from the
second factor to the first in the computation

[Tangle diagrams giving the chain of equalities $s_{X\mathbf 1}\, s_{X, Y \otimes Z} = \dots = s_{XY}\, s_{XZ}$.]

This proves the second formula.
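
The Verlinde formula [2] can be tested numerically on a small example.
The sketch below is an illustration under stated assumptions: it uses
the fusion rules and s-matrix commonly quoted for the Ising category
with simple objects 1, sigma, psi (normalized so that the entry for the
pair (1, 1) equals 1). This data and all helper names are assumptions
for the example and do not come from the text. Biadditivity of s is
used to evaluate s on a tensor product through its simple summands.

```python
from math import isclose, sqrt

SIMPLES = ["1", "sigma", "psi"]
R2 = sqrt(2.0)

# Assumed s-matrix of the Ising category, normalized so that S["1"]["1"] = 1.
S = {
    "1":     {"1": 1.0, "sigma": R2,  "psi": 1.0},
    "sigma": {"1": R2,  "sigma": 0.0, "psi": -R2},
    "psi":   {"1": 1.0, "sigma": -R2, "psi": 1.0},
}

def fuse(y, z):
    """Simple summands of y (x) z for the assumed Ising fusion rules."""
    if y == "1":
        return [z]
    if z == "1":
        return [y]
    if y == "sigma" and z == "sigma":
        return ["1", "psi"]
    if y == "psi" and z == "psi":
        return ["1"]
    return ["sigma"]  # sigma (x) psi and psi (x) sigma

def s_on_product(x, y, z):
    """s_{X, Y (x) Z}, computed by biadditivity over the summands of Y (x) Z."""
    return sum(S[x][w] for w in fuse(y, z))

def verlinde_holds():
    """Check s_{X1} s_{X, Y(x)Z} = s_{XY} s_{XZ} for all simple X, Y, Z."""
    return all(
        isclose(S[x]["1"] * s_on_product(x, y, z), S[x][y] * S[x][z], abs_tol=1e-12)
        for x in SIMPLES for y in SIMPLES for z in SIMPLES
    )

if __name__ == "__main__":
    print("Verlinde formula holds on the Ising data:", verlinde_holds())
```

The first column of the matrix lists the dimensions (1, sqrt(2), 1),
in line with the first formula of the lemma.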


Proposition 2 (Criterion of modularity) In the
above assumption of semisimplicity, the following
conditions are equivalent:


(i) $\mathcal C$ is modular ($\omega$ is nondegenerate);

(ii) the matrix $(s_{XY})_{X,Y \in S}$ is nondegenerate;

(iii) for any $X \in S$ its dimension $\dim_q X$ does not
vanish, and there exist numbers $\mu_Y$, $Y \in S$, such
that for all $X \in S$ we have $\sum_{Y \in S} s_{XY} \mu_Y = \delta_{X\mathbf 1}$; and

(iv) for each simple $X \not\cong \mathbf 1$ we have
$\sum_{Y \in S} s_{XY} \dim_q Y = 0$ and $\dim_q X \neq 0$.


The easy implication (ii) $\Rightarrow$ (iii) can be deduced
from the Verlinde formula. If the dimension
$\dim_q(X) = s_{X\mathbf 1}$ of a simple object $X$ vanishes, then
$s_{XY} = 0$ for all $Y \in \operatorname{Ob} \mathcal C$. This contradicts the
assumption of nondegeneracy of $(s_{XY})$.
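
Conditions (ii) and (iv) are easy to test on explicit data. The sketch
below is only an illustration: it reuses the s-matrix commonly quoted
for the Ising category (objects ordered as 1, sigma, psi), which is an
assumption of the example rather than something given in the text, and
the function names are hypothetical.

```python
from math import sqrt

R2 = sqrt(2.0)

# Assumed Ising s-matrix; rows and columns ordered as (1, sigma, psi).
S_MATRIX = [
    [1.0, R2, 1.0],
    [R2, 0.0, -R2],
    [1.0, -R2, 1.0],
]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def condition_ii(s, tol=1e-9):
    """(ii): the s-matrix is nondegenerate."""
    return abs(det3(s)) > tol

def condition_iv(s, tol=1e-9):
    """(iv): for X != 1, sum_Y s_XY dim(Y) = 0 and dim(X) != 0.

    The dimensions are read off the first row, dim(Y) = s_{1Y}.
    """
    dims = s[0]
    return all(
        abs(sum(s[x][y] * dims[y] for y in range(len(s)))) <= tol
        and abs(dims[x]) > tol
        for x in range(1, len(s))
    )

if __name__ == "__main__":
    print("det s =", det3(S_MATRIX))
    print("(ii) holds:", condition_ii(S_MATRIX))
    print("(iv) holds:", condition_iv(S_MATRIX))
```

Replacing the last row by a copy of the first makes the determinant
vanish and breaks (iv) at the same time, as the equivalence predicts.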

Let us determine the coefficients $\mu_Y$ of the integral
element

$\mu = \sum_{Y \in S} \mu_Y \phi_Y : \mathbf 1 \to F$

of the Hopf algebra $F$. It also has a two-sided
integral functional $\lambda : F \to \mathbf 1$. The corresponding
endomorphism is

$\lambda_Z = \big( Z \xrightarrow{\delta_Z} F \otimes Z \xrightarrow{\lambda \otimes Z} \mathbf 1 \otimes Z = Z \big)$

for an arbitrary object $Z$ of $\mathcal C$, where $\delta_Z$ is the
natural coaction. The equation

[Equation [3]: a tangle diagram whose right-hand side is proportional to $\delta_{XY}$.]


follows from the properties of the two-sided integral
$\lambda$ of the Hopf algebra $F$. Due to uniqueness of
integrals, $\lambda$ is proportional to $\psi_{\mathbf 1}$. In eqn [3], $X$ and
$Y$ vary over $S$. The right-hand side is the identity
morphism if $X = Y$, and vanishes otherwise.
Substituting the definition of $\mu$, we rewrite the
equation as follows:

[Equation [4]: a tangle diagram with coefficient $\mu_Y$ and right-hand side proportional to $\delta_{XY}$.]


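For $X = \mathbf 1$, we get

$\mu_Y \cdot \lambda_Y = \delta_{\mathbf 1 Y} \cdot \mathrm{id}_Y : Y \to Y$   [5]

If $Y \neq \mathbf 1$, then $\lambda_Y = 0$. So [5] tells essentially that

$\mu_{\mathbf 1} \lambda_{\mathbf 1} = \mathrm{id}_{\mathbf 1} : \mathbf 1 \to \mathbf 1$   [6]

Now return to [4] with $X = Y$. If we compose that
equation with $\mathrm{coev} : \mathbf 1 \to Y^\vee \otimes Y$, we obtain

[Equation [7]: a tangle diagram.]

Multiplying both sides of [7] with $\mu_{\mathbf 1}$, we find

$\mu_Y = \mu_{\mathbf 1} \cdot \dim_q(Y)$

The normalization is fixed by eqn [6], which we can
write as

$1 = \sum_{Y \in S} \mu_{\mathbf 1}\, \mu_Y \dim_q(Y) = \mu_{\mathbf 1}^2 \sum_{Y \in S} (\dim_q(Y))^2$

Hence,

$(\mu_{\mathbf 1})^{-2} = \sum_{Y \in S} (\dim_q(Y))^2$   [8]

So, we find $\mu_{\mathbf 1}$, unique up to a sign.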
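
As a worked instance of the normalization [8], take three simple
objects with dimensions $1$, $\sqrt 2$, $1$. These are the values commonly
quoted for the Ising category and are an assumption of this example,
not data from the text. Then

```latex
(\mu_{\mathbf 1})^{-2} = 1^2 + (\sqrt{2})^2 + 1^2 = 4
\quad\Longrightarrow\quad
\mu_{\mathbf 1} = \pm\tfrac{1}{2},
\qquad
\mu_Y = \mu_{\mathbf 1}\,\dim_q(Y) \in
\left\{ \pm\tfrac{1}{2},\ \pm\tfrac{\sqrt{2}}{2},\ \pm\tfrac{1}{2} \right\},
```

with one common sign for all three coefficients.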


Conjugation Properties 


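From the Verlinde formula [2], we conclude that
the commutative $\mathbb C$-algebra $\operatorname{Inv} F$ possesses
homomorphisms

$\chi_X : \operatorname{Inv} F \to \mathbb C$,   $\phi_Y \mapsto (\dim_q(X))^{-1} s_{XY} = s_{XY} / s_{X\mathbf 1}$

The matrix $s$ is invertible, so that its columns cannot
be proportional. Hence, all $\chi_X$ are different characters.
Their number is $n = \operatorname{Card} S = \dim_{\mathbb C} \operatorname{Inv} F$; hence,
there is an isomorphism of $\mathbb C$-algebras

$\chi : \operatorname{Inv} F \to \mathbb C \times \dots \times \mathbb C = \mathbb C^n$,   $\phi \mapsto (\chi_1(\phi), \dots, \chi_n(\phi))$

Now we show that the dimensions $\dim_q(Y)$ are
real numbers, so that $\mu_{\mathbf 1}^2$ is also a real number. One
can introduce in $\operatorname{Inv} F$ an antilinear involution,

$-^* : \operatorname{Inv} F \to \operatorname{Inv} F$,   $(\phi_X)^* = \phi_{X^\vee}$

and a scalar (Hermitian) product

$(\phi_X | \phi_Y) = \delta_{XY}$,   $X, Y \in S$

Then $\operatorname{Inv} F$ becomes a finite-dimensional commutative
Hilbert algebra. Indeed,

$(\phi_X \phi_Y | \phi_Z) = \dim \operatorname{Hom}(X \otimes Y, Z) = \dim \operatorname{Hom}(X, Y^\vee \otimes Z) = (\phi_X | \phi_Y^* \phi_Z)$

From the theory of finite-dimensional commutative
Hilbert algebras, we know that idempotents in the
algebra $\operatorname{Inv} F$ are self-adjoint (only in that case can the
scalar product be positive definite). Hence, $\chi$ is
a $*$-morphism, that is, $\chi_X(\phi^*) = \overline{\chi_X(\phi)}$. Therefore,
$s_{XY^\vee} / s_{X\mathbf 1} = \overline{s_{XY} / s_{X\mathbf 1}}$. In the particular case of $X = \mathbf 1$,
we obtain

$\dim_q(Y) = \dim_q(Y^\vee) = s_{\mathbf 1 Y^\vee} = \overline{s_{\mathbf 1 Y}} = \overline{\dim_q(Y)}$

since $s_{\mathbf 1 \mathbf 1} = 1$. This proves that for any $Y \in \mathcal C$ its
dimension $\dim_q(Y)$ is a real number.

It is natural to take for $\mu_{\mathbf 1}$ the positive root
determined by [8]. Positiveness fixes $\mu_{\mathbf 1}$ uniquely.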


Examples of Semisimple Modular Categories 


In their original paper, Reshetikhin and Turaev
(1991) use as algebraic input data the representation
theory of the quantum deformation $U = U_q(\mathfrak{sl}_2)$ of
the Lie algebra $\mathfrak{sl}(2, \mathbb C)$, where $q$ is a root of unity.
They construct the invariant as a trace over
$U$-equivariant morphisms, and prove the necessary
modularity condition concerning the nondegeneracy
of the braided pairing.

The general picture is drawn by Turaev (1994), 
where 3-manifold invariants and TQFTs are con- 
structed from semisimple modular categories. He 
shows how to obtain the latter as quotients of 
certain subcategories of representations of a modu- 
lar Hopf algebra by the ideal of trace-negligible 
morphisms. 

Finkelberg (1996), based on results of Gelfand
and Kazhdan, establishes (via the theory of Kazhdan
and Lusztig) an equivalence between two modular
categories. The first is the semisimple category $\mathcal C$ of
integrable modules over an affine Lie algebra $\hat{\mathfrak g}$ of
positive integer level $k$. The second is a certain
subquotient of the category of $U_q(\mathfrak g)$-modules for
$q = \exp(\pi i\, m^{-1} / (k + h^\vee))$, where $m \in \{1, 2, 3\}$ and $h^\vee$
is the dual Coxeter number of $\mathfrak g$. Huang and
Lepowsky (1999) describe the rigid braided structure
of $\mathcal C$ using vertex operators. Bakalov and
Kirillov (2001) use geometrical constructions to
make $\mathcal C$ into a modular category, associated with
the Wess-Zumino-Witten (WZW) model. They
construct the corresponding WZW modular functor.


Modular Functor and TQFT 


Modular categories give rise to a modular functor 
and a TQFT. The meanings of those differ from 
author to author, but the common features are the 
following. Such a TQFT is a functor from the 
category whose objects are smooth surfaces with 
additional structures and morphisms are three- 
dimensional manifolds with additional structures to 
the category of vector spaces. A modular functor is 
the restriction of such TQFT to the subcategory whose 
morphisms are homeomorphisms of surfaces. One of 
the constructions due to Kerler and Lyubashenko
(2001) takes a nonsemisimple modular category as an 
input and assigns to it a double TQFT functor, that is, 
a functor between double categories. The target is the 
2-category of abelian categories. 


See also: Axiomatic Approach to Topological Quantum 
Field Theory; Hopf Algebras and q-Deformation Quantum 
Groups; The Jones Polynomial; Knot Invariants and 
Quantum Gravity; Quantum 3-Manifold Invariants; 
Symmetries in Quantum Field Theory of Lower 
Spacetime Dimensions; Topological Quantum Field 
Theory: Overview; von Neumann Algebras: Introduction, 
Modular Theory, and Classification Theory; von 
Neumann Algebras: Subfactor Theory. 
Introduction


Branes appear in string theories and M-theory as 
extended objects which contain some nonperturba- 
tive information about the theory, and, apart from 
gravity, they can couple with gauge fields. 

At low energies, M-theory can be approximated
by an 11-dimensional $N = 1$ supergravity, which in
fact is unique and contains a graviton field (the metric
$g_{\mu\nu}$), a spin-3/2 field (the gravitino), and a gauge field
consisting of a 3-form potential field $c$. The gauge
field, whose field strength is a 4-form $G = dc$, can then
couple electrically with two-dimensional extended
objects, called M2 membranes. Moving in spacetime,
an M2 membrane describes a three-dimensional world
volume $W_3$ so that its coupling to the gauge field is

$k \int_{W_3} c$   [1]

$k$ representing the charge.

With $c$ we can associate a dual field $\tilde c$ such that
$d\tilde c = {*G}$. It is a 6-form and can then electrically
couple with a five-dimensional object, the M5
membrane. However, as $c$ is the true field, we say
that M5 couples magnetically with $c$.

In superstring theories, which are however related
to M-theory by a web of dualities, there are many
more objects to be considered. In particular, we will
consider type II strings, which at low energies are
described by ten-dimensional $N = 2$ supergravity
theories. They contain a Neveu-Schwarz sector
consisting of a graviton $g_{\mu\nu}$, a 2-form potential
$B_{\mu\nu}$, and a scalar field $\phi$, the dilaton. The content of
the Ramond-Ramond fields depends on the chirality
of the supercharges.

Type IIA strings are nonchiral (their left and right
supercharges having opposite chiralities) and contain
only odd-dimensional $p$-form potentials $A^{(p)}$,
with $p = 1, 3, 5, 7, 9$.

Type IIB strings are chiral and contain only
even-dimensional $p$-form potentials $A^{(p)}$, with
$p = 0, 2, 4, 6, 8$.

Proceeding as before, we see that a $(p + 1)$-form
potential can couple electrically with a $p$-dimensional
object and magnetically with a $(6 - p)$-dimensional
object. Such objects in fact exist in type II strings: the
Dp branes are $p$-dimensional extended objects, with
$p = 0, 2, 4, 6, 8$ for IIA strings and $p = -1, 1, 3, 5, 7, 9$
for IIB strings. In particular, D0 and D1 branes are
called D-particles and D-strings respectively, whereas
D(-1) branes are instantons, that is, points in
spacetime. Concretely, D-branes are extended regions
in spacetime where the endpoints of open strings are
constrained to live. Mathematically, they are defined
by imposing Dirichlet conditions (whence the "D" of
D-brane) on the ends of the string, along certain
spatial directions. Excitation of these string states
gives rise to the dynamics of the brane. They
correspond to a ten-dimensional U(1) gauge field,
whose components tangent to the brane
world volume give rise to a gauge field in $p + 1$
dimensions, whereas the orthogonal components
generate deformations of the brane shape. Moreover,
if $n$ parallel $p$-branes overlap, the gauge theory on the
world volume is enhanced to a U($n$) gauge theory.
Closed strings can generate gravitational interactions
responsible for wrappings of the brane. However, in
the cases when gravitational interaction is negligible,
we can use this mechanism to construct $(p + 1)$-dimensional
gauge theories, as we will see.

Before explaining how the construction works let 
us remember that there are two other interesting 
objects which often appear. In fact, we have not yet 
considered the Neveu-Schwarz B-field: this field can 
couple electrically with a one-dimensional object 
and magnetically with a five-dimensional object. 
These are the usual string (also called a fundamental
or F-string) and a five-dimensional membrane called
the NS5 brane.
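
The dimension counting used for all these couplings follows one rule:
an $n$-form potential in $D$ spacetime dimensions couples electrically
to an object of spatial dimension $n - 1$, and its Hodge-dual potential
has degree $D - n - 2$, so the magnetic object has spatial dimension
$D - n - 3$. The sketch below (function names are hypothetical, added
only for illustration) reproduces the cases quoted in the text.

```python
def electric_brane_dim(form_degree):
    """Spatial dimension of the object coupling electrically to an n-form."""
    return form_degree - 1

def magnetic_brane_dim(form_degree, spacetime_dim=10):
    """Spatial dimension of the object coupling magnetically to an n-form.

    The field strength has degree n + 1, its Hodge dual has degree D - n - 1,
    the dual potential has degree D - n - 2, and it couples electrically to
    an object of spatial dimension D - n - 3.
    """
    return spacetime_dim - form_degree - 3

if __name__ == "__main__":
    # M-theory 3-form c in D = 11: M2 (electric) and M5 (magnetic).
    print("M-theory:", electric_brane_dim(3), magnetic_brane_dim(3, spacetime_dim=11))
    # Neveu-Schwarz 2-form B in D = 10: F-string and NS5 brane.
    print("NS B-field:", electric_brane_dim(2), magnetic_brane_dim(2))
    # Type IIA and IIB potentials A^(p) with the degrees listed in the text.
    print("IIA:", [electric_brane_dim(n) for n in (1, 3, 5, 7, 9)])
    print("IIB:", [electric_brane_dim(n) for n in (0, 2, 4, 6, 8)])
```

For a $(p + 1)$-form in ten dimensions the second function returns
$6 - p$, which is the magnetic dimension stated above.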

We will see how supersymmetric gauge theory 
configurations can be realized geometrically, con- 
sidering more or less simple configurations of 
branes. We will also show that quantum corrections, 
be they exact or perturbative, can be described in 
this geometrical fashion. To be explicit, we will 
work with four-dimensional gauge theories, but it is 
clear that similar constructions can be done in 
different dimensions. 


Gauge Groups on the Branes 


A deeper understanding of how D-branes and 
related world-volume gauge theories work requires 
the introduction of dualities, but a quite simple 
heuristic argument can be given, giving up some 
rigor in favor of intuition. 

Figure 1 Strings moving in spacetime (closed string and open string).

To fix ideas, let us think of an open string
moving in a nearly flat (but ten-dimensional) spacetime.
Its trajectory will describe a two-dimensional
surface having a boundary traced by the ends of the
string (Figure 1). The string can then be described by
a map from a two-dimensional surface $\Sigma$, having a
boundary $\gamma = \partial\Sigma$, to spacetime, say $X^\mu(\sigma, \tau)$ with
$\mu = 0, 1, \dots, 9$. Here we chose on $\Sigma$ local coordinates
$\sigma^\alpha = (\sigma, \tau)$, where $\sigma \in [0, \pi]$ is a spacelike
coordinate and $\tau$ is a timelike one. Then $\sigma = 0, \pi$
identify the ends of the string and are identified
with each other for the closed string. Now, on a given
background, the string evolution is usually described as a
two-dimensional (supersymmetric) conformal field
theory for the fields $X^\mu(\sigma, \tau)$. The action for the
bosonic part is the same for both type IIA and IIB
strings, and reads


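$$ S[X] = -\frac{1}{4\pi\alpha'} \int_\Sigma \sqrt{h}\, h^{\alpha\beta} g_{\mu\nu}(X) \frac{\partial X^\mu}{\partial\sigma^\alpha} \frac{\partial X^\nu}{\partial\sigma^\beta}\, d^2\sigma + \frac{1}{4\pi\alpha'} \int_\Sigma \epsilon^{\alpha\beta} B_{\mu\nu}(X) \frac{\partial X^\mu}{\partial\sigma^\alpha} \frac{\partial X^\nu}{\partial\sigma^\beta}\, d^2\sigma $$ [2] 


where $g_{\mu\nu}$ and $B_{\mu\nu}$ are the metric and a 2-form 
potential field for the given spacetime background, 
and $h_{\alpha\beta}$ is a metric for $\Sigma$. In general, we must also 
add a scalar field $\Phi(X)$, but it will not play any role 
here. Using conformal invariance, we can reduce $h_{\alpha\beta}$ 
to the flat metric. Also consider a flat background 
$g_{\mu\nu}(X) = \eta_{\mu\nu}$ and concentrate for a moment on the 
B-field. 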

Conceived as a 2-form field over the spacetime, 
the potential field B is a gauge field: its field strength 
3-form $H = dB$ is unchanged under a shift 


$$ B \to B + dA $$ [3] 


generated by the 1-form field A(X). Here A should be 
a totally unphysical field. However, note that if one 
considers open strings, the action for the B-field, and 
then the full action, is shifted by a boundary term 


$$ S[X] \to S[X] + \frac{1}{2\pi\alpha'} \int_\gamma A_\mu(X) \frac{\partial X^\mu}{\partial\sigma^\alpha}\, d\sigma^\alpha $$ [4] 


The boundary $\gamma$ just describes the timelike world 
lines of the ends of the string. Thus, the ends of 
the string carry a U(1) charge and, even though 
the B-field vanishes, we can have the open-string 
action 


$$ S[X] = -\frac{1}{4\pi\alpha'} \int_\Sigma \partial_\alpha X^\mu\, \partial^\alpha X_\mu\, d^2\sigma + \int_\gamma A_\mu(X)\, \partial_\alpha X^\mu\, d\sigma^\alpha $$ [5] 


Here we conventionally rescaled the A field to 
normalize the action. To define the equation of 
motion, however, we must also specify boundary 
conditions for $X^\mu(\sigma,\tau)$ on $\gamma$. Let us choose 
Neumann conditions for $\mu = 0,1,\ldots,p$ and Dirichlet 
conditions for the remaining directions 


$$ \partial_\sigma X^a(\gamma) = 0, \quad a = 0,\ldots,p $$ [6] 


$$ \partial_\tau X^i(\gamma) = 0, \quad i = p+1,\ldots,9 $$ [7] 


This means that the ends of the string are confined 
to a (p + 1)-dimensional region (including time): the 
Dp brane. If for $\Sigma$ we consider the full strip 
$(\sigma,\tau) \in [0,\pi] \times \mathbb{R}$, then the U(1) action reduces to 


$$ S_A[X] = \int_{\mathbb{R}} A_a\, \partial_\tau X^a(\pi,\tau)\, d\tau - \int_{\mathbb{R}} A_a\, \partial_\tau X^a(0,\tau)\, d\tau $$ [8] 


Thus, only the components $A_a$ tangent to the 
brane interact with the ends of the strings. What 
about the normal components $A_i$? 

To understand their meaning, let us proceed to 
compute the mean momentum transferred by the 
string, as if it were rigid. Imitating the Hamilton-Jacobi 
procedure for particles, let us consider the 
action up to a fixed time, say $\tau = 0$, so that 
$\Sigma = [0,\pi] \times (-\infty,0]$. It is then a function of the 
position $X^\mu(\sigma,0)$ of the string at the instant $\tau = 0$. 
To compute the momentum, we must vary the 
action by changing the position by a constant shift 
$\delta X^\mu(\sigma) = \Delta_0^\mu$. The variation will then contain some 
boundary terms which, for reasons of consistency, 
we must make vanish. 

Before doing such a computation, let us make 
some further comments. It is plausible to assume 
that the two ends of the string could be charged under 
different U(1) fields. To the states of the open string 
we can in fact add two discrete labels $I, J = 1,\ldots,n$, 
for some integer n, called Chan-Paton factors, and 
referring, respectively, to the two ends of the string. 
We will indicate the ends of the string as $X^\mu(0,\tau;I)$ 
and $X^\mu(\pi,\tau;J)$ when we need to specify the states. If 
the string is in the excited state (I, J), then $X(0,\tau;I)$ 
can couple with the field $A^{(I)}$ and $X(\pi,\tau;J)$ with $A^{(J)}$. 
For simplicity, we will now assume that these fields 
are constant. Note however that $A^{(I)}$ must be 
understood as a function of $X(0,\tau)$ only, and similarly 
for $A^{(J)}$. Also, to realize the variation we can vary 
$X^\mu(\sigma,\tau)$ by a function $\delta X^\mu(\sigma,\tau) = \Delta^\mu(\tau)$ that 
switches on sharply to the value $\Delta_0^\mu$ at $\tau = 0$, so that essentially 


$$ \partial_\tau \Delta^\mu(\tau) = \Delta_0^\mu\, \delta(\tau) $$ [9] 


where $\delta(\tau)$ is the Dirac delta function. 

Using the chosen boundary conditions, the variation 
of the full action contains the boundary terms 


$$ \delta S_{\text{bound}} = \left(A_i^{(J)} - A_i^{(I)}\right) \int_{-\infty}^{0} \partial_\tau \Delta^i(\tau)\, d\tau + \frac{1}{2\pi\alpha'} \int_0^\pi \Delta_0^i\, \partial_\sigma X_i(\sigma,0)\, d\sigma = \frac{\Delta_0^i}{2\pi\alpha'} \left[ X_i(\pi,0) - X_i(0,0) + 2\pi\alpha' \left(A_i^{(J)} - A_i^{(I)}\right) \right] $$ [10] 


Imposing the condition of its vanishing gives the 
physical interpretation for the normal components 
of the U(1) fields 


$$ X_i(\pi,0) - X_i(0,0) = -2\pi\alpha' \left(A_i^{(J)} - A_i^{(I)}\right) $$ [11] 


This means that, up to a constant shift, the fields 
$A_i^{(I)}$ measure the positions of the ends of the strings 
in the transverse directions (Figure 2). Equivalently, 
we can say that the string ends on two different Dp 
branes, parallel but displaced in the transverse 
directions by a quantity $-2\pi\alpha'(A_i^{(J)} - A_i^{(I)})$. We are 
thus also able to interpret the Chan-Paton factors. 
They mean that the string is living in a background 
of n parallel branes, stretched between the Ith and 
the Jth brane. On every brane, a U(1) gauge group 
lives so that the full gauge group is $U(1)^n$. However, 
when k of the branes overlap, the corresponding set 
of states becomes indistinguishable, so that the gauge 
group can be enhanced to a U(k) group. In 
conclusion, n overlapping parallel Dp branes carry 
a (p+1)-dimensional U(n) gauge theory which 
breaks into $U(k_i)$ block factors if the branes separate 
into stacks of $k_i$ overlapping branes. 

Figure 2 Tangential components $A_a$ appear as gauge 
modes. Normal components $A_i$ appear as shift modes. 

We can say a little bit more about this. If the 
string excited states represent gauge degrees of 
freedom, they must become massive to break gauge 
symmetry when the branes separate. To see this, let 
us conclude by computing the mean momentum 
carried by the string. After elimination of the 
boundary terms, the total variation of the action 
due to the shift $\delta X^\mu(\sigma,0) = \Delta_0^\mu$ becomes 


$$ \delta S = -\frac{1}{2\pi\alpha'} \int_\Sigma \partial_\alpha \Delta^\mu\, \partial^\alpha X_\mu\, d^2\sigma = \frac{\Delta_0^\mu}{2\pi\alpha'} \int_0^\pi \partial_\tau X_\mu(\sigma,0)\, d\sigma $$ [12] 


The resulting momentum is 


$$ P_\mu = \frac{1}{2\pi\alpha'} \int_0^\pi \partial_\tau X_\mu(\sigma,0)\, d\sigma $$ 


In the bulk, the fields $X^\mu$ satisfy the standard wave 
equation in two dimensions, so that the general 
solution is the sum of a left-moving and a right-moving 
part, $X^\mu(\sigma,\tau) = X_L^\mu(\tau+\sigma) + X_R^\mu(\tau-\sigma)$. 
Imposing the boundary conditions, one finds 


$$ X^a(\sigma,\tau) = X_L^a(\tau+\sigma) + X_L^a(\tau-\sigma) + 2\alpha' p^a \tau + X_0^a $$ [13] 


$$ X^i(\sigma,\tau) = X_L^i(\tau+\sigma) - X_L^i(\tau-\sigma) + 2\alpha' \left(A^{(I)i} - A^{(J)i}\right) \sigma + X_0^i $$ [14] 


Here $X_0^\mu$ and $p^a$ are integration constants and 
$X_L^\mu(\tau+\pi) - X_L^\mu(\tau-\pi) = 0$. A direct computation 
then shows that $P^a = p^a$ and $P^i = 0$, which is also 
what intuition suggests: the string can freely move 
along the branes but is fixed between them in the 
orthogonal directions. However, if it is stretched 
between two separated branes (i.e., if $I \neq J$), there is 
another contribution to the energy. In fact the factor 
$T := 1/(2\pi\alpha')$ represents the string tension, so that if 
$\Delta$ is its minimal length, its minimal contribution to 
the energy will be $\delta E = T\Delta$. This energy must 
equally contribute to the spectrum of the excited 
modes, the gauge field bosons. Here, in fact, is where 
T-duality comes into play, but we will not discuss it. 

The conclusion is that the spectrum corresponding to 
the stretched string must satisfy the condition $E \geq T\Delta$, 
which is as if the string states acquired a mass $T\Delta$, 
that is, 


$$ m^2 = \sum_{i=p+1}^{9} \left(A^{(I)i} - A^{(J)i}\right)^2 $$ [15] 


This gives us a geometric tool to construct (p + 1)-dimensional 
gauge theories: on n coincident Dp 
branes there exists a U(n) gauge theory which can be 
broken by separating the branes, thus giving a mass 
to the gauge bosons. Such a mass is proportional to 
the distance between the branes (Figure 3). 
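
The breaking pattern just described can be illustrated with a small numerical sketch. The Python fragment below is only an illustration under stated assumptions: it uses a single transverse direction, units in which α' = 1 by default, and hypothetical helper names (`stacks`, `gauge_group`, `stretched_string_masses`) that do not come from the text. It groups branes by position to read off the unbroken U(k_i) factors, and assigns to a string stretched between branes I and J the mass T|x_I - x_J| with T = 1/(2πα').

```python
import math


def stacks(positions, tol=1e-9):
    """Group brane indices whose transverse positions coincide within tol."""
    groups = []  # each entry: [representative position, list of brane indices]
    for index, x in enumerate(positions):
        for group in groups:
            if abs(x - group[0]) <= tol:
                group[1].append(index)
                break
        else:
            groups.append([x, [index]])
    return [members for _, members in groups]


def gauge_group(positions, tol=1e-9):
    """Unbroken gauge group as a product of U(k_i) factors, largest first."""
    sizes = sorted((len(members) for members in stacks(positions, tol)), reverse=True)
    return " x ".join(f"U({k})" for k in sizes)


def stretched_string_masses(positions, alpha_prime=1.0):
    """Mass T * |x_I - x_J| of the string stretched from brane I to brane J."""
    tension = 1.0 / (2.0 * math.pi * alpha_prime)
    n = len(positions)
    return {
        (i, j): tension * abs(positions[i] - positions[j])
        for i in range(n)
        for j in range(n)
        if i != j
    }


# Illustrative configuration: two coincident branes and one displaced brane.
example_positions = [0.0, 0.0, 1.5]
print(gauge_group(example_positions))
print(stretched_string_masses(example_positions)[(0, 2)])
```

In this toy configuration the strings between the two coincident branes stay massless, while those reaching the displaced brane acquire a mass proportional to the separation, as stated above.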

Figure 3 Stretched strings acquire a mass (labels: massless; separation $\Delta$). 

Before continuing with some examples, let us 
make two comments. First, the theory obtained in 
this way is a supersymmetric one, because the 
Dirichlet conditions allow the action of supersymmetric 
transformations of the form $\epsilon_L Q_L + \epsilon_R Q_R$, 
where $Q_L$ and $Q_R$ are the fermionic left and right 
supercharge operators and $\epsilon_L$, $\epsilon_R$ are spinors satisfying 
the brane projection condition 
$\epsilon_L = \pm\Gamma^0\Gamma^1\cdots\Gamma^p\epsilon_R$. Here $\Gamma^\mu$ are the ten-dimensional Dirac 
matrices and one refers to “antibranes” for the 
negative sign. 

Second, the gauge group can be converted into an 
SO(n) or an Sp(n/2) (for even n), adding an 
orientifold plane parallel to the branes. The orientifold 
plane acts on the orthogonal spacetime directions 
with a $\mathbb{Z}_2$-action 


$$ X^i \leftrightarrow -X^i $$ [16] 


if $X^i = 0$ is the position of the orientifold. It further 
acts on the string world sheet as $\sigma \to \pi - \sigma$, making it 
an unoriented string. The effect is to project out 
some states from the spectra, thus reducing the 
gauge group. 


Geometric Engineering of Gauge 
Theories from Branes 


To illustrate how the brane construction of gauge 
theories works, we will consider a particular 
configuration of branes (Witten 1997). 

We would like to obtain a four-dimensional U(n) 
gauge theory. A possibility could be to take n D3 
branes in a type IIB string background. However, 
such a model would contain too many supersymmetries: 
in ten dimensions, supersymmetries are generated 
by two 16-dimensional chiral spinors $\epsilon_L$, $\epsilon_R$ 
($\Gamma^0\cdots\Gamma^9\epsilon_{L,R} = \epsilon_{L,R}$). From the four-dimensional 
point of view, each of them represents four four-dimensional 
spinors, giving an N = 8 supersymmetric 
theory. The projection condition, due to the branes, 
reduces the number of supersymmetries to four. 
Supersymmetry not being manifest in nature, it is 
desirable to have gauge theories with less supersymmetry 
at hand. Because different brane projection 
conditions can further reduce supersymmetry, we 
can try to consider the coexistence of more kinds of 
branes. 

Figure 4 D4 branes ending on an NS5 brane. Gauge degrees 
of freedom are frozen in four dimensions. 

One way to do this is to consider n parallel 4-branes 
ending on an NS5 brane in type IIA string theory 
(Figure 4), and then analyze the gauge theory restricted 
to the four-dimensional intersection (here the theory is 
nonchiral as $\Gamma^0\cdots\Gamma^9\epsilon_{L/R} = \pm\epsilon_{L/R}$). Which kinds of 
branes can end on which other kinds can be 
established starting from the fact that strings can end 
on a brane, and using dualities (Giveon and 
Kutasov 1999). 

Let us fix some conventions. We will indicate with 
$x = (x^0, x^1, x^2, x^3) \in \mathbb{R}^4$ the coordinates on the 
intersection, so that $(x, v) = (x, x^4, x^5) \in \mathbb{R}^6$ define the NS5 
brane, and $(x, x^6)$, with $x^6 \in [0,\infty)$, the 4-branes. Also 
$v_I$ will indicate the position of the Ith 4-brane on the 
5-brane, and $y = (x^7, x^8, x^9)$ will collect the remaining 
coordinates. Finally, we will indicate the product of 
$\Gamma$-matrices corresponding to given directions by labeling 
a single $\Gamma$ with the respective coordinates. For 
example, $\Gamma^v = \Gamma^4\Gamma^5$. With these conventions, the 
brane projection conditions for D4 and NS5 branes, 
respectively, read 


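$$ \epsilon_L = \Gamma^x\Gamma^6\epsilon_R $$ [17] 


$$ \epsilon_L = \Gamma^x\Gamma^v\epsilon_L; \quad \epsilon_R = \Gamma^x\Gamma^v\epsilon_R $$ [18] 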


These projections reduce supersymmetry to N = 2. 
After a short manipulation, using for example the 
antichirality of $\epsilon_R$, it is easy to see that the first 
condition can be replaced by 


$$ \epsilon_L = \Gamma^x\Gamma^y\epsilon_R $$ [19] 


In other words, we could add a number of 6-branes 
in the (x,y) directions, without further reducing 
supersymmetry. We will consider this possibility 
later. 

On the D4 branes there is a possibly broken 
U(n) gauge theory. Here the vector fields 
$A_\mu$, $\mu = 0,1,2,3,6$, and the scalar fields $v_I$ and y 
live. The last ones are set to zero by the Dirichlet 
conditions, whereas $v_I$ measure the fluctuations of 
the D4 brane positions over NS5. The O(2) group 
of rotations of the $(x^4, x^5)$ coordinates acts on 
them, which can be broken by an expectation 
value $\langle v_I \rangle \neq 0$. The SO(3) rotations of $(x^7, x^8, x^9)$ 
(under which $v_I$ are singlets) do not influence the 
projection conditions and can then be identified with 
the R-symmetry group $SU(2)_R$. It could be broken by a 
nonvanishing expectation value $\langle y \rangle \neq 0$, but as we 
said it cannot happen in the present configuration. This 
highlights an unbroken supersymmetric Coulomb 
branch. 

What is the physics as seen by an observer living 
on the four-dimensional spacetime x? The components 
$A_a$, a = 0, 1, 2, 3, of the vector fields transform 
as vectors with respect to the four-dimensional 
Lorentz group SO(1, 3). They satisfy Neumann 
boundary conditions at $x^6 = 0$ and then survive as 
U(n) gauge vector fields. The $A_6$ component behaves 
as a scalar with respect to SO(1, 3) but is eliminated 
by a Dirichlet condition at $x^6 = 0$. The v scalar field 
will be responsible for the possible breaking of the 
gauge group. 

This seems to be quite a good scenario but 
actually the situation is unsatisfactory. If a 4-brane 
extends over the interval [0, L] in the $x^6$ direction, the 
effective action for the gauge fields goes like this: 


$$ \frac{1}{g_{D4}^2} \int_0^L dx^6 \int_{\mathbb{R}^4} d^4x\, \mathrm{tr}\, F_{\mu\nu}F^{\mu\nu} \simeq \frac{L}{g_{D4}^2} \int_{\mathbb{R}^4} d^4x\, \mathrm{tr}\, F_{ab}F^{ab} $$ [20] 


where a, b = 0,1,2,3. Thus, the gauge coupling in 
four dimensions appears to be $g_4 = g_{D4}/\sqrt{L}$. In our 
case, where L goes to infinity, the gauge coupling 
vanishes and the gauge degrees of freedom are 
frozen. Moreover, an argument similar to the one 
made for the stretched strings shows that the energy 
of the D4 brane is very high and makes the 
mechanism of gauge group breaking difficult. The 
same is true for the NS5 brane, which also turns out 
to be extremely massive and does not participate in 
the dynamics. But this is what we want. 

To solve the problem and restore gauge dynamics 
in four dimensions, one must consider a stack of 
4-branes of finite length in the $x^6$ direction. This can 
be achieved by placing at $x^6 = L$ a second NS5 brane 
parallel to the first one and at the same point in y 
(Figure 5). In this way, the D4 branes can stretch 
between the NS5 branes. If L is small enough, the 
gauge dynamics is restored, also requiring a small 
value for $g_{D4}$ to ensure that the gravitational coupling 
(and the couplings with the Kaluza-Klein and NS5 
modes) is negligible. However, L must be bigger 
than the $\delta x^6$ fluctuations in order to avoid quantum 
corrections. 


Figure 5 N=2 four-dimensional super Yang-Mills theory, with 
U(n) gauge group. 
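
The scaling of the four-dimensional coupling with the brane length can be made concrete with a short sketch. The Python fragment below simply evaluates the relation 1/g_4^2 = L/g_D4^2 quoted above; the function name `four_dimensional_coupling` and the sample values are illustrative assumptions, not part of the original construction.

```python
import math


def four_dimensional_coupling(g_d4, length):
    """Return g_4 = g_D4 / sqrt(L), from 1/g_4^2 = L / g_D4^2."""
    if length <= 0:
        raise ValueError("the D4 brane length L must be positive")
    return g_d4 / math.sqrt(length)


# The coupling decreases monotonically as the branes get longer,
# and tends to zero in the semi-infinite limit L -> infinity.
for length in (1.0, 100.0, 10000.0):
    print(length, four_dimensional_coupling(1.0, length))
```

This is why a finite interval between two NS5 branes is needed for nontrivial four-dimensional gauge dynamics.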


What we just obtained is an N=2 supersymmetric 
classical U(n) gauge theory in four dimensions, 
without matter, and in the Coulomb branch. 
Before considering quantization, let us briefly 
discuss some possible generalizations. For example, 
matter can be realized by attaching to the left-hand side 
NS5 brane new D4 branes parallel to the previous 
ones, but extended in the $x^6$ direction from $-\infty$ to 0 
(Figure 6). Considering strings stretched between 
long and short branes, we obtain states whose half-gauge 
action, associated with the end connected to 
the long brane, is frozen. The corresponding states 
thus appear in the fundamental representation and 
can be interpreted as matter states. 

To consider the Higgs branch, one should be able 
to break supersymmetry giving an expectation value 
to y. As mentioned above, in the present configuration 
this cannot happen because y is set to 0 by 
Dirichlet conditions. Fortunately, as we said, one 
can add 6-branes in the (x, y) directions. If we insert 
such branes to stop the long D4 branes at a large but 
finite value of $x^6$, say $x^6 = -M$ with $M \gg L$, then 
long branes have Neumann conditions in the y 
directions. Thus, fluctuations of the long branes can 
give an expectation value to y, breaking supersymmetry, 
and subsequently the Higgs branch can be 
tuned by shifting 4-branes stretched between 6-branes 
(Figure 7). 


Figure 6 Adding matter. 


Figure 7 Permitting Higgs phases. 


The details require some careful inspection, but 
we shall stop our analysis here (Giveon and Kutasov 
1999). 

More general gauge configurations can be realized 
by adding more parallel NS5 branes, and thus 
obtaining product groups. Adding orientifold planes, 
one can change gauge groups as explained in the 
previous section (Figure 8). 

Finally, we can take a further step towards more 
physical models, constructing N = 1 gauge theories. 
For example, this can be achieved from the previous 
N = 2 model by rotating the second NS5 brane from 
the (x, v) position to the (x, w) position, where 
$w = (x^8, x^9)$ (Figure 9). Then a new brane projection 
condition appears ($\epsilon_L = \Gamma^x\Gamma^w\epsilon_R$), breaking 
supersymmetry down to N = 1. 

In this case, one could also obtain chiral matter, 
adding, for example, orientifold planes. 


Quantum Corrections from M-Theory 


Up to this point we have considered classical gauge 
configurations. Quantum corrections could be com- 
puted switching on brane fluctuations. However, it 
is an amusing fact that working with M-theory one 
can obtain exact quantum results. As an example, 
let us sketch how the exact Seiberg-Witten solution 
can be obtained for the N = 2 model described in the 
previous section, in the simplest case without 
matter. 




The full web of dualities suggests the existence of 
a unique unifying theory called M-theory. At low 
energies, M-theory appears as the strong-coupling 
limit of type IIA strings. In such a limit, D0 branes 
become the dominant objects and the corresponding 
states can be interpreted as Kaluza-Klein modes 
coming from an eleventh dimension $x^{10}$ compactified 
on a circle $S^1$ (Figure 10). 

Thus, M-theory manifests itself as an 11-dimensional 
supergravity. In particular, it can be shown that there 
can be only a unique 11-dimensional supergravity. As 
said, here the nonperturbative objects are two- or five- 
dimensional membranes. 

From the M-theory point of view, the D4 branes 
considered in our model appear as M5 membranes 
wrapped on the eleventh direction $S^1$ (Figure 11). 
Because quantum corrections are no longer negligible, 
we can no longer think of these branes as 
stretched in the $x^6$ direction only, but v must also be 
considered. Thus, the M5 membranes will describe, 
in $\mathbb{R}^{10} \times S^1$, a region $\mathbb{R}^4 \times \Sigma$, where $\mathbb{R}^4$ are the x 
coordinates, and $\Sigma$ is a Riemann surface immersed in 
$Q \simeq \mathbb{R}^3 \times S^1$, Q being spanned by the $(v, x^6, x^{10})$ coordinates. 
In fact, supersymmetry constrains the surface to be a 
holomorphic curve, so that to describe it, it is 
convenient to collect $v = (x^4, x^5)$ and $(x^6, x^{10})$ into 
complex coordinates $v = x^4 + i x^5$ and $s = x^6 + i x^{10}$. 
To compute quantum fluctuations, let us note that 
the end of a D4 brane on an NS5 brane is free to 
move along the v directions. A fully free end of a 
brane would satisfy a free wave equation. However, 
as $x^6$ is constrained in all directions but the v ones, it 
will simply satisfy a Laplace equation in two 
dimensions: $\nabla_v^2 x^6 = 0$. Let us solve it, for a fixed 
NS5 brane. It will be (at least for large values of v) 


$$ x^6(v) = k \sum_{i=1}^{n_L} \log\left|v - v_{L,i}^{(a)}\right| - k \sum_{i=1}^{n_R} \log\left|v - v_{R,i}^{(a)}\right| $$ [21] 


where $n_L$ is the number of D4 branes ending on 
the left-hand side of the NS5 brane, in the positions 
$v_{L,i}^{(a)}$, and similarly for the R index, which refers to 
the right-hand side. Here (a) refers to the ath NS5 
brane, and k is an integration constant. 


Figure 8 N=2 four-dimensional super Yang-Mills theory with $U(n_1) \times U(n_2)$ gauge group and matter. Strings crossing the central 
NS5 brane give matter in the $(n_1, \bar{n}_2)$ representation. 


Figure 10 In M-theory one can think as if, at any ten-dimensional 
spacetime point, there is attached a circle $S^1$ of radius $R_{10}$. 


Figure 11 D4 branes become M5 membranes in M-theory. 

Because $x^6$ is the real part of a holomorphic field, 
whose imaginary part is compactified on a circle of 
radius $R_{10}$, we then find 


$$ s(v) = R_{10} \sum_{i=1}^{n_L} \log\left(v - v_{L,i}^{(a)}\right) - R_{10} \sum_{i=1}^{n_R} \log\left(v - v_{R,i}^{(a)}\right) $$ [22] 


This describes the quantum fluctuations of the NS5 
brane as seen in M-theory. In particular, because of 
the imaginary part of s, the ends of the D4 branes 
appear as vortices on the NS5 brane. In place of s, it 
is now convenient to introduce a new field 
$t := \exp(-s/R_{10})$ so that 


$$ t(v) = \frac{\prod_{i=1}^{n_R} \left(v - v_{R,i}^{(a)}\right)}{\prod_{i=1}^{n_L} \left(v - v_{L,i}^{(a)}\right)} $$ [23] 


Before continuing, let us look again at the 
classical limit. In this case, a fixed value of v will 
correspond to the position of a D4 brane, whereas a 
fixed value of s will correspond to the fixed position 
of an NS5 brane. The classical configuration is then 


$$ (s - s^{(1)})(s - s^{(2)}) \prod_{I=1}^{n} (v - v_I) = 0 $$ [24] 


Here $s^{(a)}$ are the positions of the NS5 branes, and 
the positions $v_I$ of the D4 branes coincide for both 
the NS5 branes. Also, for large values of v, one has 
$t^{(1)} \approx v^n$ and $t^{(2)} \approx v^{-n}$. 

Quantum mechanically, the configuration is 
determined in terms of v and t by the holomorphic 
curve $\Sigma$, which can be described as an algebraic 
curve F(v, t) = 0, generalizing the classical configuration. 
As there are two NS5 branes and n D4 branes, 
F must be a polynomial of degree 2 in t, 


$$ F(v,t) = A_2(v)\, t^2 + A_1(v)\, t + A_0(v) $$ [25] 


where $A_a$, a = 0,1,2, are all polynomials of degree n. 
Note that values of v such that $A_0$ vanishes give the 
solution t = 0, which corresponds to sending the right-hand 
side NS5 brane to $\infty$. Similarly, $A_2 = 0$ sends the 
other NS5 brane to $-\infty$. To avoid these undesirable 
configurations, we can set $A_0 = A_2 = 1$. For $A_1$, we 
can take the most general choice, up to a possible 
shift in v, giving the quantum configuration 


$$ t^2 + \left[v^n + a_{n-2} v^{n-2} + \cdots + a_1 v + a_0\right] t + 1 = 0 $$ [26] 


This realizes a quantum-mechanical correspondence 
between the M5 membrane configurations described 
by the given polynomials, and the N=2 super 
Yang-Mills vacua. But this is also the claimed 
Seiberg—Witten curve. In particular, M-theory gives 
a concrete physical meaning for the support Rie- 
mann surfaces of the Seiberg-Witten solutions. 
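
As a numerical cross-check of the curve [26], the following Python sketch solves the quadratic in t at a given v and compares the two sheets with the large-v behaviour quoted for the classical limit. It is only an illustration: the helper names `middle_polynomial` and `sheets`, the choice n = 2 and the value of the modulus a_0 are assumptions made for the example, and only magnitudes are compared, since overall signs of t depend on conventions.

```python
import cmath


def middle_polynomial(v, coeffs):
    """B(v) = v^n + a_{n-2} v^{n-2} + ... + a_0, with coeffs = [a_0, ..., a_{n-2}]."""
    n = len(coeffs) + 1
    return v ** n + sum(a * v ** k for k, a in enumerate(coeffs))


def sheets(v, coeffs):
    """The two roots t of t^2 + B(v) t + 1 = 0 at a given v."""
    b = middle_polynomial(v, coeffs)
    root = cmath.sqrt(b * b - 4)
    return (-b + root) / 2, (-b - root) / 2


# Illustrative U(2) example (n = 2) with a hypothetical modulus a_0 = -1.
coeffs = [-1.0]
v = 50.0 + 0.0j
t_plus, t_minus = sheets(v, coeffs)
print(abs(t_plus * t_minus))  # product of the roots, fixed to 1 by the curve
print(max(abs(t_plus), abs(t_minus)) / abs(v) ** 2)  # close to 1 at large |v|
```

The product of the two roots equals 1 because A_0 = A_2 = 1, so one sheet grows like v^n while the other decays like v^{-n}, matching the two NS5 branes of the classical picture.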

To conclude, let us make some further comments. 
It is clear how the construction can be extended to 
more general configurations, for example, with 
more NS5 branes, or adding matter. 

Also, we have seen that the geometrical picture 
which branes give of gauge theories extends to the 
quantum level. 

A similar construction can be made for the N= 1 
model, which also permits a full geometrical proof 
of the Seiberg duality at both classical and quantum 
levels. 

Finally, we should note that there are also 
other methods, which work in spacetimes where extra 
dimensions are compactified. There, the branes wrap 
around certain singular loci which contain information 
about gauge symmetries (Lerche 1997). 


See also: AdS/CFT Correspondence; Compactification of 
Superstring Theory; Gauge Theories from Strings; 
Noncommutative Geometry from Strings; Seiberg—Witten 
Theory; Supergravity; Superstring Theories; 
Supersymmetric Particle Models. 
Introduction 


At high enough energies, Einstein's classical theory 
of general relativity breaks down, and will be 
superseded by a quantum gravity theory. The 
singularities predicted by general relativity in grav- 
itational collapse and in the hot big bang origin of 
the universe are thought to be artifacts of the 
classical nature of Einstein's theory, which will be 
removed by a quantum theory of gravity. Develop- 
ing a quantum theory of gravity and a unified theory 
of all the forces and particles of nature are the two 
main goals of current work in fundamental physics. 
The problem is that general relativity and quantum 
field theory cannot simply be molded together. 
There is as yet no generally accepted (pre-)quantum 
gravity theory. 

The quest for a quantum gravity theory has a long 
and thus far not very successful history. Many 
different lines of attack have been developed, each 
having a different way of dealing with the classical 
singularities that arise from point particles and 
smooth spacetime geometry. String theory does 
away with zero-dimensional point particles, and 
particles are modeled as different states of new 
fundamental objects, the one-dimensional strings. It 
turns out, however, that there is a price to pay — the 
number of spacetime dimensions must be greater 
than four for a consistent theory. When fermions are 
included, which leads to superstring theory, the 
required number of dimensions is ten — one time and 
nine space dimensions. 

There are in fact five distinct (1+9)-dimensional 
superstring theories. In the mid-1990s, duality 
transformations were discovered that relate these 
superstring theories to each other and to the (1+10)- 
dimensional supergravity theory. This led to the 
conjecture that all of these theories arise as different 
limits of a single theory, which has come to be 
known as M theory. It was also discovered that 
extended objects of higher dimension than strings 
play a fundamental role in the theory. These objects 
are known as “branes” (from membranes), and the 
relation between them and strings leads to a new 
picture of how gravity and matter may be connected 
in the universe. Roughly speaking, open strings 
describe the particles of the nongravitational sector, 
and their ends are attached to branes, while closed 
strings, which describe the graviton and associated 
particles of the gravitational sector, can move freely 
in all dimensions. 

Thus, the observable universe could be a 
(1+3)-surface (the “brane”) embedded in a 
(1+3+d)-dimensional spacetime (the “bulk”), 
with standard-model particles and fields trapped on 
the brane, while gravity is free to access the bulk. 
Brane-world models offer a phenomenological way to 
test some of the novel predictions and corrections to 
general relativity that are implied by M theory. 


Higher-Dimensional Gravity 


Brane worlds can be seen as reviving the original 
higher-dimensional ideas of Kaluza and Klein in the 
1920s, but in a new context of quantum gravity. An 
important consequence of extra dimensions is that 
the four-dimensional Planck scale $M_p \equiv M_4 = 
1.2 \times 10^{19}$ GeV is no longer the fundamental energy 
scale of gravity. The fundamental scale is instead 
$M_{4+d}$. This can be seen from the modification of 
the gravitational potential. For an Einstein-Hilbert 
gravitational action, 


$$ S_{\text{gravity}} = \frac{1}{2\kappa_{4+d}^2} \int d^4x\, d^dy\, \sqrt{-{}^{(4+d)}g}\, \left[{}^{(4+d)}R - 2\Lambda_{4+d}\right] $$ [1] 


we have the higher-dimensional Einstein field 
equations, 


$$ {}^{(4+d)}G_{AB} \equiv {}^{(4+d)}R_{AB} - \tfrac{1}{2}\, {}^{(4+d)}R\; {}^{(4+d)}g_{AB} = -\Lambda_{4+d}\, {}^{(4+d)}g_{AB} + \kappa_{4+d}^2\, {}^{(4+d)}T_{AB} $$ [2] 


where $x^A = (x^\mu, y^1, \ldots, y^d)$ and $\kappa_{4+d}^2$ is the gravitational 
coupling constant given by 


$$ \kappa_{4+d}^2 = 8\pi G_{4+d} = \frac{8\pi}{M_{4+d}^{2+d}} $$ [3] 


The static weak field limit of the field equations 
leads to the (4+d)-dimensional Poisson equation, 
whose solution is the gravitational potential 


$$ V(r) \propto \frac{\kappa_{4+d}^2}{r^{1+d}} $$ [4] 


In the simplest scenario, we can assume a 
toroidal configuration for the d extra dimensions, 
with each compactified on the same length scale L. 
Then on scales $r \lesssim L$, the potential is (4+d)-dimensional, 
$V \sim r^{-(1+d)}$. By contrast, on scales 
large relative to L, where the extra dimensions do 
not contribute to variations in the potential, V behaves 
like a four-dimensional potential, $V \sim L^{-d} r^{-1}$. This 
means that the usual Planck scale becomes an effective 
coupling constant, describing gravity on scales much 
larger than the extra dimensions, and related to the 
fundamental scale via the volume of the extra 
dimensions: 


$$ M_p^2 \sim M_{4+d}^{2+d}\, L^d $$ [5] 


Large Extra Dimensions 


If the extra-dimensional volume is significantly 
above the Planck scale, then the true fundamental 
scale $M_{4+d}$ can be much less than the effective scale 
$M_p$, 


$$ L^d \gg M_p^{-d} \;\Rightarrow\; M_{4+d} \ll M_p $$ [6] 


In this case, we understand the weakness of gravity 
as due to the fact that it “spreads” into extra 
dimensions, and only a part of it is felt in four 
dimensions. 

A lower limit on $M_{4+d}$ is given by null results in 
table-top experiments to test for deviations from 
Newton’s law in four dimensions, $V \propto r^{-1}$. These 
experiments currently probe submillimeter scales, 
and find no detectable deviation, so that 


$$ L \lesssim 10^{-1}\,\text{mm} \sim (10^{-15}\,\text{TeV})^{-1} \;\Rightarrow\; M_{4+d} \gtrsim 10^{(32-15d)/(d+2)}\,\text{TeV} $$ [7] 
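
The exponent in [7] follows directly from [5], and a short sketch makes the dependence on d explicit. The Python fragment below is an illustration only: the function name `log10_fundamental_scale_bound` is hypothetical, and the default inputs are the round order-of-magnitude values used in the text ($M_p \sim 10^{16}$ TeV and $1/L \gtrsim 10^{-15}$ TeV).

```python
def log10_fundamental_scale_bound(d, log10_planck_tev=16.0, log10_inverse_size_tev=-15.0):
    """log10 of the lower bound on M_{4+d} in TeV.

    Uses M_p^2 ~ M_{4+d}^(2+d) * L^d together with 1/L >~ 10^-15 TeV,
    so that M_{4+d}^(2+d) >~ M_p^2 * (1/L)^d.
    """
    if d < 1:
        raise ValueError("need at least one extra dimension")
    return (2.0 * log10_planck_tev + d * log10_inverse_size_tev) / (d + 2)


# Exponent of the bound for one to six extra dimensions.
for d in range(1, 7):
    print(d, round(log10_fundamental_scale_bound(d), 2))
```

For d = 2 the exponent is (32 - 30)/4 = 0.5, that is, a bound of a few TeV; for larger d the exponent becomes negative and this particular bound is no longer restrictive.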


Stronger bounds can be derived from null results in 
particle accelerators in some brane-world models, or 
from constraints imposed by observations of super- 
novae or of light-element abundance. 

Brane worlds, arising in the framework of string 
theory, thus incorporate the possibility that the 


fundamental scale is much less than the Planck 
scale felt in four dimensions. This emerges by virtue 
of the large size of the extra dimensions. It is not 
necessary for all extra dimensions to be of equal size 
for this mechanism to operate. There are string 
theory solutions (Horava-Witten solutions) with 
two (1+9)-branes located at the boundaries of the 
bulk, at the endpoints of an $S^1/\mathbb{Z}_2$ orbifold, that is, 
a circle folded on itself across a diameter. The 
orbifold extra dimension is the large one, whereas 
the other six extra dimensions on the branes are 
compactified on a very small scale, close to the 
fundamental scale, and their effect on the 
dynamics is felt through “moduli” fields, that is, 
five-dimensional scalar fields. 

These solutions can be thought of as effectively 
five dimensional, with an extra dimension that can 
be large relative to the fundamental scale. They 
provide the basis for the Randall-Sundrum 1 (RS1) 
phenomenological models of five-dimensional grav- 
ity. The single-brane Randall-Sundrum 2 (RS2) 
models with infinite extra dimension arise when 
the orbifold radius tends to infinity. The RS models 
are not the only phenomenological realizations of M 
theory ideas. They were preceded by the brane-world 
models of Arkani-Hamed, Dimopoulos, and 
Dvali (ADD), which put forward the idea that a 
large volume for the compact extra dimensions 
would lower the effective Planck scale $M_{4+d}$. If 
$M_{4+d}$ is close to the electroweak scale, $M_{ew}$, then 
this would address the long-standing “hierarchy” 
problem, that is, why there is such a large gap 
between $M_{ew} \sim 1$ TeV and $M_p \sim 10^{16}$ TeV. 

In the ADD models, more than one extra 
dimension is required for agreement with experiments, 
and there is “democracy” among the equivalent 
extra dimensions, which, in addition, are flat. 
By contrast, the RS models have a “preferred” extra 
dimension, with other extra dimensions treated as 
ignorable (i.e., stabilized except at energies near the 
fundamental scale). Furthermore, this extra dimension 
is curved or “warped” rather than flat: the bulk 
is a portion of anti-de Sitter (AdS$_5$) spacetime. The 
RS branes are $\mathbb{Z}_2$-symmetric (mirror symmetry), and 
have a tension, which serves to counter the influence 
on the brane of the negative bulk cosmological 
constant. This also means that the self-gravity of the 
branes is incorporated in the RS models. The novel 
feature of the RS models compared to previous 
higher-dimensional models is that the observable 
three dimensions are protected from the large extra 
dimension (at low energies) by curvature (warping), 
rather than straightforward compactification. 

The RS brane worlds provide phenomenological 
models that reflect at least some of the features of 


M theory, and that bring exciting new geometric 
and particle physics ideas into play. The RS2 
models also provide a framework for exploring 
holographic ideas that have emerged in M theory. 
Roughly speaking, holography suggests that 
higher-dimensional dynamics may be determined 
from a knowledge of the fields on a lower- 
dimensional boundary. The AdS/CFT correspon- 
dence is an example in which the classical 
dynamics of the higher-dimensional AdS gravita- 
tional field are equivalent to the quantum 
dynamics of a conformal field theory (CFT) on 
the boundary. 


Kaluza-Klein Modes 


The dilution of gravity via extra dimensions not 
only weakens gravity, it also broadens the range of 
graviton modes felt on the brane. The graviton is 
more than just the four-dimensional massless mode 
of four-dimensional gravity — other modes, with an 
effective mass on the brane, arise from the fact 
that the graviton is a (4+d)-dimensional massless 
particle. These extra modes on the brane are 
known as Kaluza-Klein (KK) modes of the 
graviton. 

For simplicity, consider a flat brane with one flat
extra dimension, compactified through the identification
$y \leftrightarrow y + 2\pi n L$, where $n = 0, 1, 2, \ldots$. The
perturbative five-dimensional graviton is defined
via


$^{(5)}g_{AB} \to {}^{(5)}\eta_{AB} + h_{AB}$ [8]


where $^{(5)}\eta_{AB}$ is the five-dimensional Minkowski metric
and $h_{AB}$ is a small transverse traceless perturbation. Its
amplitude can be Fourier expanded as


$h(x^\mu, y) = \sum_n e^{iny/L}\, h_n(x^\mu)$ [9]


where $h_n$ are the amplitudes of the KK modes, that
is, the effective four-dimensional modes of the
five-dimensional graviton. To see that these KK modes
are massive from the brane viewpoint, we start from
the five-dimensional wave equation that the massless
five-dimensional field $h$ satisfies (in a suitable
gauge):


$^{(5)}\Box h = 0 \;\Rightarrow\; \Box h + \partial_y^2 h = 0$ [10]


It follows that the KK modes satisfy a four-dimensional
Klein-Gordon equation with an effective
four-dimensional mass, $m_n$:


$\Box h_n = m_n^2 h_n, \quad m_n = \frac{n}{L}$ [11]


The massless mode, $h_0$, is the usual four-dimensional
graviton mode. But there is a tower
of massive modes, with masses $L^{-1}, 2L^{-1}, \ldots$, which
imprint the effect of the five-dimensional gravitational
field on the four-dimensional brane. Compactness
of the extra dimension leads to
discreteness of the spectrum. For an infinite
extra dimension, $L \to \infty$, the separation between
the modes disappears and the tower forms a
continuous spectrum.
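
As a brief check of eqn [11], one can substitute a single Fourier mode of eqn [9] into the wave equation [10]. The following is only a sketch of that step; it assumes the sign convention in which the massive Klein-Gordon equation reads $\Box h = m^2 h$, and the index label $\mu$ for brane coordinates is a notational assumption.

```latex
\partial_y^2\!\left[e^{iny/L}\,h_n(x^\mu)\right]
  = -\frac{n^2}{L^2}\,e^{iny/L}\,h_n(x^\mu)
\quad\Longrightarrow\quad
\Box h_n - \frac{n^2}{L^2}\,h_n = 0 ,
\qquad
m_n = \frac{n}{L}, \qquad m_{n+1} - m_n = \frac{1}{L}.
```

The mode spacing $1/L$ shrinks to zero as $L \to \infty$, which is the continuum limit described in the text.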


Randall-Sundrum Brane Worlds 


RS brane worlds do not rely on compactification to 
localize gravity at the brane, but on the curvature of 
the bulk. What prevents gravity from “leaking” into 
the extra dimension at low energies is a negative 
bulk cosmological constant, 


$\Lambda_{(5)} = -\frac{6}{\ell^2} = -6\mu^2$ [12]


where $\ell$ is the curvature radius of $AdS_5$ and $\mu$ is the
corresponding energy scale. The bulk cosmological
constant with its repulsive gravity effect acts to
“squeeze” the gravitational potential closer to the
brane. We can see this clearly in Gaussian normal
coordinates $x^A = (x^\mu, y)$ based on the brane at $y = 0$,
for which the metric takes the form


$^{(5)}ds^2 = dy^2 + e^{-2|y|/\ell}\,\eta_{\mu\nu}\,dx^\mu dx^\nu$ [13]


with $\eta_{\mu\nu}$ the Minkowski metric. The exponential
warp factor reflects the confining role of the bulk
cosmological constant. The $Z_2$-symmetry about the
brane at $y = 0$ is incorporated via the $|y|$ term. In the
bulk, this metric is a solution of the five-dimensional
Einstein equations,


$^{(5)}G_{AB} = -\Lambda_{(5)}\,{}^{(5)}g_{AB}$ [14]


that is, $^{(5)}T_{AB} = 0$ in eqn [2]. The brane is a flat
Minkowski spacetime, $g_{AB}(x^\alpha, 0) = \eta_{\mu\nu}\delta^\mu{}_A\delta^\nu{}_B$, with
self-gravity in the form of brane tension.

The two RS models are distinguished as follows: 


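RS1 There are two branes in RS1, at $y = 0$ and
$y = L$, with $Z_2$-symmetry identifications


$y \leftrightarrow -y, \quad y + L \leftrightarrow L - y$ [15]


The branes have equal and opposite tensions, $\pm\lambda$,
where


$\lambda = \frac{3 M_p^2}{4\pi\ell^2}$ [16]
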

The positive-tension “TeV” brane has fundamental
scale $M_{(5)} \sim 1$ TeV. Because of the exponential




warping factor, the effective scale on the negative
tension “Planck” brane at $y = L$ is $M_p$. On the
positive tension brane,


$M_p^2 = M_{(5)}^3\,\ell\left[1 - e^{-2L/\ell}\right]$ [17]


So RS1 gives a new approach to the hierarchy 
problem. Because of the finite separation between 
the branes, the KK spectrum is discrete. 

RS2 In RS2, there is only one, positive-tension,
brane. This may be thought of as arising
from sending the negative tension brane off to
infinity, $L \to \infty$. Then the energy scales are
related via


$M_{(5)}^3 = \frac{M_p^2}{\ell}$ [18]


On the RS2 brane, the negative $\Lambda_{(5)}$ is offset by
the positive brane tension $\lambda$. The fine-tuning in eqn
[16] ensures that there is zero effective cosmological
constant on the brane, so that the brane has the
induced geometry of Minkowski spacetime. To see
how gravity is localized at low energies, we consider
the five-dimensional graviton perturbations of the
metric:


$^{(5)}g_{AB} \to {}^{(5)}g_{AB} + h_{AB}$

$h_{Ay} = 0 = h^\mu{}_\mu = \partial_\nu h^{\mu\nu}$ [19]


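We split the amplitude $h$ of $h_{AB}$ into three-dimensional
Fourier modes, and the linearized five-dimensional
Einstein equations lead to the wave equation ($y > 0$)


$\ddot{h} + k^2 h = e^{-2y/\ell}\left[h'' - \frac{4}{\ell}h'\right]$ [20]


Separability means we can write


$h(t, y) = \sum_m \varphi_m(t)\, h_m(y)$ [21]


and the wave equation reduces to


$\ddot{\varphi}_m + (m^2 + k^2)\,\varphi_m = 0$ [22]

$h_m'' - \frac{4}{\ell}h_m' + e^{2y/\ell}\, m^2 h_m = 0$ [23]


The zero-mode solution is


$\varphi_0(t) = A_{0+}\, e^{+ikt} + A_{0-}\, e^{-ikt}$ [24]

$h_0(y) = B_0 + C_0\, e^{4y/\ell}$ [25]


and the massive KK mode ($m > 0$) solutions are


$\varphi_m(t) = A_{m+}\exp\left(+i\sqrt{m^2 + k^2}\,t\right) + A_{m-}\exp\left(-i\sqrt{m^2 + k^2}\,t\right)$ [26]

$h_m(y) = e^{2y/\ell}\left[B_m J_2\left(m\ell e^{y/\ell}\right) + C_m Y_2\left(m\ell e^{y/\ell}\right)\right]$ [27]


where $J_2$, $Y_2$ are Bessel functions.
The boundary condition for the perturbations is
$h'(t, 0) = 0$, which implies


$C_0 = 0, \quad C_m = -\frac{J_1(m\ell)}{Y_1(m\ell)}\, B_m$ [28]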


In the RS1 model, we have a further boundary
condition, $h'(t, L) = 0$, which leads to a discrete
eigenspectrum, namely the masses $m$ that satisfy


$J_1\left(m\ell e^{L/\ell}\right) Y_1(m\ell) - Y_1\left(m\ell e^{L/\ell}\right) J_1(m\ell) = 0$ [29]


The zero mode is normalizable, since


$\left|\int_0^\infty B_0\, e^{-2y/\ell}\, dy\right| < \infty$ [30]


Its contribution to the gravitational potential
$V = \frac{1}{2}h_{00}$ gives the four-dimensional result, $V \propto r^{-1}$.
The contribution of the massive KK modes sums
to a correction of the four-dimensional potential.
For $r \ll \ell$, one obtains


$V(r) \approx \frac{GM}{r}\left(1 + \frac{\ell}{r}\right) \approx \frac{GM\ell}{r^2}$ [31]


which simply reflects the fact that the potential
becomes truly five dimensional on small scales. For
$r \gg \ell$,


$V(r) \approx \frac{GM}{r}\left(1 + \frac{2\ell^2}{3r^2}\right)$ [32]


which gives the small correction to four-dimensional
gravity at low energies from extra-dimensional effects.
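
To give a feel for the size of the large-distance correction, the following minimal sketch evaluates the factor $1 + 2\ell^2/(3r^2)$ multiplying the Newtonian potential for $r \gg \ell$. The function name and the numerical values of $\ell$ and $r$ are illustrative assumptions, not values taken from the text.

```python
def rs2_potential_ratio(r, ell):
    """Return V(r) / V_Newton(r) = 1 + 2 ell^2 / (3 r^2).

    Only meaningful in the regime r >> ell; r and ell must share units.
    """
    if r <= ell:
        raise ValueError("the large-distance expansion assumes r >> ell")
    return 1.0 + 2.0 * ell**2 / (3.0 * r**2)


# Assumed, illustrative curvature radius of 0.1 mm; separations in mm.
ell_mm = 0.1
for r_mm in (1.0, 10.0, 100.0):
    correction = rs2_potential_ratio(r_mm, ell_mm) - 1.0
    print(f"r = {r_mm:6.1f} mm   fractional correction = {correction:.2e}")
```

The correction falls off as $1/r^2$ relative to the Newtonian term, so it is only accessible to experiments probing separations not much larger than $\ell$.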


Cosmological Brane Worlds 


The RS models contain vacuum (Minkowski) 
branes. In order to pursue brane-world ideas in 
cosmology, we need to generalize the RS models to 
incorporate cosmological branes with matter and 
radiation on them. The effective field equations on 
the brane are the vehicle for brane-bound observers 
to interpret cosmological dynamics. They arise from 
projecting the five-dimensional field equations onto 
the brane, via the Gauss-Codazzi equations. These
equations involve also the extrinsic curvature $K_{\mu\nu}$ of
the brane, which determines how the brane is
embedded in the bulk.

The stress-energy on the brane (tension, matter,
radiation) means that there is a jump in $K_{\mu\nu}$ across


the brane. More precisely, the junction conditions
across the brane are


$g^+_{\mu\nu} - g^-_{\mu\nu} = 0$ [33]

$K^+_{\mu\nu} - K^-_{\mu\nu} = -\kappa_{(5)}^2\left[T^{\rm brane}_{\mu\nu} - \frac{1}{3}T^{\rm brane} g_{\mu\nu}\right]$ [34]


where


$T^{\rm brane}_{\mu\nu} = T_{\mu\nu} - \lambda g_{\mu\nu}$ [35]


is the total energy-momentum tensor on the brane
and $T^{\rm brane} = g^{\mu\nu}T^{\rm brane}_{\mu\nu}$. The $Z_2$-symmetry means that
when approaching the brane from one side and
going through it, one emerges into a bulk that looks
the same, but with the normal reversed. This implies
that


$K^-_{\mu\nu} = -K^+_{\mu\nu}$ [36]


so that we can use the junction condition (eqn [34])
to determine the extrinsic curvature:


$K_{\mu\nu} = -\frac{1}{2}\kappa_{(5)}^2\left[T_{\mu\nu} + \frac{1}{3}(\lambda - T)\,g_{\mu\nu}\right]$ [37]


where $T = T^\mu{}_\mu$, we have dropped the (+) and we
evaluate quantities on the brane by taking the limit
$y \to +0$.
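
The algebra connecting the junction condition to eqn [37] can be sketched as follows. This assumes the junction condition has the standard form $K^+_{\mu\nu} - K^-_{\mu\nu} = -\kappa_{(5)}^2[T^{\rm brane}_{\mu\nu} - \frac{1}{3}T^{\rm brane}g_{\mu\nu}]$ with $T^{\rm brane}_{\mu\nu} = T_{\mu\nu} - \lambda g_{\mu\nu}$, and uses $g^{\mu\nu}g_{\mu\nu} = 4$ on the brane together with the $Z_2$ relation $K^-_{\mu\nu} = -K^+_{\mu\nu}$.

```latex
\begin{aligned}
T^{\rm brane} &= g^{\mu\nu}\left(T_{\mu\nu} - \lambda g_{\mu\nu}\right) = T - 4\lambda , \\
2K^{+}_{\mu\nu} &= -\kappa_{(5)}^2\left[T_{\mu\nu} - \lambda g_{\mu\nu}
   - \tfrac{1}{3}\,(T - 4\lambda)\,g_{\mu\nu}\right]
 = -\kappa_{(5)}^2\left[T_{\mu\nu} + \tfrac{1}{3}\,(\lambda - T)\,g_{\mu\nu}\right] .
\end{aligned}
```

Dividing by 2 and dropping the (+) label gives the stated expression for the extrinsic curvature.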

Together with the Gauss-Codazzi equations, eqn [37] 
leads to the induced field equations on the brane: 


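$G_{\mu\nu} = -\Lambda g_{\mu\nu} + \kappa^2 T_{\mu\nu} + 6\frac{\kappa^2}{\lambda}S_{\mu\nu} - \mathcal{E}_{\mu\nu}$ [38]


where


$\kappa^2 \equiv \kappa_{(4)}^2 = \frac{1}{6}\lambda\,\kappa_{(5)}^4$ [39]

$\Lambda \equiv \Lambda_{(4)} = \frac{1}{2}\left[\Lambda_{(5)} + \kappa^2\lambda\right]$ [40]

$S_{\mu\nu} = \frac{1}{12}T T_{\mu\nu} - \frac{1}{4}T_{\mu\alpha}T^\alpha{}_\nu + \frac{1}{24}g_{\mu\nu}\left[3T_{\alpha\beta}T^{\alpha\beta} - T^2\right]$ [41]


and


$\mathcal{E}_{\mu\nu} = {}^{(5)}C_{ACBD}\, n^C n^D\, g_\mu{}^A g_\nu{}^B$ [42]


where $n^A$ is the unit normal to the brane and
$^{(5)}C_{ACBD}$ is the Weyl tensor in the bulk.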

The induced field equations [38] show two key 
modifications to the standard four-dimensional Einstein 
field equations arising from extra-dimensional effects. 


• $S_{\mu\nu} \sim (T_{\mu\nu})^2$ is the high-energy correction term,
which is negligible for $\rho \ll \lambda$, but dominant for
$\rho \gg \lambda$ (where $\rho$ is the energy density):


$\frac{|\kappa^2 S_{\mu\nu}/\lambda|}{|\kappa^2 T_{\mu\nu}|} \sim \frac{|T_{\mu\nu}|}{\lambda} \sim \frac{\rho}{\lambda}$ [43]


• $\mathcal{E}_{\mu\nu}$, the projection of the bulk Weyl tensor on the
brane, encodes corrections from KK or five-dimensional
graviton effects. From the brane-observer
viewpoint, the energy-momentum
corrections in $S_{\mu\nu}$ are local, whereas the KK
corrections in $\mathcal{E}_{\mu\nu}$ are nonlocal, since they
incorporate five-dimensional gravity wave
modes. These nonlocal corrections cannot be
determined purely from data on the brane. In
the perturbative analysis of RS2 which leads to
the corrections in the gravitational potential, eqn
[32], the KK modes that generate this correction
are responsible for a nonzero $\mathcal{E}_{\mu\nu}$; this term is
what carries the modification to the weak-field
field equations.


The effective field equations are not a closed system. 
One needs to supplement them by five-dimensional 
equations governing $\mathcal{E}_{\mu\nu}$, which are obtained from the
five-dimensional Einstein equations. 


Cosmological Dynamics 


A (1+4)-dimensional spacetime with spatial
4-isotropy (four-dimensional spherical/plane/hyperbolic
symmetry) has a natural splitting into
hypersurfaces of symmetry, which are (1+3)-dimensional
surfaces with 3-isotropy and
3-homogeneity, that is, Friedmann-Robertson-Walker
(FRW) surfaces. In particular, the $AdS_5$
bulk of the RS2 brane world, which admits a
foliation into Minkowski surfaces, also admits an
FRW foliation since it is 4-isotropic. The generalization
of $AdS_5$ that preserves 4-isotropy and
solves the five-dimensional Einstein equation is
Schwarzschild-$AdS_5$, and this bulk therefore
admits an FRW foliation. It follows that an
FRW cosmological brane world can be embedded
in Schwarzschild-$AdS_5$ spacetime.

The black hole in the bulk is felt on the brane 
via the $\mathcal{E}_{\mu\nu}$ term. The bulk black hole gives rise to
“dark radiation” on the brane via its Coulomb
effect. The FRW brane can be thought of as 
moving radially along the fifth dimension, with the 
junction conditions determining the velocity via 
the Friedmann equation. Thus, one can interpret 
the expansion of the universe as motion of the 
brane through the static bulk. In the special case 
of no black hole and no brane motion, the brane is 
empty and has Minkowski geometry, that is, the 
original RS2 brane world is recovered, in different 
coordinates. 

An intriguing aspect of the cosmological metric is
that five-dimensional gravitational wave signals can
take “shortcuts” through the bulk in traveling
between points A and B on the brane. The travel
time for such a graviton signal is less than the time
taken for a photon signal (which is stuck to the
brane) from A to B.

Cosmological dynamics on the brane are governed
by the modified Friedmann equation:


$H^2 = \frac{\kappa^2}{3}\rho\left(1 + \frac{\rho}{2\lambda}\right) + \frac{m}{a^4} + \frac{\Lambda}{3} - \frac{K}{a^2}$ [44]


where $H = \dot{a}/a$ is the Hubble expansion rate, $a(t)$ is
the scale factor, $K$ is the curvature index, and $m$ is
the mass of the bulk black hole.

The $\rho^2/\lambda$ term is the high-energy term. When $\rho \gg
\lambda$, in the early universe, then $H^2 \propto \rho^2$. This means
that a given energy density produces a greater rate of
expansion than it would in standard four-dimensional
gravity. As a consequence, inflation in the
early universe is modified in interesting ways, some
of which may leave a signature in cosmological
observations.
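
The enhancement of the expansion rate can be illustrated numerically. The sketch below assumes the modified Friedmann equation has the form $H^2 = (\kappa^2/3)\rho(1 + \rho/2\lambda) + m/a^4 + \Lambda/3 - K/a^2$; the function names, default arguments, and the choice of units with $\kappa^2 = 1$ are assumptions made for illustration only.

```python
import math


def hubble_squared(rho, lam, kappa2=1.0, m=0.0, a=1.0,
                   cosmo_const=0.0, curvature=0.0):
    """H^2 from the modified Friedmann equation (assumed form, see lead-in)."""
    return ((kappa2 / 3.0) * rho * (1.0 + rho / (2.0 * lam))
            + m / a**4 + cosmo_const / 3.0 - curvature / a**2)


def expansion_enhancement(rho, lam):
    """Ratio H_brane / H_standard for a flat brane with m = Lambda = 0.

    The standard four-dimensional result is recovered as lam -> infinity.
    """
    h2_brane = hubble_squared(rho, lam)
    h2_standard = hubble_squared(rho, math.inf)
    return math.sqrt(h2_brane / h2_standard)


for ratio in (0.01, 1.0, 100.0):
    print(f"rho/lambda = {ratio:7.2f}   H_brane/H_standard = "
          f"{expansion_enhancement(ratio, 1.0):.3f}")
```

For $\rho \ll \lambda$ the ratio tends to one, while for $\rho \gg \lambda$ it grows as $\sqrt{\rho/2\lambda}$, reflecting $H^2 \propto \rho^2$.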

The $m/a^4$ term in eqn [44] is the “dark
radiation,” so called because it redshifts with
expansion like ordinary radiation. But, unlike 
ordinary radiation, it is not a form of detectable 
matter, but the imprint on the brane of the 
gravitational field in the bulk (the Coulomb effect 
of the bulk black hole). This additional effective 
relativistic degree of freedom is constrained by 
nucleosynthesis in the early universe. Any extra 
radiative energy not thermally coupled to radiation 
affects the rate of production of light elements, and 
observed abundances place tight constraints on 
such extra energy. The dark radiation can be no 
more than ~3% of the radiation energy density at 
nucleosynthesis: 


$\frac{3m}{\kappa^2 \rho_{\rm nuc}} < 0.03$ [45]


The other modification to the Hubble rate is via
the high-energy correction $\rho/\lambda$. In order to recover
the observational successes of general relativity, the
high-energy regime where significant deviations
occur must take place before nucleosynthesis, that
is, cosmological observations impose the lower
limit


$\lambda > (1\ {\rm MeV})^4 \;\Rightarrow\; M_{(5)} > 10^4\ {\rm GeV}$ [46]


This is much weaker than the limit imposed by
table-top experiments, which limit the curvature
radius to $\ell \lesssim 0.2$ mm, leading to


$\lambda > (100\ {\rm GeV})^4 \;\Rightarrow\; M_{(5)} > 10^8\ {\rm GeV}$ [47]
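
The conversion from a bound on the curvature radius to bounds on the brane tension and the fundamental scale can be sketched numerically. The snippet assumes the RS2 relations $M_{(5)}^3 = M_p^2/\ell$ and $\lambda = 3M_p^2/(4\pi\ell^2)$ in natural units; the constant values, function name, and variable names are assumptions introduced for this illustration.

```python
import math

HBAR_C_GEV_M = 1.973269804e-16  # hbar * c in GeV * metre (converts 1/length to GeV)
M_PLANCK_GEV = 1.22e19          # four-dimensional Planck mass in GeV (approximate)


def rs2_scales(ell_metres):
    """Return (M_5 in GeV, lambda in GeV^4) for a given AdS curvature radius.

    Assumes M_5^3 = M_p^2 / ell and lambda = 3 M_p^2 / (4 pi ell^2).
    """
    inv_ell_gev = HBAR_C_GEV_M / ell_metres
    m5 = (M_PLANCK_GEV**2 * inv_ell_gev) ** (1.0 / 3.0)
    lam = 3.0 * M_PLANCK_GEV**2 * inv_ell_gev**2 / (4.0 * math.pi)
    return m5, lam


# Curvature radius at the table-top limit of 0.2 mm.
m5_gev, lam_gev4 = rs2_scales(0.2e-3)
print(f"M_5          ~ {m5_gev:.2e} GeV")
print(f"lambda^(1/4) ~ {lam_gev4 ** 0.25:.2e} GeV")
```

Because $M_{(5)} \propto \ell^{-1/3}$ and $\lambda \propto \ell^{-2}$, a smaller allowed curvature radius pushes both scales up; the values obtained at $\ell = 0.2$ mm satisfy the lower limits quoted in eqn [47].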


The high-energy regime during radiation domination
is short-lived. Since $\rho/\lambda$ decays as $a^{-4}$ during the
radiation era, it will rapidly drop below one, and the
universe will enter the low-energy four-dimensional
regime. However, traces of the high-energy era may be 
left in the perturbation spectra that leave an imprint in 
the cosmic microwave background radiation. 

In conclusion, simple brane-world models of RS2 
type provide a rich phenomenology for exploring 
some of the ideas that are emerging from M theory. 
The higher-dimensional degrees of freedom for the 
gravitational field, and the confinement of standard 
model fields to the visible brane, lead to a complex 
but fascinating interplay between gravity, particle 
physics, and geometry, which enlarges and enriches 
general relativity in the direction of a quantum 
gravity theory. High-precision astronomical data 
mean that cosmology is a potential laboratory for 
testing and constraining these brane worlds. The 
models predict extra-dimensional signatures in the 
cosmic microwave background and other observa- 
tions, and these predictions can in principle be tested 
against data. 


See also: String Theory: Phenomenology; Supergravity;
Superstring Theories.


Introduction


In classical general relativity, a black hole is a 
solution of Einstein's equations with a region of 
spacetime which is causally disconnected from the 
asymptotic region at infinity. The boundary of such 
a region is called the “event horizon.” The spacetime
around the simplest black hole in three space
dimensions is described by the Schwarzschild metric


$ds^2 = -\left(1 - \frac{2GM}{c^2 r}\right)c^2 dt^2 + \frac{dr^2}{1 - \frac{2GM}{c^2 r}} + r^2 d\Omega_2^2$ [1]


where $M$ is the mass of the black hole, $G$ is Newton's
gravitational constant, $c$ is the
velocity of light, and we have used spherical
coordinates with $d\Omega_2$ the line element on an $S^2$. A
nonrotating, uncharged star which is too massive to
form a neutron star will eventually collapse, and at
late times the metric will be given by [1]. The
horizon is a null surface $S^2 \times t$ and the radius of the
$S^2$ is $r_{\rm horizon} = 2GM/c^2$. The Schwarzschild solution
has generalizations to black holes with charge and 
angular momentum and no-hair theorems guarantee 
that a black hole has no other characteristic property. 
All these solutions can be generalized to other 
theories like supergravity in various dimensions. 

In 1974, Hawking showed that due to pair 
production of particles near the horizon, black 
holes radiate thermally. Hawking's calculation is 
valid for black holes whose masses are much larger 
than the Planck mass: for such black holes, the 
curvature at the horizon is weak and normal 
semiclassical quantization is valid. Remarkably, the 
properties of Hawking radiation are quite universal. 
A black hole can be characterized by an entropy 
called the Bekenstein-Hawking entropy. The leading
result for the entropy $S_{BH}$ for all black holes in any
theory with the standard Einstein-Hilbert action is
given by


$S_{BH} = \frac{A_H}{4G}$ [2]


where $A_H$ denotes the area of the horizon. The
temperature $T_H$ is given by


$T_H = \frac{\kappa}{2\pi}$ [3]


where $\kappa$ is the surface gravity at the horizon. The
principle of detailed balance further ensures that the
radiation rate of some species of particle $i$, $\Gamma_i(k)$,
in some given momentum range $(k, k + dk)$ is related
to the corresponding absorption cross section $\sigma_i(k)$ by


$\Gamma_i(k) = \frac{\sigma_i(k)}{e^{\omega/T_H} \pm 1}\,\frac{d^d k}{(2\pi)^d}$ [4]


where $\omega$ is the energy and $d$ denotes the number of
spatial dimensions. The $\pm$ sign refers to fermions
(bosons), respectively. A nontrivial $k$ dependence of
$\sigma_i$ signifies a departure from black-body behavior.
Consequently, $\sigma_i(k)$ is often called a grey-body
factor. Equations [2] and [3] may be derived by
combining Hawking's calculation of the radiation
with standard thermodynamic relations. Alternatively,
they follow from the leading semiclassical
approximations of path-integral formulations of
Euclidean gravity based on the standard Einstein-Hilbert
action. For an account of black-hole
thermodynamics, see Wald (1994).
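
For orientation, the entropy and temperature formulas can be evaluated for a Schwarzschild black hole. The sketch below restores SI units and uses the standard Schwarzschild surface gravity $\kappa = c^4/(4GM)$, which is not stated in the text and is assumed here; the function names and constant values are likewise illustrative.

```python
import math

G = 6.67430e-11         # Newton's constant, m^3 kg^-1 s^-2
C = 2.99792458e8        # speed of light, m/s
HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K
M_SUN = 1.989e30        # solar mass, kg (approximate)


def hawking_temperature(mass_kg):
    """T_H = hbar c^3 / (8 pi G M k_B), i.e. T_H = kappa / (2 pi) in SI units."""
    return HBAR * C**3 / (8.0 * math.pi * G * mass_kg * K_B)


def entropy_over_k_b(mass_kg):
    """S_BH / k_B = A_H c^3 / (4 G hbar) with A_H = 4 pi r_horizon^2."""
    r_horizon = 2.0 * G * mass_kg / C**2
    area = 4.0 * math.pi * r_horizon**2
    return area * C**3 / (4.0 * G * HBAR)


print(f"T_H(M_sun)      = {hawking_temperature(M_SUN):.3e} K")
print(f"S_BH/k_B(M_sun) = {entropy_over_k_b(M_SUN):.3e}")
```

The temperature scales as $1/M$ and the entropy as $M^2$, so astrophysical black holes are extremely cold and carry an enormous entropy, which is what makes the question of microstates so sharp.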

Unlike usual thermodynamic systems, black holes 
appear to pose a deep puzzle. In usual systems, 
thermodynamics is a coarse-grained description of a 
system which is in a highly degenerate state. 
Typically, such systems are described in terms of a 
few macroscopic parameters such as the total 
energy, the total volume, the total charge. For each 
set of values of these macroscopic parameters, there 
are a large number of microscopic states which can 
be described in terms of the constituents such as 
atoms or molecules. This degeneracy manifests itself 
as an entropy S which is related to the number of 
microscopic states for a given set of values of the 
macroscopic parameters, $\Omega$, by Boltzmann’s relation


$S = \log\Omega$ [5]


where units have been chosen such that the 
Boltzmann constant is unity. For a black hole, the 
macrostates are specified by its mass, charge, and 
angular momentum. No-hair theorems, however, 
seem to suggest that there are no other properties 
and hence no obvious candidate for microstates. In 
the absence of such a statistical basis, one would be 
inevitably led to the conclusion that there is loss of 
information in processes involving black holes. 

In a consistent quantum theory of gravity, there 
would be such a statistical basis since quantum 
mechanics is unitary. String theory is a strong 
candidate for a unified theory which contains 
gravity. Indeed, string theory provides a microscopic
description for a class of black holes.


Black Hole Solutions in String Theory


Perturbatively, the basic excitations of string theory
are fundamental closed and open strings characterized
by a string tension $T_s$ and hence a length scale,
the string length $l_s = 1/\sqrt{2\pi T_s}$. Consistency requires
that the string should be able to propagate in ten 
spacetime dimensions and should be supersym- 
metric at the fundamental level. Formulated in 
this fashion, there are several consistent string 
theories: type IIA, type IIB, and heterotic string 
theory (which contain only closed strings perturba- 
tively) and type I theory (which contains both open 
and closed strings). 

At energies much smaller than $1/l_s$, only the
massless modes of the string can be excited. For all 
these string theories, the massless spectrum of closed 
strings contains the graviton and the low-energy 
dynamics is given by the appropriate supersymmetric 
generalization of general relativity, supergravity. In 
addition, the closed-string spectrum contains a 
neutral scalar field, the dilaton $\phi$, whose expectation
value gives rise to a dimensionless parameter governing
interactions, called the string coupling $g_s$:


$g_s = \exp\langle\phi\rangle$ [6]


The ten-dimensional gravitational constant is given
by


$G_{10} = 8\pi^6 g_s^2 l_s^8$ [7]


Ten-dimensional supergravity has a wide variety of 
black hole solutions, the simplest of which is the 
straightforward generalization of the Schwarzschild 
solution. 


Black p-Brane Solutions 


More significantly, there are solutions which are 
charged with respect to the various gauge fields that 
appear in the supergravity spectrum. Generically, 
these charged solutions represent extended objects. 
For accounts of such solutions, see Maldacena 
(1996). 

Consider, for example, the supergravity which 
follows from type IIB string theory. This theory has 
a pair of 2-form gauge fields $B_{MN}$ and $B'_{MN}$ and a
4-form gauge field $A_{MNPQ}$ with a self-dual field
strength. Just as an ordinary point electric charge
produces a 1-form gauge field, a $(p + 1)$-form gauge
field may be sourced by an electrically charged
$p$-dimensional extended object. The corresponding
field strength is a $(p + 2)$-form, whose Hodge dual in
$d$ spacetime dimensions is a $(d - p - 2)$-form. This
shows that there should be magnetically charged
$(d - p - 4)$-dimensional extended objects as well.
These extended objects are called “branes.”

In the type IIB example, there should be two
kinds of one-dimensional extended objects
which carry electric charge under $B_{MN}$ and $B'_{MN}$,
called the F-string and the D-string, respectively.
There are also two kinds of five-dimensional
branes which carry magnetic charges under
$B_{MN}$ and $B'_{MN}$, called the NS 5-brane and D5 brane,
respectively. Finally, there should be a 3-brane,
since the corresponding 5-form field strength is
self-dual, as well as a D7 brane. A similar catalog
can be prepared for other string theories, as well
as for 11-dimensional supergravity, which is the
low-energy limit of M-theory.
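
The form-degree counting behind this catalog is easy to mechanize. The following minimal sketch encodes the rule that an electric $p$-brane couples to a $(p+1)$-form potential, whose $(p+2)$-form field strength dualizes in $d$ dimensions to a $(d-p-2)$-form sourced by a magnetic $(d-p-4)$-brane. The function names are hypothetical and chosen only for this illustration.

```python
def field_strength_rank(p):
    """Rank of the field strength sourced electrically by a p-brane."""
    return p + 2


def magnetic_dual_dimension(p, d=10):
    """Spatial dimension of the magnetic dual of an electric p-brane in d dimensions."""
    dual_field_strength_rank = d - field_strength_rank(p)
    # A q-brane couples to a (q+1)-form potential with a (q+2)-form field strength.
    return dual_field_strength_rank - 2


for label, p in (("string", 1), ("3-brane", 3), ("5-brane", 5)):
    print(f"{label}: electric p = {p}, magnetic dual has dimension "
          f"{magnetic_dual_dimension(p)}")
```

In ten dimensions this pairs strings with 5-branes and maps the 3-brane to itself, in line with the self-dual 5-form field strength mentioned in the text.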

The classical solutions for a set of p-branes of the 
same kind generally have inner and outer horizons 
which have the topology $t \times S^{8-p} \times R^p$. The outer
horizon is then associated with a Hawking temperature
and a Bekenstein-Hawking entropy. Of particular
interest are extremal limits. In this limit, the
inner and outer horizons coincide and the mass
density is simply proportional to the charge. Given
some charge, the extremal solution has the lowest
energy. Extremal limits are interesting because in
supergravity these correspond to solutions in which
some of the supersymmetries (in this case, half of the
supersymmetries) are retained; such solutions are
called Bogomol'nyi-Prasad-Sommerfield (BPS) saturated
solutions. The charge in question appears as a
central charge in the extended supersymmetry
algebra. This fact may be used to show that such
BPS solutions are absolutely stable. Indeed, for the
particular solution considered here, the Hawking
temperature $T_H = 0$, so that there is no Hawking
radiation, as required by stability. Furthermore, the
entropy $S_{BH} = 0$. The horizon shrinks to a point
which appears as a naked null singularity.

All the ten dimensions of string theory need not be 
noncompact. In fact, to describe the real world, one 
must have a solution of string theory in which six of 
the dimensions are wrapped up and form a compact 
space. In principle, however, one can compactify 
any number of dimensions. In the above example 
of a p-brane, it is trivial to compactify the
directions along which the brane is extended to a
$p$-dimensional torus, $T^p$, which can be chosen to be
a product of $p$ circles each of radius $R$. At length
scales much larger than $R$, the theory then becomes
a $(10 - p)$-dimensional theory. The p-brane appears
as a black hole with a spherical horizon and,
since the original $(p + 1)$-form gauge field now behaves
as an ordinary 1-form gauge field with a nonzero
time component, this is an electrically charged
black hole.


D1-D5-N System and Five-Dimensional Black 
Holes 


For reasons which will become clear in the next 
section, it is useful to get extremal black holes with 
large horizon areas, so that Hawking's semiclassical 
formulas are valid. It turns out that such solutions 
involve branes of various types which intersect each 
other and are suitably wrapped on compact internal 
spaces. Such black holes then have necessarily 
different kinds of charges. It turns out that the 
simplest case is a five-dimensional black hole with 
three kinds of charges, which is obtained by brane 
systems wrapped on a compact five-dimensional 
space. An example is a type IIB solution which has
D5 branes which are wrapped on either $T^4 \times S^1$ or
$K3 \times S^1$, together with D1 branes wrapped on the $S^1$
as well as some momentum along the $S^1$. From the
noncompact five-dimensional point of view, this is a
black hole with three kinds of gauge charges: the D5
charge $Q_5$, the D1 charge $Q_1$, and a Kaluza-Klein
charge $N$ coming from the momentum $P = N/R$
along the circle of radius $R$.

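When the internal space is $T^4 \times S^1$ the five-dimensional
Einstein frame metric is given by


$ds^2 = -[f(r)]^{-2/3}\left(1 - \frac{r_0^2}{r^2}\right)dt^2 + [f(r)]^{1/3}\left[\frac{dr^2}{1 - r_0^2/r^2} + r^2 d\Omega_3^2\right]$ [8]


where


$f(r) = \left(1 + \frac{r_0^2\sinh^2\alpha_1}{r^2}\right)\left(1 + \frac{r_0^2\sinh^2\alpha_5}{r^2}\right)\left(1 + \frac{r_0^2\sinh^2\sigma}{r^2}\right)$ [9]


and the three charges are


$Q_1 = \frac{V r_0^2\sinh 2\alpha_1}{32\pi^4 g_s l_s^6}, \quad Q_5 = \frac{r_0^2\sinh 2\alpha_5}{2 g_s l_s^2}, \quad N = \frac{V R^2 r_0^2\sinh 2\sigma}{32\pi^4 l_s^8 g_s^2}$ [10]


where $V$ is the volume of the $T^4$ and $R$ is the radius
of the circle $S^1$.
The ADM mass of the black hole is


$M_{ADM} = \frac{R V r_0^2}{32\pi^4 g_s^2 l_s^8}\left[\cosh 2\alpha_1 + \cosh 2\alpha_5 + \cosh 2\sigma\right]$ [11]


The Bekenstein-Hawking entropy is given by


$S_{BH} = \frac{R V r_0^3}{8\pi^3 l_s^8 g_s^2}\cosh\alpha_1\cosh\alpha_5\cosh\sigma$ [12]


while the Hawking temperature is


$T_H = \frac{1}{2\pi r_0\cosh\alpha_1\cosh\alpha_5\cosh\sigma}$ [13]


The extremal limit of this solution is given by


$r_0 \to 0, \quad \alpha_1, \alpha_5, \sigma \to \infty, \quad Q_1, Q_5, N = {\rm fixed}$ [14]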

The extremal solution is a BPS saturated state and
retains four of the original supersymmetries. In this
limit, the inner and outer horizons coincide. However,
the horizon is now a smooth $S^3$ with a finite
area in the Einstein frame metric. Consequently, the
extremal Bekenstein-Hawking entropy is also finite
and may be seen to be


$S_{BH}^{\rm extremal} = 2\pi\sqrt{Q_1 Q_5 N}$ [15]


The temperature, however, is zero in this limit,
which is consistent with the stability of a BPS
saturated state.
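
The approach to the extremal entropy can be checked numerically. The sketch below assumes the charges $Q_1 = V r_0^2\sinh 2\alpha_1/(32\pi^4 g_s l_s^6)$, $Q_5 = r_0^2\sinh 2\alpha_5/(2 g_s l_s^2)$, $N = V R^2 r_0^2\sinh 2\sigma/(32\pi^4 l_s^8 g_s^2)$ and the entropy $S_{BH} = R V r_0^3\cosh\alpha_1\cosh\alpha_5\cosh\sigma/(8\pi^3 l_s^8 g_s^2)$. The charges are treated as continuous (supergravity) parameters, and the default values $V = R = g_s = l_s = 1$ as well as all function names are illustrative assumptions.

```python
import math


def charges(r0, a1, a5, sigma, V=1.0, R=1.0, gs=1.0, ls=1.0):
    """Return (Q1, Q5, N) for the non-extremal D1-D5-N solution (assumed formulas)."""
    q1 = V * r0**2 * math.sinh(2 * a1) / (32 * math.pi**4 * gs * ls**6)
    q5 = r0**2 * math.sinh(2 * a5) / (2 * gs * ls**2)
    n = V * R**2 * r0**2 * math.sinh(2 * sigma) / (32 * math.pi**4 * ls**8 * gs**2)
    return q1, q5, n


def entropy_bh(r0, a1, a5, sigma, V=1.0, R=1.0, gs=1.0, ls=1.0):
    """Bekenstein-Hawking entropy of the non-extremal solution (assumed formula)."""
    prefactor = R * V * r0**3 / (8 * math.pi**3 * ls**8 * gs**2)
    return prefactor * math.cosh(a1) * math.cosh(a5) * math.cosh(sigma)


def entropy_ratio(r0, a1, a5, sigma):
    """S_BH divided by 2 pi sqrt(Q1 Q5 N) evaluated at the same charges."""
    q1, q5, n = charges(r0, a1, a5, sigma)
    return entropy_bh(r0, a1, a5, sigma) / (2 * math.pi * math.sqrt(q1 * q5 * n))


for boost in (0.5, 2.0, 5.0):
    print(f"boosts = {boost}:  S_BH / S_extremal = "
          f"{entropy_ratio(0.1, boost, boost, boost):.6f}")
```

Analytically the ratio equals $\prod_i \sqrt{\coth\alpha_i}$, which is independent of $r_0$ and tends to one as the boost parameters grow, so the general entropy reduces to the extremal formula in the limit [14].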

The above five-dimensional black hole is in fact a
generalization of the Reissner-Nordström black
hole. Similar solutions with large horizon areas in
the extremal limit can be constructed in four
dimensions. One such construction is in the IIB
theory wrapped on $T^6$ in which there are four sets of
D3 branes which wrap four different $T^3$'s contained
in the $T^6$. Black holes with lower supersymmetry
may be obtained by replacing the $T^6$ by a Calabi-Yau
space.


Duality and Branes 


String theory has a rich set of symmetries called 
duality symmetries which relate different kinds of 
string theories that are suitably compactified. 
These symmetries relate different classical solutions. 
For example, application of these symmetries relates
the five-dimensional black holes above to other
five-dimensional black holes with different kinds of
charges. Furthermore, at the level of supergravity,
these various theories may be derived from
an as yet unknown 11-dimensional theory called
M-theory, whose low-energy limit is 11-dimensional
supergravity. 


Branes in String Theory 


For a given string theory, the perturbative spectrum
consists of strings. However, at the nonperturbative
level, there are, in addition, extended objects of
other dimensionalities. Duality symmetries imply
that these extended objects are as “fundamental”
as the strings themselves. Such extended objects are
also called branes. For an exhaustive account of
branes in string theory, see Johnson (2003).

Like their counterparts in supergravity, branes in 
string theory are typically charged with respect to 
some gauge fields. While supergravity solutions are 
possible with any value of the charge, in string 
theory the brane charges have to be quantized. 
Multiple units of the minimum quantum of charge 
can appear as collections of branes each with unit 
charge or, alternatively, branes which wrap around 
compact cycles in space a multiple number of times. 


D-Branes 


The extended objects in string theory are described 
in terms of their collective excitations. These 
are best understood for the class of branes called 
D-branes in the type II theory, discovered by
Polchinski. These are D1, D3, D5, and D7 branes
in type IIB and D0, D2, D4, and D6 branes in
type IIA theory. Dp branes are characterized by the
fact that they couple to, and act as sources for,
$(p + 1)$-form gauge fields which belong to the
Ramond-Ramond sector of the theory. Collective
excitations of a $p$-dimensional extended object in
field theory are expected to be described by waves
on its $(p + 1)$-dimensional world volume. The
collective coordinate action would be a quantum 
field theory which has vectors, corresponding 
to longitudinal oscillations of the brane, and 
scalars which correspond to transverse oscillations. 
For D-branes in string theory, the theory of 
collective excitations is a string field theory of open 
strings whose endpoints lie on the brane. (This is the 
origin of the nomenclature D-brane: an open string 
whose ends are constrained to lie on the brane has a 
world-sheet description in which the bosonic 
fields corresponding to transverse target space 
coordinates have Dirichlet boundary conditions.) 
The lowest-energy states of open superstrings are 
ordinary massless gauge fields and their supersym- 
metric partners so that the low-energy limit of 
the string field theory is a supersymmetric gauge 
theory. 

The fact that the underlying theory is a string 
theory has an important consequence. For a system 
of $N$ parallel D-branes of the same type, one
would have open strings which join different branes 
as well as the same brane. The low-energy 
theory then becomes a supersymmetric nonabelian 
gauge theory with gauge group U(N). In a suitable 


gauge, the off-diagonal gauge fields and their super- 
symmetric partners (which include scalar fields in 
the adjoint representation) are the low-energy 
degrees of freedom of open strings which connect 
different branes. 
The mass density or tension $T_p$ of a single Dp
brane is given by


$T_p = \frac{1}{(2\pi)^p g_s l_s^{p+1}}$ [16]


This couples to the $(p + 1)$-form gauge field with a
charge


$\mu_p = g_s T_p$ [17]


and the Yang-Mills coupling constant for the collective
theory on the brane world volume is given by


$g_{YM}^2 = (2\pi)^{p-2} g_s l_s^{p-3}$ [18]


The ground state of a single Dp brane is a BPS state
which preserves 16 of the 32 supersymmetries of the
original theory. One consequence of this is that two or
more parallel Dp branes of the same type form a
threshold bound state preserving the same supersymmetries,
with no net force between them. As a result, the
tension of $N$ parallel Dp branes is simply $N T_p$.
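
The scaling of these brane quantities with $g_s$ and $l_s$ can be made concrete with a short sketch. It assumes the standard expressions $T_p = 1/((2\pi)^p g_s l_s^{p+1})$, $\mu_p = g_s T_p$ and $g_{YM}^2 = (2\pi)^{p-2} g_s l_s^{p-3}$; the function names and sample parameter values are hypothetical.

```python
import math


def dp_tension(p, gs, ls):
    """Tension T_p of a single Dp brane (assumed standard formula)."""
    return 1.0 / ((2 * math.pi) ** p * gs * ls ** (p + 1))


def dp_charge(p, gs, ls):
    """Charge mu_p = g_s T_p under the (p+1)-form gauge field."""
    return gs * dp_tension(p, gs, ls)


def ym_coupling_squared(p, gs, ls):
    """Yang-Mills coupling squared on the Dp-brane world volume."""
    return (2 * math.pi) ** (p - 2) * gs * ls ** (p - 3)


def stack_tension(n_branes, p, gs, ls):
    """Tension of n_branes parallel Dp branes forming a threshold bound state."""
    return n_branes * dp_tension(p, gs, ls)


gs, ls = 0.1, 1.0  # illustrative values only
for p in (1, 3, 5):
    print(f"D{p}: T_p = {dp_tension(p, gs, ls):.4e}, "
          f"g_YM^2 = {ym_coupling_squared(p, gs, ls):.4e}")
```

With these expressions the product $T_p\, g_{YM}^2 = 1/((2\pi)^2 l_s^4)$ is independent of both $p$ and $g_s$, and for $p = 3$ the Yang-Mills coupling is dimensionless, $g_{YM}^2 = 2\pi g_s$, which makes explicit that $g_s$ is the square of the open-string coupling.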

Branes of different dimensionalities can also form 
bound states. Of particular interest are configura- 
tions which can form threshold bound states which 
preserve some supersymmetries. For example, a set 
of $N_1$ parallel Dp branes can form a threshold
bound state with a set of $N_2$ parallel D$(p + 4)$
branes with all the $p$-branes lying entirely along the
$(p + 4)$-branes. This configuration is also a BPS
saturated state preserving eight of the original
supersymmetries and would have charges under
both $(p + 1)$-form and $(p + 5)$-form gauge potentials.
The BPS nature ensures that the total mass
density is the sum of the individual mass densities. 


NS Branes 


The other extended objects in string theory are 
called NS branes since they couple to p-form 
gauge fields which arise from the Neveu-Schwarz/ 
Neveu-Schwarz sector of the world-sheet theory. 
These are present in all the five string theories and 
appear in two types. The first is a macroscopic 
fundamental string which may be wound around a 
compact direction. The second is called a solitonic 
5-brane. While the collective dynamics of a funda- 
mental string is the standard world-sheet description 
of string theory, the description for the NS 5-brane 
is rather complicated and not known in full 
detail. The rest of this article deals exclusively with 
D-branes. 


D-Branes and Black Branes 


The idea that black holes correspond to highly 
degenerate states in string theory is quite old and 
dates back to 't Hooft (1990) and Susskind (1993). 
In the following two sections we discuss such black 
holes which are described by D-branes. For reviews 
see Maldacena (1996), Das and Mathur (2001), and 
David et al. (2002). 

We have so far discussed the string-theoretic 
branes in two different ways. In the first description, 
branes are solutions of the low-energy equations of 
motion — this is the setting in which branes provide 
conventional descriptions of black holes. In the 
second description, branes are certain states in the 
quantum theory of superstrings. More specifically, 
D-branes are described in terms of states of the 
open-string field theory which lives on the branes. 
The first description is necessarily approximate. On 
the other hand, the second description is exact in 
principle, although in practice one might not know 
how to write down and analyze the string-field 
theory in an exact fashion. 

The description in terms of open-string field 
theory should reduce to the description in terms of 
a classical solution when the charges and masses 
become large. If black-hole thermodynamics has a 
microscopic origin, D-branes should be highly 
degenerate states in this limit and the entropy 
should be given by the Boltzmann formula. Further- 
more, Hawking radiation should be understood as 
an ordinary decay process. 

For a system of $Q_p$ parallel Dp branes, the mass
is $M \sim Q_p/g_s$, while Newton's gravitational constant
$G \sim g_s^2$. Gravitational effects are controlled by
$GM \sim g_s Q_p$. A semiclassical limit in closed-string
theory requires $g_s \to 0$, while a nontrivial gravita-
tional effect in this limit requires $g_s Q_p$ finite, which
implies one must have $Q_p \gg 1$. Furthermore, when
$g_s Q_p \gg 1$ the typical curvatures are small compared
to the string scale and the semiclassical string theory
reduces to classical supergravity. This is the limit in
which branes are well described as classical
solutions.

Similar considerations apply for brane systems with
multiple charges. For example, in the D1-D5-N
system the classical solution becomes a good
description when all the quantities $g_s Q_1$, $g_s Q_5$, and
$g_s^2 N$ become large. (The relevant quantity which
comes with the momentum has $g_s^2$ rather than $g_s$
because the mass contribution from the momentum is
simply $N/R$ without any inverse power of $g_s$.)
However, $g_s$ is the square of the coupling constant
of the open-string theory living on the brane; in fact,
eqn [18] shows this relation in the low-energy limit.
It is well known that in a $U(Q_p)$ gauge theory the real
coupling constant is $g_{YM}\sqrt{Q_p} \sim \sqrt{g_s Q_p}$. This means
that the semiclassical limit corresponds to a strongly
coupled string-field theory which reduces to strongly
coupled gauge theory in the low-energy limit and the
picture of D-branes as a collection of open strings is
not very useful. In fact, known calculational methods
in gauge theory or open-string theory are not valid in
this regime.


Microscopic Entropy for Two-Charge Systems 


The prospects are much better for extremal black 
holes, which appear as BPS states in string theory. 
This is because the spectra of BPS states do not 
depend on the coupling. The degeneracy of such 
states may therefore be calculated at weak coupling, 
where techniques are well known and the result can 
be extrapolated to strong coupling without change. 

The simplest BPS state is the ground state of a set of 
parallel D-branes of the same type. This state is indeed 
128-fold degenerate, which would imply a micro- 
scopic entropy. This entropy, however, is small and 
therefore invisible in the corresponding classical 
solution. Indeed, the classical solution shows that in 
the extremal limit the horizon area is zero, leading to a 
vanishing Bekenstein-Hawking entropy. 

The next interesting class of states consists of 
threshold bound states with two kinds of 
charges. Consider, for example, the D1-D5 system 
on $T^4 \times S^1$ considered above with no momentum
along the D1's. By known duality transformations,
this is equivalent to a fundamental IIB string which
is wound $Q_5$ times around the $S^1$ and with a net
momentum $P = Q_1/2\pi Q_5 R$ (where $R$ is the radius of
the $S^1$), with four of the transverse directions
compactified on a $T^4$. For this system, it is easy to
count the number of states for given values of $Q_1$
and $Q_5$ at weak string coupling by simply enumerating
the perturbative oscillator states of the string.
For large values of $Q_1$ and $Q_5$, we can alternatively
calculate this entropy by using a canonical ensemble 
of eight massless bosons corresponding to the eight 
transverse polarizations and their supersymmetric 
partners — eight massless fermions — moving on the 
string with some temperature T and a chemical 
potential a for the total momentum. 

Consider a noninteracting gas of $f$ massless bosons
and $f$ fermions living on a circle with circumference
$L$. The average number of left- and right-moving
particles with some energy $e$, denoted by $\rho_L$, $\rho_R$,
respectively, are

$$\rho_i(e) = \frac{1}{e^{e/T_i} \pm 1}, \qquad i = L, R$$ [19]

where the $\pm$ sign refers to fermions and bosons,
respectively, and we have introduced left- and right-
moving temperatures $T_L$, $T_R$. The physical temperature
is

$$\frac{1}{T} = \frac{1}{2}\left(\frac{1}{T_L} + \frac{1}{T_R}\right)$$ [20]

The extensive quantities, such as the energy $E$,
momentum $P$, and entropy $S$, then become the sum
of left- and right-moving pieces:

$$E = E_L + E_R, \qquad P = P_L + P_R, \qquad S = S_L + S_R$$ [21]


and the distribution function [19] leads to the
following thermodynamic relations:

$$T_i = \sqrt{\frac{8E_i}{L\pi f}} = \frac{4S_i}{\pi f L}, \qquad i = L, R$$ [22]

Since the total momentum $P = P_R + P_L = E_R - E_L$ is
nonzero, the lowest-energy state is clearly the one in
which all the particles move in the same direction,
for example, right moving. This is a BPS state and
corresponds to the extremal solution in supergravity.
Then $E = E_R = P = P_R$. This approach to the black
hole entropy was initiated by Das and Mathur
(1996) and Callan and Maldacena (1996).

For our two-charge system, $f = 8$, $P = 2\pi Q_1/L$,
and $L = 2\pi R Q_1 Q_5$. Using [22] we get

$$S^{\text{2-charge}}_{\text{micro}} = 2\sqrt{2}\,\pi\sqrt{Q_1 Q_5}$$ [23]

This is the microscopic entropy for the fundamental
string with momentum in the type II theory. By
duality, this is also the microscopic entropy of the
D1-D5 system. This is a large number which should
agree with the macroscopic entropy calculated from
the corresponding classical solution.

The discussion is almost identical for the fundamental
heterotic string, except that now we have
24 right-moving bosons, eight left-moving bosons,
and eight left-moving fermions, and the BPS state
consists only of right movers. If $n_w$ denotes the
winding number and $n_p$ the quantized momentum,
the extremal heterotic string entropy is

$$S^{\text{heterotic}}_{\text{micro}} = 4\pi\sqrt{n_p n_w}$$ [24]

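
The two extremal entropies [23] and [24] can be cross-checked with a short numerical sketch. It assumes the standard leading-order counting for a purely chiral gas, $S = 2\pi\sqrt{c\,n/6}$, with each boson contributing 1 and each fermion 1/2 to $c$, and with the oscillator level $n$ taken to be the product of the momentum and winding numbers. The function name `chiral_entropy` and the sample charges are illustrative choices, not part of the original treatment.

```python
import math

def chiral_entropy(n_bosons, n_fermions, level):
    """Leading-order entropy 2*pi*sqrt(c*level/6) of one chiral sector.

    Assumption: c = n_bosons + n_fermions / 2 (free fields).
    """
    c = n_bosons + 0.5 * n_fermions
    return 2.0 * math.pi * math.sqrt(c * level / 6.0)

# Illustrative charges (hypothetical values).
q1, q5 = 40, 25
level = q1 * q5

# Type II string: 8 bosons + 8 fermions in the excited sector (c = 12).
s_type2 = chiral_entropy(8, 8, level)

# Heterotic string: 24 right-moving bosons (c = 24).
s_heterotic = chiral_entropy(24, 0, level)

print(s_type2, 2.0 * math.sqrt(2.0) * math.pi * math.sqrt(level))
print(s_heterotic, 4.0 * math.pi * math.sqrt(level))
```

Under these assumptions the first pair of numbers agrees with the form of [23] and the second pair with the form of [24].
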
The supergravity solution for the D1-D5 
system may be obtained by substituting o=0 in 
eqns [8]-[13]. In the extremal limit, the classical 
Bekenstein-Hawking entropy vanishes as is clear 
from the expression [15], in which N=0. This 
appears to be in contradiction with the fact that the 
state has a large microscopic entropy. 


The key point, however, is that the two-charge 
solution has a singular horizon where the string 
frame curvature is large. Consequently, low-energy 
tree-level supergravity breaks down near the horizon 
and higher-derivative terms (e.g., higher powers of 
curvature) become important. This issue has been 
best studied for the fundamental heterotic string
compactified on $T^6$. This is dual to the D1-D5
system in type IIB theory compactified on $K3 \times T^2$.
The classical supergravity solution is then a singular
black hole in four spacetime dimensions. In one of
the first papers on the string-theoretic understanding
of black hole thermodynamics, Sen (1995) showed
that, for large $n_p$, $n_w$, string-loop effects are small
near the horizon so that the only relevant corrections
are higher-derivative terms coming from
integrating out the massive modes of the string at
tree level. Furthermore, a robust scaling argument
shows that regardless of the detailed nature of the
derivative corrections, the macroscopic entropy
defined through the horizon area must be of the
form $a\sqrt{n_p n_w}$, where $a$ is a pure number. Finally,
one can define a "stretched horizon" as the surface
where the curvature becomes of the order of the
string scale and the area of the stretched horizon
is indeed proportional to $\sqrt{n_p n_w}$. This result gives
a strong indication that string theory provides a 
microscopic basis for black hole thermodynamics, 
although the coefficient a cannot be determined 
without more detailed knowledge of higher- 
derivative terms. 


Microscopic Entropy of Extremal Three-Charge 
System 


Brane bound states with three kinds of charge 
provide examples of black holes whose extremal 
limits have large horizons with curvatures much 
smaller than the string scale. In this case, a 
microscopic count of states in string theory should 
exactly account for the  Bekenstein-Hawking 
formula, without corrections coming from 
higher derivatives. This is indeed true, as first found 
by Strominger and Vafa (1996). In the following, we 
will outline how this calculation can be done in the 
D1-D5-N system on $K3 \times S^1$ or $T^4 \times S^1$ following
the treatment of Dijkgraaf et al. (1996). 

D1 branes can be considered as "instanton
strings" in the six-dimensional supersymmetric
$U(Q_5)$ gauge theory of D5 branes (actually, these
should be called solitonic strings rather than
instantons, since the configurations are time
independent). The total instanton number is the
D1-brane charge $Q_1$. The moduli space of
these instantons is then a blown-up version of the
orbifold $(T^4)^{Q_1 Q_5}/S(Q_1 Q_5)$ or $(K3)^{Q_1 Q_5}/S(Q_1 Q_5)$
and is $4Q_1 Q_5$ dimensional. Since any instanton
configuration is independent of time $x^0$ and the $S^1$
direction $x^5$, the collective coordinate dynamics is a
$(1 + 1)$-dimensional field theory which lives in the
$(x^0, x^5)$ space. At low energies, this flows to a
conformal field theory with a central charge
$c = 6Q_1 Q_5$ since there are $4Q_1 Q_5$ bosons each
contributing 1 to the central charge and an equal
number of fermions each contributing 1/2. The BPS
state with momentum $N/R$ is a purely right- or left-
moving state in this conformal field theory which
has a conformal weight $N$. From general principles
of conformal invariance, the degeneracy of such
states for large $N$ is given by Cardy's formula

$$d(N) \sim e^{2\pi\sqrt{cN/6}}$$ [25]

so that the microscopic entropy is

$$S_{\text{micro}} = \log d(N) = 2\pi\sqrt{cN/6}$$ [26]

Substituting the value of $c = 6Q_1 Q_5$, this is in exact
agreement with the Bekenstein-Hawking entropy of
the classical solution given in [15].
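
To get a feeling for how Cardy's asymptotic formula behaves, the following Python sketch compares it with an exact count in the simplest case one can enumerate directly: a single chiral free boson ($c = 1$), whose degeneracy at level $N$ is the number of integer partitions of $N$. This toy comparison and the helper names (`partition_counts`, `cardy_entropy`, `d1d5_entropy`) are illustrative assumptions and are not part of the D1-D5 calculation itself; the last helper simply evaluates the Cardy expression with $c = 6Q_1 Q_5$.

```python
import math

def partition_counts(n_max):
    """Exact number of integer partitions p(0), ..., p(n_max) by dynamic programming."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            p[total] += p[total - part]
    return p

def cardy_entropy(c, level):
    """Leading Cardy entropy 2*pi*sqrt(c*level/6)."""
    return 2.0 * math.pi * math.sqrt(c * level / 6.0)

def d1d5_entropy(q1, q5, n):
    """Cardy entropy for central charge c = 6*q1*q5 at level n."""
    return cardy_entropy(6 * q1 * q5, n)

N = 1000
exact_log_degeneracy = math.log(partition_counts(N)[N])
ratio = exact_log_degeneracy / cardy_entropy(1.0, N)

print("log p(N) / Cardy estimate:", ratio)
print("D1-D5 example:", d1d5_entropy(2, 3, 4), 2.0 * math.pi * math.sqrt(2 * 3 * 4))
```

The ratio is below one at finite $N$ because the exact degeneracy carries power-law prefactors that only become negligible in the logarithm at very large $N$.
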


Nonextremal Black Holes and Hawking 
Radiation 


The BPS property of ground states of D-brane 
systems enables us to compute the degeneracy of 
microstates exactly in the regime of parameters 
where the state can be reliably described as a black 
hole solution in the low-energy theory. However, 
extremal black holes have vanishing temperature 
and do not radiate. To understand the microscopic 
origins of Hawking radiation, one has to go away 
from extremality. Such states are not supersym- 
metric and an extrapolation of weak-coupling 
calculations to strong coupling is not a priori 
justified. Nevertheless, it turns out that for small 
departures from extremality, weak-coupling results 
still reproduce semiclassical answers for entropy, 
temperature, and luminosity. 


Near-Extremal Entropy 


Nonextremal properties are best understood for the
D1-D5-N system on $T^4 \times S^1$. In the orbifold limit,
the conformal field theory which describes the low-
energy dynamics is equivalent to a gas of strings
which are wound around the $S^1$ and which can
oscillate along the $T^4$. The total winding number is
$k = Q_1 Q_5$ and may be achieved by sets of strings
which are multiply wound in various ways. As
argued below, entropically the most favored configuration
is a single long string wound $Q_1 Q_5$
times. Thus, the thermodynamics may be analyzed
exactly along the lines of the fundamental string in
the previous section. The thermodynamic relations
are given by [22] with $f = 4$ and $L = 2\pi R Q_1 Q_5$. The
extremal state consists entirely of right movers and
$E = E_R = N/R$. Substituting these values in [22]
yields the correct formula for the microscopic
entropy

$$S^{\text{3-charge}}_{\text{micro}} = 2\pi\sqrt{Q_1 Q_5 N}$$ [27]


The same expression follows if $f = 4Q_1 Q_5$ and
$L = 2\pi R$ corresponding to $Q_1 Q_5$ singly wound
strings. However, for statistical methods to hold,
the entropy must be much larger than the number of
flavors. The ratio of the entropy to the number of
flavors is $S/f \sim \sqrt{N/Q_1 Q_5}$ for multiple singly
wound strings and is not guaranteed to be large
when all of $Q_1$, $Q_5$, $N$ are large. On the other hand,
this ratio is $S/f \sim \sqrt{Q_1 Q_5 N}$ for the long string.
This shows that the long string is always entropically
favored.
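
The comparison between the long string and the singly wound strings can be made concrete with a small numerical sketch. It assumes that relation [22] has the form $T_i = \sqrt{8E_i/(L\pi f)} = 4S_i/(\pi f L)$, which implies $S = \sqrt{\pi f L E/2}$ for a purely right-moving state; the function names and sample charges below are hypothetical.

```python
import math

def gas_entropy(f, L, E):
    """Entropy of one chiral sector of f bosons and f fermions on a circle of length L.

    Assumes T = sqrt(8E/(pi f L)) = 4S/(pi f L), so that S = sqrt(pi f L E / 2).
    """
    return math.sqrt(math.pi * f * L * E / 2.0)

def compare_windings(q1, q5, n, R=1.0):
    E = n / R  # extremal energy, carried entirely by right movers
    long_f, long_L = 4, 2.0 * math.pi * R * q1 * q5    # one string wound q1*q5 times
    short_f, short_L = 4 * q1 * q5, 2.0 * math.pi * R  # q1*q5 singly wound strings
    s_long = gas_entropy(long_f, long_L, E)
    s_short = gas_entropy(short_f, short_L, E)
    return {
        "S_long": s_long,
        "S_short": s_short,
        "S_per_flavor_long": s_long / long_f,
        "S_per_flavor_short": s_short / short_f,
    }

result = compare_windings(q1=30, q5=20, n=50)
for name, value in result.items():
    print(name, value)
```

For these sample charges both configurations give the same total entropy, but the entropy per flavor is large only for the long string, which is the condition for the statistical treatment to be trusted.
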

A departure from the extremal state is achieved by
adding a left-moving momentum $2\pi n/L$ as well as a
right-moving momentum $2\pi n/L$ to the extremal
state, thus adding energy to the system but maintaining
the total momentum. For the long string, this
yields

$$S_R = 2\pi\sqrt{Q_1 Q_5 N + n}, \qquad S_L = 2\pi\sqrt{n}$$ [28]

For small departures from extremality, $n \ll N$, the
expressions for the total entropy and temperature as
a function of the excess energy $\Delta E = 2n/RQ_1 Q_5$
agree exactly with the near-extremal Bekenstein-
Hawking entropy and the Hawking temperature of
the classical solution, as shown by Callan and
Maldacena (1996) and by Horowitz and
Strominger.

The necessity of the long string appears in another
important physical consideration. For statistical
mechanics to be valid, the specific heat of the system
has to be larger than unity. This implies that for
the case considered here the energy gap $\Delta E$ must be
larger than $1/RQ_1 Q_5$, which is precisely what the
long string yields.
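
As a consistency check on the near-extremal thermodynamics, the sketch below verifies numerically that the physical temperature defined in [20] coincides with $dE/dS$ at fixed total momentum. It assumes $S_R = 2\pi\sqrt{Q_1 Q_5 N + n}$, $S_L = 2\pi\sqrt{n}$, the relation $T_i = 4S_i/(\pi f L)$ with $f = 4$ and $L = 2\pi R Q_1 Q_5$, and an excess energy $2n/(R Q_1 Q_5)$; $n$ is treated as a continuous variable, and all names and sample values are illustrative.

```python
import math

def near_extremal(q1, q5, N, n, R=1.0):
    """Return (total entropy, total energy, 1/T) for the long string."""
    k = q1 * q5
    s_R = 2.0 * math.pi * math.sqrt(k * N + n)
    s_L = 2.0 * math.pi * math.sqrt(n)
    # T_i = 4 S_i / (pi f L) with f = 4 and L = 2 pi R k
    t_R = s_R / (2.0 * math.pi ** 2 * R * k)
    t_L = s_L / (2.0 * math.pi ** 2 * R * k)
    energy = N / R + 2.0 * n / (R * k)
    inv_T = 0.5 * (1.0 / t_L + 1.0 / t_R)  # physical temperature as in [20]
    return s_L + s_R, energy, inv_T

def dS_dE(q1, q5, N, n, h=1e-3):
    """Central-difference estimate of dS/dE at fixed N (fixed total momentum)."""
    s_hi, e_hi, _ = near_extremal(q1, q5, N, n + h)
    s_lo, e_lo, _ = near_extremal(q1, q5, N, n - h)
    return (s_hi - s_lo) / (e_hi - e_lo)

_, _, inv_T = near_extremal(5, 4, 1000, 10.0)
slope = dS_dE(5, 4, 1000, 10.0)
print("1/T from [20]:", inv_T)
print("dS/dE        :", slope)
```

The two printed numbers agree to the accuracy of the finite difference, which is the first law applied to the gas of left and right movers.
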


Hawking Radiation 


A nonextremal state described above is unstable, 
since a left mover can annihilate a right mover into a 
closed-string mode which may leave the brane 
system and propagate to the asymptotic region. 
The resulting closed-string state will be in a thermal 
state whose temperature is the physical temperature 
of the initial state. This process is the microscopic 
description of Hawking radiation. The decay rate is
related to the absorption cross section of the 
corresponding mode by the principle of detailed 
balance, encoded in eqn [4]. 

From the point of view of the classical solution, 
the absorption cross section can be calculated by 
solving the linearized wave equation in the 
background geometry and calculating the ratio of 
the incident and reflected waves. It follows from 
these calculations that at low energies, absorption 
(and hence emission) are dominated by massless 
minimally coupled scalars. In fact, for any spheri- 
cally symmetric black hole in any number of 
dimensions, there is a general theorem which 
ensures that the low-energy limit of this absorption 
cross section is exactly equal to the horizon area. 

In the microscopic model for the three-charge 
black hole, this absorption cross section may be 
calculated by the usual rules of quantum mechanics. 
In the long-string limit and in the approximation 
that the modes on the long string form a dilute gas, 
the result has been derived by Das and Mathur 
(1996):

$$\sigma(\omega) = \frac{2\pi G_{10} Q_1 Q_5}{V}\,\omega\,\frac{e^{\omega/T} - 1}{(e^{\omega/2T_L} - 1)(e^{\omega/2T_R} - 1)}$$ [29]

where $V$ is the volume of the $T^4$ and $T$ is the
physical temperature given by [20]. For a near-
extremal hole $T_R \gg T_L$, so that $T \approx 2T_L$. Then,
in the extreme low-energy limit, $\omega \ll T_R$, so that
the corresponding Bose factor may be approximated
as $1/(e^{\omega/2T_R} - 1) \approx 2T_R/\omega$. The cross
section [29] becomes

$$\sigma = \frac{4\pi Q_1 Q_5 G_{10} T_R}{V} = \frac{4G_{10} S_R}{(2\pi R)V} = 4G_5 S_{\text{extremal}} = A_H$$ [30]

where $G_5$ is the five-dimensional Newton's gravitational
constant. We have used the relation [22] with
$L = 2\pi R Q_1 Q_5$ and $f = 4$. The fact that in the near-
extremal limit $S_R$ is simply the extremal entropy and
the fact that the extremal entropy reproduces the
Bekenstein-Hawking formula has been used as well.
Thus, the microscopic cross section exactly reproduces 
the semiclassical result at low energies. Even more 
remarkably, the full cross section [29] agrees with the 
semiclassical answer for the gray-body factor for 
parameters which correspond to the dilute-gas regime, 
as shown by Maldacena and Strominger. 
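
The low-energy limit of the thermal factors in [29] can be checked numerically. The sketch below assumes that the frequency dependence of the cross section is $\omega\,(e^{\omega/T} - 1)/[(e^{\omega/2T_L} - 1)(e^{\omega/2T_R} - 1)]$ with $T$ given by [20], and sets the overall prefactor to one, since only the frequency dependence is being tested. The function name and the sample temperatures are hypothetical.

```python
import math

def sigma(omega, T_L, T_R, prefactor=1.0):
    """Frequency dependence of the dilute-gas cross section (prefactor set by hand)."""
    T = 2.0 / (1.0 / T_L + 1.0 / T_R)  # physical temperature as in [20]
    numerator = math.expm1(omega / T)
    denominator = math.expm1(omega / (2.0 * T_L)) * math.expm1(omega / (2.0 * T_R))
    return prefactor * omega * numerator / denominator

T_L, T_R = 0.01, 1.0  # near-extremal regime: T_R much larger than T_L
omega = 1e-6          # extreme low-energy limit

# Expected limit: prefactor * 2 * (T_L + T_R), i.e. 2 * T_R up to T_L / T_R corrections.
ratio = sigma(omega, T_L, T_R) / (2.0 * T_R)
print("sigma / (2 T_R) =", ratio, " expected about", 1.0 + T_L / T_R)
```

The result becomes independent of $\omega$ at low energies and approaches twice the prefactor times $T_R$, which is the step that turns [29] into the first expression in [30].
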

It is rather surprising that the results for micro- 
scopic absorption cross section calculated at weak 
coupling agree with the semiclassical answers, since 
the relevant process involves states which are not
supersymmetric and therefore a naive extrapolation
to strong coupling is not a priori justified. There 
are strong indications, however, that low-energy 
nonrenormalization theorems are at work. This 
agreement has been established not only for black 
holes with finite-horizon areas, but also for other 
systems with no horizons — most significantly, a set 
of parallel 3-branes — and forms the basis for 
Maldacena's conjecture about AdS/CFT Correspon- 
dence (see AdS/CFT Correspondence). 


Effects of Higher-Derivative Terms 


The classical low-energy limit of string theory is 
supergravity. The effect of the massive modes of the
string, as well as of string loops, is to add terms to
the supergravity action which involve a higher number
of spacetime derivatives, for example, terms containing
higher powers of the curvature. In the presence of such 
terms, the Bekenstein—Hawking formula for black hole 
entropy [2] receives corrections which can be calcu- 
lated in a systematic fashion. It turns out that for a 
class of extremal black holes, this corrected entropy as 
computed in the modified supergravity is also in exact 
agreement with a microscopic calculation. 

One example of this agreement is provided by four- 
dimensional extremal black holes in type IIA string 
theory compactified on a Calabi-Yau manifold. These 
are obtained by wrapping D4 branes on three different 
4-cycles on the Calabi-Yau and having in addition a 
number of D0 branes. Let $p^A$, $A = 1, \ldots, 3$ denote the
three D4 charges and $q_0$ denote the D0 charge. The
microscopic entropy of the BPS state can be computed
by embedding this in M-theory:

$$S^{\text{CY-black hole}}_{\text{micro}} = 2\pi\sqrt{\frac{1}{6}\,|q_0|\left(C_{ABC}\,p^A p^B p^C + c_{2A}\,p^A\right)}$$ [31]

where $C_{ABC}$ is the intersection number of the
4-cycles and $c_{2A}$ denotes the second Chern class of
the Calabi-Yau space. When all the charges $p^A$ are
large, the term involving $c_2$ is subdominant. In this
case, the result agrees with the Bekenstein-Hawking 
entropy of the corresponding classical solution. 
When the charges are not all large (so that the 
second term is appreciable), the curvatures of the 
supergravity solution become large at the horizon 
and higher-derivative corrections to the action 
cannot be ignored. In this particular case, it turns 
out that these higher-derivative corrections are 
string-loop corrections and can be computed using 
general properties of $N = 2$ supersymmetry, so that
one can compute corrections to near-horizon
geometry. Furthermore, one has to now modify the
expression for macroscopic entropy using the
formalism of Wald. Putting these together, it is 
found that the macroscopic entropy following from 
the modified supergravity is in exact agreement with 
[31]. This subject is reviewed in Mohaupt (2000). 
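
The relative size of the two terms in [31] is easy to explore numerically. The sketch below assumes the entropy has the form $2\pi\sqrt{|q_0|\,(C_{ABC}\,p^A p^B p^C + c_{2A}\,p^A)/6}$ and uses toy data: a fully symmetric $C_{ABC}$ that equals 1 when the three indices are all different, and made-up values of $c_{2A}$. These inputs are illustrative assumptions and do not describe any particular Calabi-Yau manifold.

```python
import math
from itertools import product

def cy_black_hole_entropy(q0, p, C, c2):
    """Evaluate 2*pi*sqrt(|q0| * (C_ABC p^A p^B p^C + c2_A p^A) / 6)."""
    idx = range(len(p))
    cubic = sum(C[a][b][c] * p[a] * p[b] * p[c] for a, b, c in product(idx, repeat=3))
    linear = sum(c2[a] * p[a] for a in idx)
    return 2.0 * math.pi * math.sqrt(abs(q0) * (cubic + linear) / 6.0)

def toy_intersection_numbers():
    """Hypothetical symmetric C_ABC: 1 if A, B, C are all distinct, else 0."""
    C = [[[0] * 3 for _ in range(3)] for _ in range(3)]
    for a, b, c in product(range(3), repeat=3):
        if {a, b, c} == {0, 1, 2}:
            C[a][b][c] = 1
    return C

C = toy_intersection_numbers()
p = (2, 3, 5)
q0 = -7

s_without_c2 = cy_black_hole_entropy(q0, p, C, c2=(0, 0, 0))
s_with_c2 = cy_black_hole_entropy(q0, p, C, c2=(24, 24, 24))  # made-up c2 values

print(s_without_c2, s_with_c2)
```

With these toy numbers the cubic term evaluates to $6\,p^1 p^2 p^3$, so the entropy without the $c_2$ term reduces to $2\pi\sqrt{|q_0|\,p^1 p^2 p^3}$; increasing the charges $p^A$ makes the linear term progressively less important.
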

These methods have also been applied to the 
problem of two-charge black holes in heterotic 
string theory on $T^6$ or, equivalently, type IIA on
$K3 \times T^2$ (Dabholkar 2004). Recall that in this case
the horizon of the usual supergravity solution is
singular. It has been found that leading-order
higher-derivative corrections smooth out the
horizon into an $AdS_2 \times S^2$ spacetime and the
modified expression for the macroscopic entropy is 
again in exact agreement with the microscopic 
answer [23]. 


Geometry of Microstates 


A satisfactory solution of the information-loss 
paradox requires a much more detailed understand- 
ing of black holes in string theory. The discussion 
above shows that black holes have microstates 
which may be described well in the weak-coupling 
regime. It is interesting to ask whether there is a 
description of these microstates in the strong- 
coupling regime in terms of the effective geometry 
perceived by suitable probes. This question has been 
answered for the two-charge system in great detail 
(see Mathur (2004)). It turns out that the D1-D5 
microstates can be described by perfectly smooth 
metrics with no horizons, and they asymptote to 
the standard two-charge metric discussed above. 
The location of the erstwhile stretched horizon 
marks the point where the different microstates 
start differing from each other significantly. Since 
each such geometry does not have a horizon, neither 
does it have any entropy - this is consistent with 
their identification with nondegenerate microstates. 
Indeed, the number of such microstates correctly 
accounts for the microscopic entropy. Whether a 
similar picture holds for the three-charge system 
remains to be seen in detail, although there are some 
indications that this may be true. In this approach, it 
is not yet fully understood how a horizon emerges 
and why the entropy scales as the horizon area. 


Outlook 


One key feature of the understanding of black hole 
statistical mechanics from the dynamics of branes is 
the fact that a problem in gravity is mapped to a 
problem in a theory without gravity, for example, 
open-string field theory. In fact, the closed strings in 
the bulk are already contained in the spectrum of the 
open strings. This is a consequence of the basic
duality between open strings and closed strings. 
Furthermore, the open-string theory lives in a lower- 
dimensional spacetime. This is a manifestation of 
the holographic principle. As argued by Maldacena, 
the presence of a horizon implies that the low- 
energy limit retains all the modes of the closed 
strings near the horizon, while it truncates the open- 
string theory to a gauge theory. Open-closed duality 
then reduces to gauge-string duality. This provides
strong evidence that black holes obey the normal
laws of quantum mechanics and hence their time 
evolution is unitary. 

One of the most outstanding problems in the 
subject is a proper understanding of neutral black 
holes. Most of the quantitative results described 
above depend on supersymmetry, which allows 
extrapolation of weak-coupling answers to the 
strong-coupling domain. Some of these results can 
be extended to situations which have small depar- 
tures from supersymmetry, for example, near- 
extremal black holes. States corresponding to
neutral black holes are, however, far from super- 
symmetry and known calculational techniques fail. 
There are good reasons to expect, however, that the 
general philosophy — in particular the holographic 
principle — is still valid. Finally, so far string theory 
has been able to attack problems of eternal black 
holes. A satisfactory understanding of the informa- 
tion-loss problem requires an understanding of the 
dynamics of black hole formation and subsequent 
evaporation. Unfortunately, very little is known 
about this at the moment. 


See also: AdS/CFT Correspondence; Black Hole 
Mechanics; Supergravity; Superstring Theories. 


Glossary 


ADM (Arnowitt-Deser-Misner) mass — Mass of a gravita- 
tional background which is asymptotically flat. 

$AdS_n$ (anti-de Sitter space) - A space (or spacetime) with
constant negative curvature in $n$ dimensions.

BPS state (Bogomol'nyi-Prasad-Sommerfield state) - In a
theory of extended supersymmetry, a state that is 
invariant under a nontrivial subalgebra of the full 
supersymmetry algebra. These states always carry 
conserved charges, and supersymmetry determines the 
mass exactly in terms of the charges. 

Calabi-Yau space - Complex Kähler manifold with
vanishing first Chern class. 

Compactify (n. compactification) — To consider a field or 
string theory in a spacetime some of whose spatial 
dimensions are compact. 

Dirichlet boundary condition - The boundary condition 
which fixes the value of a field on the boundary. 
Duality - Equivalence of systems which appear to be
distinct. For string theories, such equivalences relate 
string theories on different spacetimes as well as 
theories with different coupling constants. 

Einstein-Hilbert action — The standard action for gravity 
which leads to Einstein's equation, 
$S = (1/16\pi G)\int d^d x\,\sqrt{g}\,R$, where $R$ is the Ricci scalar,
g denotes the determinant of the metric, and G is 
Newton's gravitational constant. 

Instanton — A classical solution of Euclidean field theory 
with finite action. 

Kaluza-Klein gauge field — In a compactified theory, the 
gauge field which arises from the metric of the higher- 
dimensional theory. 

K3 - The unique Calabi-Yau manifold in four dimensions 
having an SU(2) holonomy. 

Loop levels - In a Feynman diagram expansion of a field 
theory, terms which contribute in higher orders of the 
Planck constant $\hbar$.

Macroscopic entropy — Entropy associated with gravita- 
tional backgrounds via the Bekenstein-Hawking for- 
mula or its generalization. 

Microscopic entropy — Entropy which follows from the 
degeneracy of states of a system via Boltzmann's 
relation.

Minimally coupled scalar — A scalar field whose equation 
of motion is the standard Klein-Gordon equation 
where the derivatives are covariant derivatives. 

Neveu-Schwarz/Neveu-Schwarz states - In type I and II
string theories, bosonic closed-string states whose left- 
and right-moving parts are bosonic. 

No-hair theorem — A theorem in general relativity which 
states that black holes with nonsingular horizons are 
uniquely characterized by their mass, angular 
momenta, and charges which can couple to long- 
range gauge fields. 

Orbifold — A coset space M/G where G is a group of 
discrete symmetries of a manifold M. If G has a fixed 
point, the space is singular. 

p-Form — A fully antisymmetric p-index tensor. 

Ramond-Ramond states — In type I and II string theories, 
bosonic closed-string states whose left- and right- 
moving parts are fermionic. 

Reissner-Nordström black hole - Black hole solution of
general relativity with electric Maxwell charge. 

$S^n$ - $n$-Dimensional sphere.


Supergravity - Supersymmetric extension of general
relativity.

Supersymmetry - A symmetry between bosons and
fermions.


Threshold bound state — A bound state which is margin- 
ally bound, that is, the binding energy is zero. 

Tree level — In a Feynman diagram expansion of a field 
theory, terms which contribute to lowest order of the 
Planck constant $\hbar$.

$U(N)$ - The group of $N \times N$ unitary matrices. If the
determinant is unity, the subgroup is called SU(N). 
Introduction


Watching the sea or a lake it is often possible to 
trace a wave as it propagates on the water's surface. 
One can roughly distinguish two types of breaking 
waves. All waves break while reaching the shore but 
certain waves break far from the shore. In the first 
case, the change in water depth or the presence of an 
obstacle (e.g., a rock) seems to cause wave breaking, 
while for certain waves within the second category, 
these factors appear not to be essential. It is a matter 
of observation that for many waves that break in the 
open water a drastic increase in their slope near 
breaking is noticeable. This leads us to the following 
mathematical definition: the wave profile gradually 
steepens as it propagates until it develops a point 
where the slope is vertical and the wave is said to 
have broken (Whitham 1980). Throughout this 
article, we are concerned with wave breaking that 
is not caused by a drastic change of the topography 
of the bottom; for a discussion of wave breaking at 
the beach we refer to Johnson (1997). The governing 
equations for water waves (see the next section) are 
too difficult to be dealt with in their full generality. 
Therefore, to gain some insight, one has to find 
simpler models that are more tractable mathemati- 
cally. Investigating the properties of the model, 
certain predictions can be made. The conclusions 
reached will reflect reality only to some limited 
extent. The value of a model depends on the number 
and the degree of accuracy of physically useful 
deductions that can be made from it — the “truth” of 
the model is meaningless as all experiments contain 
inaccuracies and effects other than those accounted 
for (while deriving the model) cannot be totally 
excluded. We intend to discuss the way in which a 
recent model due to Camassa and Holm (1993) can 
lead to a better understanding of breaking water 
waves. Firstly we survey a few classical nonlinear 
partial differential equations that model the propa- 
gation of water waves over a flat bed (within the 
confines of the linear theory one cannot cope with 
the wave breaking phenomenon) and discuss their 
relevance to the study of breaking waves. We then 
analyze the breaking of waves within the context of 
the Camassa-Holm equation: existence of breaking 
waves, criteria that guarantee that a certain initial 
shape develops into a breaking wave, specific 
features of wave breaking (blow-up rate and blow-
up set for certain types of breaking waves). We 
conclude the presentation with a discussion of the 
way in which solutions to the Camassa-Holm 
equation can be continued after wave breaking. 


The Governing Equations 


The water waves that one typically sees propagating 
on the surface of the sea or on a lake are, as a matter 
of common experience, approximately two dimen- 
sional. That is, the motion is identical in any direction 
parallel to the crest line. To describe these waves, it 
suffices to consider a cross section of the flow that is 
perpendicular to the crest line. Choose Cartesian 
coordinates (x, y) with the y-axis pointing vertically 
upwards and the x-axis being the direction of wave 
propagation, while the origin lies at the mean water 
level. Let $(u(t, x, y), v(t, x, y))$ be the velocity field of
the flow, let $y = -d$ be the flat bed (for some fixed
$d > 0$), and let $y = \eta(t, x)$ be the water's free surface.
Homogeneity (constant density) is a physically reasonable
assumption for gravity waves (Johnson 1997),
and it implies the equation of mass conservation

$$u_x + v_y = 0$$ [1]

The inviscid setting is realistic since experimental
evidence confirms that the length scales associated
with an adjustment of the velocity distribution due to
laminar viscosity or turbulent mixing are long compared
to typical wavelengths. Under the assumption of
inviscid flow the equation of motion is Euler's equation

$$u_t + u u_x + v u_y = -P_x, \qquad v_t + u v_x + v v_y = -P_y - g$$ [2]

where $P(t, x, y)$ denotes the pressure and $g$ is the
constant of gravitational acceleration. The free
surface decouples the motion of the water from
that of the air so that (Johnson 1997) the dynamic
boundary condition

$$P = P_0 \quad \text{on } y = \eta(t, x)$$ [3]

must hold if we neglect surface tension, where $P_0$ is
the (constant) atmospheric pressure. Moreover,
since the same particles always form the free surface,
we have the kinematic boundary condition

$$v = \eta_t + u\,\eta_x \quad \text{on } y = \eta(t, x)$$ [4]

On the flat bed we have the kinematic boundary
condition

$$v = 0 \quad \text{on } y = -d$$ [5]

expressing the fact that the flow is tangent to the
horizontal bed (or, equivalently, that water cannot 
penetrate the rigid bed). The governing equations 
for water waves are [1]-[5]. Other than the fact that 
they are highly nonlinear, a main difficulty in 
analyzing the governing equations lies in the fact 
that we deal with a free boundary problem: the free 
surface $y = \eta(t, x)$ is not specified a priori. In our
discussion, we suppose that initially (at time $t = 0$), a
disturbance of the flat surface of still water was 
created and we analyze the subsequent motion of 
the water. The balance between the restoring gravity 
force and the inertia of the system governs the 
evolution of the mass of water and our primary 
objective is the behavior of the free surface. 

An important category of flows are those of zero 
vorticity, characterized by the additional assumption 


$u_y = v_x$ [6] 


The vorticity of a flow, $\omega = u_y - v_x$, measures the local 
spin or rotation of a fluid element. In flows for which 
[6] holds the local whirl is completely absent and for 
this reason such flows are called irrotational. Relation 
[6] ensures the existence of a velocity potential, namely 
a function $\phi(t, x, y)$ defined up to a constant via 


$\phi_x = u, \quad \phi_y = v$ 


Notice that [1] ensures that $\phi$ is a harmonic 
function, that is, $(\partial_x^2 + \partial_y^2)\phi = 0$. In this way, the 
powerful methods of complex analysis become 
available for the study of irrotational flows. Thus, 
while most water flows have vorticity, the study 
of irrotational flows can be defended mathematically 
on grounds of beauty. Concerning the physical 
relevance of irrotational water flows, experimental 
evidence indicates that for waves entering a region 
of still water the assumption of irrotational flow is 
realistic (Johnson 1997). Moreover, as a consequence 
of Kelvin's circulation theorem (Acheson 
1990), a water flow that is irrotational initially has 
to be irrotational at all later times. It is thus 
reasonable to consider that water motions starting 
from rest will remain irrotational at later times. 


Nonlinear Model Equations 


Starting from the governing equations [1]-[6] one can 
derive a variety of model equations using the non- 
dimensionalization and scaling approach: a suitable 
set of nondimensional variables is introduced, which, 
after scaling, leads to the appearance of parameters. 
The sizes and relative sizes of these parameters then 
govern the type of phenomenon that is of interest. An 
asymptotic expansion in one or several parameters 


yields an equation that is usually of significance in 
some region of space/time. The aim of this process is to 
obtain a simpler model that can be used to gain some 
understanding and to make some predictions for 
specific physical processes. This scaling method yields 
the Korteweg-de Vries (KdV) equation 


$\eta_t + \eta\eta_x + \eta_{xxx} = 0, \quad t > 0,\ x \in \mathbb{R}$ [7] 


as a model for the unidirectional propagation of 
shallow water waves over a flat bed (Johnson 1997). 
In [7] the function $\eta(t, x)$ represents the height of the 
water's free surface above the flat bed. We would 
like to emphasize that the "shallow water" regime 
does not refer to water of insignificant depth — it 
indicates that the typical wavelength is much larger 
than the typical depth (e.g., tidal waves are 
considered to be shallow water waves although 
they affect the motion of the deep sea). The KdV 
model admits the solitary wave solutions 


$\eta_c(t, x) = 3c\,\mathrm{sech}^2\!\left(\frac{\sqrt{c}}{2}\,(x - ct)\right), \quad c > 0$ [8] 


For any fixed $c > 0$, the profile $\eta_c$ propagates without 
change of form at constant speed $c$ on the surface of 
the water, that is, it represents a traveling wave. Since 
the profiles [8] of the traveling waves drop rapidly to 
the undisturbed water level $\eta = 0$ ahead of and behind the 
crest of the wave, the $\eta_c$ are called solitary waves. Notice 
that [8] shows that taller solitary waves travel faster. 
They have other special properties: an initial profile 
consisting of two solitary waves, with the taller 
preceding the smaller one, evolves in such a way that 
the taller wave catches up with the other, there is a period of 
complicated nonlinear interaction but eventually both 
solitary waves emerge completely unscathed! This 
special type of nonlinear interaction (the superposition 
principle is not valid since KdV is a nonlinear 
equation) in which solitary waves regain their form 
upon collision occurs only for special equations, in 
which case the solitary waves are called solitons. A 
further interesting property of the KdV model, relevant 
for the understanding of the interaction of solitons, is 
the fact that it is completely integrable (McKean 
1998): there is a transformation which converts the 
equation into an infinite sequence of linear ordinary 
differential equations which can be trivially integrated. 
Moreover, the KdV solitons $\eta_c$ are stable: an initial 
profile that is close to the form of a soliton will evolve 
into a wave that at any later time has a form close to 
that of a soliton (Benjamin 1972). Despite all these 
intriguing features of the KdV model, for all initial 
profiles $x \mapsto \eta(0, x)$ within the Sobolev space $H^1(\mathbb{R})$ of 
square-integrable functions with a square-integrable 
distributional derivative, eqn [7] has a unique solution 
defined for all times $t \ge 0$ (cf. Kenig et al. (1996)) so 
that the KdV model cannot be used to shed light on the 
wave breaking phenomenon. 
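
That the profile [8] solves [7] can be checked numerically. The following is a minimal sketch rather than part of the original derivation: the names `solitary_wave` and `kdv_residual` and the step size `h` are illustrative assumptions, and all derivatives are approximated by central differences, so the residual is only expected to be small, not exactly zero.

```python
import math


def solitary_wave(t, x, c):
    """Solitary wave eta_c(t, x) = 3c sech^2((sqrt(c)/2)(x - c t)) from [8]."""
    theta = 0.5 * math.sqrt(c) * (x - c * t)
    return 3.0 * c / math.cosh(theta) ** 2


def kdv_residual(eta, t, x, h=1e-3):
    """Central-difference estimate of eta_t + eta*eta_x + eta_xxx at (t, x)."""
    eta_t = (eta(t + h, x) - eta(t - h, x)) / (2 * h)
    eta_x = (eta(t, x + h) - eta(t, x - h)) / (2 * h)
    eta_xxx = (eta(t, x + 2 * h) - 2 * eta(t, x + h)
               + 2 * eta(t, x - h) - eta(t, x - 2 * h)) / (2 * h ** 3)
    return eta_t + eta(t, x) * eta_x + eta_xxx


if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0):
        wave = lambda t, x, c=c: solitary_wave(t, x, c)
        worst = max(abs(kdv_residual(wave, t, x))
                    for t in (0.0, 0.7)
                    for x in (-2.0, -0.3, 0.0, 1.1, 3.0))
        # The crest height is 3c and the crest sits at x = c t,
        # so taller waves travel faster.
        print(f"c = {c}: crest height {3 * c}, largest residual {worst:.2e}")
```

The crest height $3c$ and the crest position $x = ct$ are both controlled by the single parameter $c$, which is the statement that taller solitary waves travel faster.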

Whitham (1980) suggested the equation 


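$\eta_t + \eta\eta_x + \int_{\mathbb{R}} k(x - y)\,\eta_x(t, y)\,dy = 0$ [9] 


for the free surface profile $x \mapsto \eta(t, x)$, with the 
singular kernel 


$k(x) = \frac{1}{2\pi} \int_{\mathbb{R}} \left(\frac{\tanh \xi}{\xi}\right)^{1/2} e^{i\xi x}\,d\xi$ 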


to model wave breaking. It can be shown 
(see Constantin and Escher (1998) and references 
therein) that [9] describes wave breaking: there are 
smooth initial profiles $x \mapsto \eta(0, x)$ such that the 
resulting unique solution of [9] exists on a maximal 
time interval [0, T) with 


$\sup_{(t,x) \in [0,T) \times \mathbb{R}} |\eta(t, x)| < \infty$ 


$\inf_{x \in \mathbb{R}} \{\eta_x(t, x)\} \to -\infty$ as $t \uparrow T$ 


(the solution remains bounded but its slope becomes 
infinite in finite time). However, in contrast to the KdV 
model, eqn [9] is not integrable and does not possess 
soliton solutions. As emphasized by Whitham (1980), 
it is intriguing to find models for water waves which 
exhibit both soliton interaction and wave breaking. 
The Camassa-Holm equation 


$\eta_t - \eta_{txx} + 3\eta\eta_x = 2\eta_x\eta_{xx} + \eta\eta_{xxx}$ [10] 


was first obtained by Fokas and Fuchssteiner (1981/ 
82) as a nonlinear partial differential equation with 
infinitely many conservation laws. Camassa and Holm 
(1993) derived [10] as a model for shallow water 
waves, established that the equation possesses soliton 
solutions and found that it is formally integrable (for 
a discussion of the integrability issues we refer 
to Constantin (2001), and Lenells (2002)). Moreover, 
the solitons of [10] are stable (Constantin and Strauss 
2003). An astonishing plentitude of structures is 
tied into the Camassa-Holm equation: [10] is a re- 
expression of geodesic flow on the diffeomorphism 
group (Constantin 2000, Kouranbaeva 1999), a 
property that can be used to show that the least action 
principle holds in the sense that there is a unique flow 
transforming a wave profile into a nearby profile 
within the class of flows that minimize the kinetic 
energy (see the discussion in Constantin (2000) and 
Constantin and Kolev (2003)). Interestingly, the 
Camassa-Holm equation also models wave breaking. 
More precisely (see the discussion in Constantin 
(2000)), for any initial data $x \mapsto \eta_0(x) = \eta(0, x)$ in 
$H^3(\mathbb{R})$ there is a unique solution of [10] defined on 
some maximal time interval $[0, T)$ and the solution 
stays uniformly bounded on $[0, T)$ with 


$\lim_{t \uparrow T} \left( \inf_{x \in \mathbb{R}} \{\eta_x(t, x)\}\,(T - t) \right) = -2$ if $T < \infty$ 


In addition to this, for a large class of initial data, there 
is precisely one point where the slope of the wave 
becomes infinite at breaking time (Constantin 2000): if 
$\eta_0 \not\equiv 0$ is odd and such that $\eta_0(x) - \eta_0''(x) \ge 0$ for all 
$x \in \mathbb{R}_-$, then the corresponding wave $t \mapsto [x \mapsto \eta(t, x)]$ 
will break in finite time $T < \infty$ and 


$\lim_{t \uparrow T} \eta_x(t, 0) = -\infty$ 


whereas 


$|\eta_x(t, x)| \le K + \frac{\cosh(x)}{\sinh|x|}, \quad t \in [0, T),\ x \neq 0$ 


for some constant K > 0. Thus, the Camassa-Holm 
model is an integrable infinite-dimensional Hamil- 
tonian system with stable solitons and eqn [10] 
admits also breaking waves as local solutions (see 
Constantin and Escher (1998) and McKean (1998) 
and references therein for further results on wave 
breaking for the Camassa-Holm equation). 

We conclude our discussion by pointing out that it 
is possible to continue solutions of the Camassa- 
Holm equation past the breaking time. For this 
purpose it is convenient to rewrite [10] as the 
nonlinear nonlocal conservation law 


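$\eta_t + \eta\eta_x + \frac{1}{2}\,\partial_x \int_{\mathbb{R}} e^{-|x-y|} \left(\eta^2 + \frac{1}{2}\eta_x^2\right)(t, y)\,dy = 0$ [11] 


reminiscent to some extent of the form of [7] and [9] 
and obtained by formally applying the operator 
$(1 - \partial_x^2)^{-1}$ to [10] in view of the fact that 


$(1 - \partial_x^2)^{-1} f = P * f$ for $f \in L^2(\mathbb{R})$ 


the kernel of the convolution being 


$P(x) = \frac{1}{2} e^{-|x|}, \quad x \in \mathbb{R}$ 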
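
The equivalence of [10] and [11] can be checked directly. The following worked computation is a sketch that uses only $(1 - \partial_x^2)(P * f) = f$; the shorthand $Q = \eta^2 + \frac{1}{2}\eta_x^2$ is a label introduced here for convenience.

```latex
\begin{align*}
(1-\partial_x^2)\bigl[\eta_t + \eta\eta_x + \partial_x (P * Q)\bigr]
 &= \eta_t - \eta_{txx} + \eta\eta_x - (\eta\eta_x)_{xx} + \partial_x Q \\
 &= \eta_t - \eta_{txx} + \eta\eta_x
    - (3\eta_x\eta_{xx} + \eta\eta_{xxx}) + (2\eta\eta_x + \eta_x\eta_{xx}) \\
 &= \eta_t - \eta_{txx} + 3\eta\eta_x - 2\eta_x\eta_{xx} - \eta\eta_{xxx}.
\end{align*}
```

Setting the last line equal to zero reproduces [10].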


By introducing a new set of independent and depen- 
dent variables it is possible to resolve all singularities 
due to wave breaking in the sense that [11] is 
transformed into a semilinear system, the unique 
solution of which can be obtained as a fixed point of 
a contractive operator (Bressan and Constantin 2005). 
In terms of [11], a semigroup of global conservative 
solutions (in the sense that the total energy 


$\frac{1}{2} \int_{\mathbb{R}} (\eta^2 + \eta_x^2)\,dx$ 


equals a constant, for almost every time), depending 
continuously on the initial data $\eta(0, \cdot) \in H^1(\mathbb{R})$, is 
thus constructed. 


See also: Compressible Flows: Mathematical Theory; 
Dynamical Systems in Mathematical Physics: 
An Illustration from Water Waves; Integrable Systems: 
Overview; Interfaces and Multicomponent Fluids. 
Introduction 


The BRST symmetry was originally introduced in the 
seminal papers by Becchi et al. (1976) and Tyutin (1975) 
for Yang-Mills gauge theories as a tool for controlling 
the renormalization of the models in a consistent (gauge- 
independent) way. This symmetry was discovered as a 
residual symmetry of the gauge-fixed action. It was 
realized later that, in fact, the BRST construction is quite 
general, in the sense that it covers arbitrary gauge 
theories and not just Yang-Mills gauge models. 
Furthermore, it is intrinsic, in that no gauge choice is 
actually necessary to define it. 

The purpose of this review is to explain the general, 
intrinsic features of the BRST formalism applicable to 
“any” gauge theory. The proper setting for discussing 
these issues is that of homological algebra (Stasheff 
(1998), and references therein). This article first explains 
the necessary algebraic material underlying the 
construction and then illustrates it in the cases of the 
Hamiltonian BRST formalism and the Lagrangian 
BRST formalism. 


A Result from Homological Algebra 


The main result of homological algebra needed in 
the BRST construction deals with a differential 
complex C with two gradings. The first grading is 
an N-degree and is called the “resolution degree,” or 
“r-degree.” The second grading is a Z-degree and is 
called the total ghost number. It is denoted by gh. 
We assume that there are two odd derivations $\delta$ and 
$s_0$ that have the following properties: 


$r(\delta) = -1, \quad gh(\delta) = 1$ 
$r(s_0) = 0, \quad gh(s_0) = 1$ [1] 


and 


$\delta^2 = 0, \quad \delta s_0 + s_0 \delta = 0, \quad s_0^2 = -[\delta, s_1]$ [2] 


for some derivation $s_1$ of r-degree 1 and ghost 
number 1. The bracket $[\cdot,\cdot]$ is the graded commutator; 
in this specific case, the anticommutator. We 
also assume that the homology of $\delta$ vanishes at 
nonzero value of the r-degree, both in the original 
complex C, 


$H_k(\delta, C) = 0, \quad k \neq 0$ [3] 


(which is equivalent to $\delta a = 0$, $r(a) > 0$ $\Rightarrow$ $a = \delta b$) 
and in the space of derivations, 


$[\alpha, \delta] = 0, \quad r(\alpha) \neq 0 \ \Rightarrow\ \alpha = [\beta, \delta]$ [4] 


where $\alpha$ and $\beta$ are both derivations in C. The 
r-degree of a homogeneous linear operator $\alpha$ 
is defined through $r(\alpha(x)) = r(\alpha) + r(x)$ for any 
element $x \in C$ and is negative when $\alpha$ decreases the 
r-degree. 

In $H_0(\delta, C)$, the (odd) derivation $s_0$ defines a 
differential. The cohomology of $s_0$ modulo $\delta$, 
denoted $H^*(s_0, H_0(\delta, C))$, is the cohomology of $s_0$ in 
$H_0(\delta, C)$. It is explicitly defined through the cocycle 
condition 


$s_0 a = \delta m$ [5] 


with coboundaries of the form 


$s_0 b + \delta n$ [6] 


The central result underlying the BRST construc- 
tion is: 


Theorem 1 Given the above setting, there exists 
an odd derivation s in C with the following 
properties: 


$s = \delta + s_0 + s_1 + s_2 + \cdots$ [7] 
$r(s_k) = k, \quad gh(s_k) = 1$ [8] 
$s^2 = 0$ [9] 


Furthermore, one has 


$H^k(s, C) = H^k(s_0, H_0(\delta, C))$ [10] 


The proof is straightforward (see, e.g., 
Henneaux and Teitelboim (1992)). In particular, 
the proof of [10] is a standard spectral sequence 
argument with a sequence that collapses after the 
second step. It is interesting to note that, contrary 
to $s_0$, which is only a differential modulo $\delta$, $s$ is a 
true differential. The construction of $s$ provides a 
model for $H^*(s_0, H_0(\delta, C))$. The differential $s$ is not 
unique, but this does not affect the subsequent 
discussion. 
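
To see how the conditions [1] and [2] enter the construction of $s$, one can expand $s^2$ by resolution degree. The following is a sketch of the first few orders only, not the full proof; it assumes the expansion [7] with $r(s_k) = k$ and writes $[\cdot,\cdot]$ for the anticommutator of odd derivations.

```latex
\begin{align*}
r = -2:&\quad \delta^2 = 0,\\
r = -1:&\quad \delta s_0 + s_0\delta = 0,\\
r = 0:&\quad s_0^2 + [\delta, s_1] = 0,\\
r = 1:&\quad [s_0, s_1] + [\delta, s_2] = 0,\\
r = 2:&\quad s_1^2 + [s_0, s_2] + [\delta, s_3] = 0.
\end{align*}
```

The first three lines are exactly the relations [2]. Each later line determines the next $s_k$ through the acyclicity property [4], once one has checked that the already-known terms commute with $\delta$ in the graded sense.
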
In physical applications, the total ghost number is 
a derived quantity. The primary gradings are the 
resolution degree and the "filtration degree" called 
the pure ghost number and denoted pgh. It is an 
N-degree and one has 


$gh = pgh - r$ [11] 


The r-degree is known as the antighost or antifield 
number, depending on the context (see below). 
When r(x)=0, one has gh(x)=pgh(x). Since the 
pure ghost number is non-negative, this implies that 


$H^k(s, C) = 0, \quad k < 0$ [12] 


A Geometric Application 
Geometric Setting 


Theorem 1 is relevant to the following situation. 
Consider a surface $\Sigma$ in a manifold M, defined by 
equations 


$f_a = 0$ [13] 


which may or may not be independent. (We assume 
for definiteness that the variables in M are bosonic, 
that is, that M is an ordinary manifold, as opposed 
to a supermanifold. The graded case can be covered 
without difficulty by including appropriate sign 
factors at the relevant places.) Assume that $\Sigma$ is 
partitioned by orbits generated by vector fields $X_\alpha$ 
defined everywhere in M, tangent to $\Sigma$ and closing 
on $\Sigma$ in the Lie bracket, 


$[X_\alpha, X_\beta] = C^\gamma{}_{\alpha\beta} X_\gamma$ + “more” [14] 


where “more” denotes terms that vanish on $\Sigma$. We 
assume, for simplicity, that the vector fields $X_\alpha$ are 
linearly independent on $\Sigma$, although this is not 
necessary. The formalism can be developed in the 
nonindependent case, but it then requires more variables. 
We are interested in the quotient space $\Sigma/\mathcal{O}$ of 
the surface $\Sigma$ by the orbits. To guide the geometrical 
intuition, we shall assume that this quotient space is a 
smooth manifold (the fiber of the orbits, etc.), and we 
shall suggestively adopt notations adapted to this best 
possible case. The approach, being purely algebraic, is 
in fact more general. (Accordingly, the notations 
should be understood with a liberal mind.) 

The aim here is to describe the algebra of 
“observables,” that is, the algebra $C^\infty(\Sigma/\mathcal{O})$ of 
functions on the quotient space $\Sigma/\mathcal{O}$. The terminology 
“observables” anticipates the physical situation dis- 
cussed below, where the orbits are the “gauge orbits.” 
In order to describe algebraically the algebra of 
observables, one observes that this algebra is obtained 
through a two-step procedure. First, one restricts the 
functions from M to $\Sigma$. Second, one imposes the 
invariance condition along the orbits. To each of these 
steps corresponds a separate differential. 


Longitudinal Complex 


The longitudinal complex is associated with the 
second step. One can consider on $\Sigma$ an “exterior 
derivative operator D along the gauge orbits.” This 
operator is defined on functions on $\Sigma$ as 


$D f = X_\alpha(f)\, C^\alpha$ [15] 


where the 1-forms $C^\alpha$ dual to the $X_\alpha$ are called 
ghosts. In the physical context, the form-degree is 
the pgh described earlier, and so $pgh(C^\alpha) = 1$. The 
action of D on the ghosts is given by 


$D C^\alpha = \frac{1}{2}\, C^\alpha{}_{\beta\gamma}\, C^\gamma C^\beta$ [16] 


The longitudinal complex $\mathcal{L}_\Sigma$ is the complex of 
exterior forms along the gauge orbits. In the 
representation used here, it is given by the space 
of polynomials in the ghosts $C^\alpha$ with coefficients 
that are functions on $\Sigma$. The exterior derivative D 
is defined on this space by extending the formulas 
[15] and [16] so that it is an odd derivation. One 
clearly has (on $\Sigma$) 


$D^2 = 0$ [17] 
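
As a consistency check, $D^2 = 0$ on functions can be verified directly from [14] and [15]. The sketch below assumes that D acts from the left as an odd derivation and that the ghosts anticommute; with that convention the ghost variation must read $D C^\gamma = \frac{1}{2} C^\gamma{}_{\alpha\beta} C^\beta C^\alpha$, and a different ordering convention would change the signs.

```latex
\begin{align*}
D^2 f &= D\bigl(X_\alpha(f)\,C^\alpha\bigr)
       = X_\beta X_\alpha(f)\,C^\beta C^\alpha + X_\gamma(f)\,DC^\gamma \\
      &= \tfrac12 [X_\beta, X_\alpha](f)\,C^\beta C^\alpha + X_\gamma(f)\,DC^\gamma \\
      &= \tfrac12 C^\gamma{}_{\beta\alpha}\,X_\gamma(f)\,C^\beta C^\alpha
         + \tfrac12 C^\gamma{}_{\alpha\beta}\,X_\gamma(f)\,C^\beta C^\alpha
       = 0 \quad \text{on } \Sigma .
\end{align*}
```

The two terms cancel because the structure functions are antisymmetric in their lower indices; the restriction to $\Sigma$ is needed because the “more” terms in [14] vanish only there.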


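The functions on the quotient space $\Sigma/\mathcal{O}$ are just the 
elements of the zeroth cohomological group 
$H^0(D, \mathcal{L}_\Sigma)$, 


$H^0(D, \mathcal{L}_\Sigma) \simeq C^\infty(\Sigma/\mathcal{O})$ [18] 


In general, $H^k(D, \mathcal{L}_\Sigma) \neq 0$. 


Koszul-Tate Differential $\delta$ 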


The Koszul-Tate differential $\delta$ implements the first 
step in the reduction procedure. More precisely, it 
provides an algebraic resolution of the algebra 
$C^\infty(\Sigma)$ of the smooth functions on the surface $\Sigma$. 
That algebra can be identified with the quotient 
algebra 


$C^\infty(\Sigma) = C^\infty(M)/\mathcal{N}$ [19] 


where $\mathcal{N}$ is the ideal of functions that vanish on $\Sigma$. 
The Koszul-Tate complex K is defined by adding 
one new generator for each equation $f_a = 0$ defining 
$\Sigma$, denoted $t_a$ and assigned r-degree 1. In the algebra 
$C^\infty(M) \otimes \Lambda(t_a)$ (where $\Lambda(t_a)$ is the exterior algebra 
on the $t_a$), one defines $\delta$ through 


$\delta f = 0 \ \ \forall f \in C^\infty(M), \quad \delta t_a = f_a$ [20] 


and extends it as an odd derivation. It is clear 
that $r(\delta) = -1$ and that $\delta^2 = 0$. Because the 
functions on M are annihilated by $\delta$, they are 
clearly cycles at r-degree zero. Because the left-hand 
sides $f_a$ of the equations $f_a = 0$ are exact 
(equal to $\delta t_a$), the ideal $\mathcal{N}$ coincides with the set 
of boundaries in degree zero. 

Thus, 


$H_0(\delta, K) = C^\infty(\Sigma)$ [21] 


We see accordingly that $\delta$ successfully enforces the 
restriction to the surface $\Sigma$ through its homology in 
degree zero. 

However, if the equations $f_a = 0$ are not independent, 
this is not the end of the story. Indeed, any 
identity $Z_A{}^a f_a = 0$ on the functions $f_a$ leads to a 
nontrivial cycle $Z_A{}^a t_a$ in r-degree 1, $\delta(Z_A{}^a t_a) = 0$. This 
is undesirable. To cure this drawback, one introduces 
further generators $t_A$ in r-degree 2, one for 
each identity $Z_A{}^a f_a = 0$, and defines 


$\delta t_A = Z_A{}^a t_a, \quad r(t_A) = 2$ [22] 


in order to “kill” the unwanted cycles $Z_A{}^a t_a$. The 
Koszul complex K is thus enlarged to contain these 
new (even) variables and redefined as 


$K = C^\infty(M) \otimes \Lambda(t_a) \otimes S(t_A)$ [23] 


where $S(t_A)$ is the symmetric algebra in the $t_A$. The 
operator $\delta$ is extended to K as an odd derivation. 
One has $\delta^2 = 0$ and the property [21] is unaffected 
by the inclusion of the new generators. Furthermore, 
by construction, 


$H_1(\delta, K) = 0$ [24] 


If there is no "identity on the identities," we shall 
assume that the process stops. Otherwise, one needs 
to introduce further generators in r-degree 3 and 
possibly higher. When all the appropriate variables 
are included, there is no homology at higher 
r-degree. Thus, 


$H_k(\delta, K) = 0, \quad k > 0$ [25] 
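
A minimal illustration of why the extra generators are needed is the following sketch, which uses a deliberately redundant description of the surface (the example is an assumption made here for illustration and is not taken from the text): take $M = \mathbb{R}$ with coordinate $x$ and describe $\Sigma = \{x = 0\}$ by the two equations $f_1 = f_2 = x$, writing $t_1, t_2$ for the r-degree 1 generators and $t_A$ for the single r-degree 2 generator.

```latex
\begin{align*}
&\delta t_1 = x, \qquad \delta t_2 = x, \qquad f_1 - f_2 = 0,\\
&\delta(t_1 - t_2) = 0, \qquad
 \delta(\mu\, t_1 t_2) = \mu\, x\,(t_2 - t_1) \quad (\mu \in C^\infty(M)),\\
&\delta t_A = t_1 - t_2, \qquad r(t_A) = 2 .
\end{align*}
```

Without $t_A$, the cycle $t_1 - t_2$ is not a boundary, since every boundary in r-degree 1 carries a factor of $x$. Introducing $t_A$ makes it exact, so that the homology at r-degree 1 vanishes.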


Combining 6 with D 


We now turn to the problem of combining the 
Koszul-Tate complex with the longitudinal com- 
plex, so as to implement the full reduction. To that 
end, we define C by adding the ghosts to K, 


$C = K \otimes \Lambda(C^\alpha)$ [26] 


We then extend the action of the Koszul-Tate 
differential in the simplest way which preserves all 
gradings, namely 


$\delta C^\alpha = 0$ [27] 


It is clear that the homology of $\delta$ in C is given by 


$H_0(\delta, C) = \mathcal{L}_\Sigma, \quad H_k(\delta, C) = 0 \ \ (k > 0)$ [28] 


One can also extend the longitudinal derivative 
D to the whole complex C because the vector fields 
$X_\alpha$ are defined throughout M, so the definitions 
[15] and [16] make sense in C. One defines 
the action of D on the generators $t_a$ by requiring 
that 


$D\delta + \delta D = 0$ [29] 


This is easily verified to be possible. However, the 
(odd) derivation so obtained fails to be a differential 
in C when the vector fields $X_\alpha$ do not close off the 
surface $\Sigma$. In that case, the gauge transformations 
are not integrable off $\Sigma$; one says that they form an 
“open algebra.” One has then $D^2 = 0$ only on $\Sigma$, or, 
more precisely, 


$D^2 = -\delta s_1 - s_1 \delta$ [30] 


for some (odd) derivation $s_1$ (that vanishes in the 
“closed algebra” case). But this situation is precisely 
the one discussed earlier, with the Koszul-Tate 
differential being indeed $\delta$, as anticipated by the 
notation, and the longitudinal differential D playing 
the role of $s_0$ (the degrees also match). Applying the 
theorem discussed there, we can conclude: 


Theorem 2 There exists a differential s in C, 


$s = \delta + D + s_1 + \cdots, \quad s^2 = 0$ [31] 


such that 


$H^0(s, C) = C^\infty(\Sigma/\mathcal{O})$ [32] 


This is an immediate consequence of Theorem 1 
and eqns [18] and [28]. The differential s is known 
in the physical applications described below as the 
BRST differential. 


Hamiltonian BRST Construction 


As a first application of the above setting, we 
consider the Hamiltonian description of gauge 
systems. As already known, gauge systems are 
characterized in the Hamiltonian description by 
constraints and, for this reason, are called “constrained 
Hamiltonian systems.” Furthermore, the 
gauge transformations generate gauge orbits on the 
constraint surface and the physical observables are 
the functions on the quotient space of the constraint 
surface by the gauge orbits. 

A further important feature arises in the Hamilto- 
nian formalism: the gauge transformations are 
canonical transformations that are generated by the 
first-class constraints. Assuming that all the second-class 
constraints have been eliminated and that the 
bracket being used is the Dirac bracket, one sees 
that there is a vector field $X_\alpha$ for each constraint 
function $f_a$, $\alpha \equiv a$. (The functions $f_a$ are thus 
assumed to be independent since the vector fields 
$X_\alpha$ are assumed to be so. If not, further variables are 
needed, but the analysis proceeds along the same 
ideas.) 

This implies, in turn, that there is a pairing between 
the ghosts $C^\alpha$ associated with the longitudinal exterior 
derivative and the generators $t_\alpha$ of the Koszul-Tate 
complex. This pairing enables one to extend the 
bracket structure defined on the phase space to the 
pairs $(C^\alpha, t_\alpha)$ by declaring that these are canonically 
conjugate. The variables $t_\alpha$ are the momenta conjugate 
to the ghosts, $[t_\alpha, C^\beta] = \delta_\alpha{}^\beta$. Accordingly, the complex C 
relevant to the Hamiltonian situation, 


$C = C^\infty(P) \otimes \Lambda(C^\alpha) \otimes \Lambda(t_\alpha)$ [33] 


has a phase-space structure (here, $P \equiv M$ is the 
manifold obtained after eliminating the second-class 
constraints, equipped with the Dirac bracket). The 
space C is known as the “extended phase space.” 
The r-degree is called “antighost number” in the 
Hamiltonian context. 

By the general theorem described in the previous 
section, one knows that the cohomology at $gh = 0$ of 
the BRST differential is isomorphic to the algebra of 
the observables. Thus, there are two alternative 
ways to describe this physical algebra, either 
through reduction, by eliminating the redundant 
(gauge) variables, or cohomologically in an extended 
space containing additional variables, the ghosts, 
and their momenta. 

There is an additional interesting feature of the 
BRST construction in the Hamiltonian case: the 
BRST transformation is a canonical transformation 
in the extended phase space, in the sense that 


$sF = [\Omega, F]$ [34] 


for some “BRST generator” $\Omega$ of ghost number 1 
($F, \Omega \in C$). The nilpotency $s^2 = 0$ of the BRST differential 
is equivalent to 


$[\Omega, \Omega] = 0$ [35] 


That s is canonically generated implies that the 
cohomological BRST groups come with a natural 
bracket structure: the Poisson bracket of the extended 
phase space passes on to the BRST cohomological 
groups. In particular, $H^0(s, C)$, equipped with this 
bracket structure, is isomorphic (as Poisson algebra) 
to the algebra of physical observables. 
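
For orientation, the simplest instance can be written out explicitly. The sketch below assumes abelian first-class constraints, $[f_\alpha, f_\beta] = 0$ (an assumption made here for illustration; in general the generator contains further ghost terms), together with the graded Leibniz rule and the pairing $[C^\beta, t_\alpha] = [t_\alpha, C^\beta] = \delta_\alpha{}^\beta$.

```latex
\begin{align*}
\Omega &= C^\alpha f_\alpha, \\
s\,t_\alpha &= [\Omega, t_\alpha] = [C^\beta, t_\alpha]\, f_\beta = f_\alpha, \\
s\,F &= [\Omega, F] = C^\alpha\,[f_\alpha, F] \qquad (F \in C^\infty(P)), \\
[\Omega, \Omega] &= C^\alpha C^\beta\,[f_\alpha, f_\beta] = 0 .
\end{align*}
```

The second line reproduces the Koszul-Tate action on the generators, and the third is the longitudinal derivative with $X_\alpha(F)$ identified with $[f_\alpha, F]$ up to a sign convention, so in this special case $s = \delta + D$ with no higher-order terms.
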
Lagrangian BRST Construction 


The analysis of the Lagrangian BRST construction, 
due to Batalin and Vilkovisky (1981) (“antifield 
formalism”), proceeds in the same way because 
the covariant description of the space of observables 
also involves the same geometric ingredients. The 
surface $\Sigma$ is now the “stationary surface,” that is, 
the space of solutions to the equations of motion. 
The space M in which it is embedded is the space of 
all field histories. The gauge symmetry acts on this 
space. Furthermore, the gauge vector fields are 
tangent to $\Sigma$ since a solution is mapped on a 
solution by a gauge transformation. The integral 
submanifolds are the gauge orbits. The observables 
are the functions on the quotient space. 

Since the equations of motion follow from an 
action principle, there are as many equations as 
there are fields $\phi^i$. The corresponding generators $t_i$ 
in the Koszul-Tate complex (at degree 1) are called 
“antifields conjugate to the fields” and are denoted 
$\phi^*_i$. The r-degree is known as “antifield” (or also 
“antighost”) number. The gauge symmetry of the 
action implies Noether identities on the equations of 
motion. These are, therefore, not independent. 
According to the above general discussion, there 
are further generators in the Koszul-Tate complex, 
at degree 2. More precisely, there are as many new 
generators in degree 2 as there are Noether identities 
or independent gauge symmetries. These are called 
antifields conjugate to the ghosts and denoted $C^*_\alpha$. 

In the longitudinal complex, one has the ghosts $C^\alpha$, 
with as many ghosts as there are gauge symmetries. 
Thus, the BRST complex is the space 


$C = C^\infty(M) \otimes \Lambda(C^\alpha) \otimes \Lambda(\phi^*_i) \otimes S(C^*_\alpha)$ [36] 


where M is the space of all field histories. There is 
now a natural pairing between the original field 
variables $\phi^i$ and the antifields $\phi^*_i$, as well as between 
the ghosts $C^\alpha$ and the antifields $C^*_\alpha$. One thus defines 
a bracket in which the fields $\phi^i$ and the ghosts $C^\alpha$, on 
the one hand, and the antifields $\phi^*_i$ and $C^*_\alpha$, on the 
other, are declared to be conjugate. This bracket is 
denoted by parentheses, 


$(\phi^i, \phi^*_j) = \delta^i{}_j, \quad (C^\alpha, C^*_\beta) = \delta^\alpha{}_\beta$ [37] 


However, since the bracket pairs variables with 
degrees that add up to $-1$, it is in fact an “odd 
bracket,” called the “antibracket.” 

The BRST differential is again canonically gener- 
ated, but this time in the antibracket, 


$sF = (S, F), \quad F \in C$ [38] 


where the generator S is an even function of the 
fields, the ghosts and the antifields, with $gh = 0$ (the 
ghost number is carried by the odd antibracket). 
The nilpotency $s^2 = 0$ of the BRST differential is 
equivalent to the crucial “master equation,” 


$(S, S) = 0$ [39] 


Because the BRST differential is canonically 
generated, there is a natural bracket in cohomology. 
This bracket is not the Poisson bracket of observa- 
bles (at gh =0) because it changes the ghost number 
by one unit. One can, however, relate it to the 
Poisson bracket of observables (Barnich and Hen- 
neaux 1996); furthermore, it plays an important role 
in the study of the consistent deformations of the 
action. 


Spacetime Locality 


In the context of local field theory, one is often 
interested in a particular class of functions of the 
field histories, namely the so-called space of local 
functionals. A local functional is, by definition, the 
integral of a local n-form (where n is the spacetime 
dimension). A local n-form reads, in local 
coordinates, 


$f(x)\, d^n x$ [40] 


where f (x) depends on the fields at x as well as on a 
finite number of their derivatives. When the ghosts 
and the antifields are included, the local functions 
depend on them in the same way. 

The previous general cohomological result was 
derived in the space of all function(al)s, without locality 
restriction. When changing the space of cochains, one 
may change the cohomology. For instance, a local 
functional which is BRST-trivial in the space of all 
functionals may become nontrivial in the space of local 
functionals. This indeed happens here because the 
homology of the Koszul-Tate differential usually no 
longer vanishes at strictly positive r-degree in the space 
of local functionals, where it is related to local 
conservation laws. As a result, the analysis of the 
BRST cohomology in the space of local functionals is 
an interesting and nontrivial problem. In particular, the 
cohomological groups H*(s) in the space of local 
functionals may not vanish at negative ghost numbers. 


BRST Quantization 


The quantization of a dynamical system can proceed 
along different lines. For gauge models, the path- 
integral approach is most efficiently pursued in the 
context of the antifield formalism. We shall briefly 
outline here the general principles underlying the 


operator approach, which is based on the Hamiltonian 
formalism. 

In the operator approach, all the variables, 
including the ghosts and the conjugate momenta, 
are realized as operators in a space endowed with a 
nonpositive-definite inner product (because of the 
ghosts and the gauge modes). Real dynamical 
variables become formally Hermitian operators. 
Ignoring anomalies, the BRST generator $\Omega$ becomes 
an operator that fulfills the conditions 


$\Omega^\dagger = \Omega, \quad \Omega^2 = 0$ [41] 


(which allows for nontrivial solutions $\Omega \neq 0$ because 
the inner product is not positive definite). The 
second relation is a consequence of the classical 
Poisson bracket relation $[\Omega, \Omega] = 0$ and the fact that 
the graded Poisson bracket of two odd objects 
becomes the anticommutator. 

To remove the ghost and gauge redundancy, which 
has no physical content, one must impose a condition 
that selects physical states. The appropriate condition 
is motivated by the general cohomological result 
connecting the BRST cohomology with the algebra of 
physical observables. One imposes the condition 


$\Omega|\psi\rangle = 0$ [42] 


Because of [41], states of the form $\Omega|\chi\rangle$ are solutions 
of [42], but they have a vanishing inner product with 
all physical states, including themselves. They 
are called null states. The physical states are given by 
the BRST state cohomology. The physical operators 
are given by the BRST operator cohomology at 
$gh = 0$ and induce a well-defined action in the state 
cohomology. In particular, the Hamiltonian, being 
gauge invariant in the original theory, is represented 
by a BRST cohomological class, so that the time 
evolution maps physical states on physical states. 
The whole scheme is (formally) consistent because 
BRST-exact operators have vanishing matrix elements 
between states annihilated by the BRST operator $\Omega$, 
while null states $|\phi\rangle$ are such that $\langle\psi|A|\phi\rangle = 0$ whenever 
A is a BRST-closed operator, $[A, \Omega] = 0$, and $|\psi\rangle$ a 
physical state. Problems may arise, however, if the 
classical relations $[\Omega, \Omega] = 0$ and $[H, \Omega] = 0$ are not 
satisfied quantum mechanically, owing to extra terms of order $\hbar$, that is, 


$\Omega^2 \neq 0$ or $H\Omega - \Omega H \neq 0$ [43] 


In such cases, one says that there are anomalies. These 
are usually fatal to the consistency of the theory. 
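
The role of the indefinite inner product can be made concrete in a finite-dimensional toy model. The following is a minimal sketch, not a model taken from the text: the three-dimensional state space, the matrices `G` and `OMEGA`, and all function names are illustrative assumptions. It exhibits a nonzero operator that is Hermitian with respect to an indefinite inner product and squares to zero, and shows that the exact state it produces is null and orthogonal to the physical state.

```python
# Toy model (illustrative assumption): a 3-dimensional real state space with
# basis e0, e1, e2 and an indefinite inner product <u, v> = u^T G v.

G = [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0]]          # signature (+, +, -): not positive definite

OMEGA = [[0, 0, 0],
         [0, 0, 1],
         [0, 0, 0]]      # OMEGA e2 = e1, OMEGA e0 = OMEGA e1 = 0


def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(m, v):
    """Action of a matrix on a column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def transpose(m):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def inner(u, v):
    """Indefinite inner product <u, v> = u^T G v."""
    return sum(u[i] * G[i][j] * v[j] for i in range(3) for j in range(3))


E0, E1, E2 = [1, 0, 0], [0, 1, 0], [0, 0, 1]

# Hermiticity with respect to G means <u, OMEGA v> = <OMEGA u, v>,
# which for real matrices is the condition G OMEGA = OMEGA^T G.
is_hermitian = mat_mul(G, OMEGA) == mat_mul(transpose(OMEGA), G)
is_nilpotent = mat_mul(OMEGA, OMEGA) == [[0] * 3 for _ in range(3)]

null_state = mat_vec(OMEGA, E2)   # an exact state; it equals e1

if __name__ == "__main__":
    print("Hermitian:", is_hermitian, "nilpotent:", is_nilpotent)
    print("<null|null> =", inner(null_state, null_state))
    print("<e0|null>   =", inner(E0, null_state))
    print("<e0|e0>     =", inner(E0, E0))
```

In this toy model the states annihilated by `OMEGA` are spanned by e0 and e1, the exact state e1 is null, and the state cohomology is represented by e0 alone, which has positive norm.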


Some Applications 


The number of applications of the BRST formalism 
is so large that it would be out of place to try being 
exhaustive here. Some of its main successes are 
outlined here, with suggestions for “Further reading.” 


Renormalization of Gauge Theories 


First, there is the original context of perturbative 
renormalization and anomalies for gauge theories of 
the Yang-Mills type. The relevant cohomology here 
is the BRST cohomology in the space of local 
functionals involving the fields, the ghosts, and the 
antifields. The antifields are also known in this 
context as Zinn-Justin sources for the BRST varia- 
tions of the fields and ghosts, since Zinn-Justin was 
the first to introduce them (with that meaning). 
Many authors have contributed to the full computa- 
tion of the local BRST cohomology. A review is 
given in Barnich et al. (2000), where extensions to 
other theories are also indicated. 


String Theory 


Modern string theory would be inconceivable with- 
out the BRST formalism. This started with the 
pioneering paper by Kato and Ogawa (1983), where 
the critical dimension of the bosonic string was 
derived from the condition that $\Omega^2$ should vanish 
(quantum mechanically), and where it was shown 
that the string physical states could be identified 
with the state BRST cohomology. The reader is 
referred to excellent monographs on modern string 
theory (see “Further reading”). 


Deformations of Gauge Models 


The study of consistent deformations of a given 
gauge theory (i.e., the problem of introducing 
consistent couplings) is also efficiently dealt with in 
the BRST context. References to applications may 
be found in Henneaux (1998). 


See also: Anomalies; Batalin-Vilkovisky Quantization; 
BF Theories; Constrained Systems; Functional 
Integration in Quantum Physics; Graded Poisson 
Algebras; Indefinite Metric; Perturbative Renormalization 
Theory and BRST; Quantum Chromodynamics; Quantum 
Field Theory: A Brief Introduction; Renormalization: 
General Theory; String Field Theory; Supermanifolds; 
Topological Sigma Models. 
The study of algebras of Hilbert space operators, closed 
under the adjoint operation and in the weak operator 
topology, was begun by John von Neumann shortly 
after the discovery of quantum mechanics, and partly 
with the aim of understanding the monolithic ideas 
proposed by Heisenberg and Schrödinger. 

Seventy-five years later, the theory of these 
algebras has become a monolith in its own right 
(see von Neumann Algebras: Introduction, Modular 
Theory and Classification Theory; von Neumann 
Algebras: Subfactor Theory), with more internal 
structure and with more external reference to physics 
and, as it turns out, to other areas of mathematics 
than could possibly have been imagined at the outset. 
(The most striking example of an application to 
mathematics is perhaps the discovery of the Jones 
knot polynomial (see The Jones Polynomial); note 
that this has also had repercussions for physics.) 

Twenty-five years after the beginning of the 
theory of von Neumann algebras, as these algebras 
are now called, Gelfand and Naimark noticed that a 
second class of algebras of operators on a Hilbert 
space, closed under the adjoint operation, was 
worthy of study, namely those closed in the norm 
topology. Gelfand and Naimark made two important 
discoveries concerning this class of operator 
algebras, now called C*-algebras. 

First, Gelfand and Naimark showed that, in the 
commutative case, at least when the C*-algebra is 
considered only up to isomorphism - with its 
identity as a concrete algebra of operators sup- 
pressed — the information contained in a C*-algebra 
is purely topological. More precisely, Gelfand and 
Naimark showed that the category of unital 
commutative C*-algebras, with unit-preserving 
algebra homomorphisms (these necessarily preserve 
the adjoint operation), is equivalent in a contra- 
variant way (i.e., with reversal of arrows) to the 
category of compact Hausdorff spaces, with con- 
tinuous maps. The compact space associated with a 
unital commutative C*-algebra under the Gelfand- 
Naimark correspondence may be viewed as the 
space of maximal proper ideals, with a natural 
topology (the hull-kernel, or Jacobson, topology), 
and is called the spectrum. This space may also be 
viewed as the set of (unital, linear, multiplicative) 
maps from the algebra into the complex numbers, 
in which case the topology is that of pointwise 
convergence. 

Second, using this result, Gelfand and Naimark 
proved that arbitrary C*-algebras could be axiomatized 
in a simple way abstractly, as *-algebras (that 
is, as algebras over the complex numbers with a 
conjugate linear anti-automorphism of order 2) with 
certain special properties. It is now known that the 
only property that needs to be assumed is the 
existence of a (necessarily unique) Banach space 
norm related to the *-algebra structure by means of 
the so-called C*-algebra identity: 


$$\|x^{*}x\| = \|x\|^{2} \qquad [1]$$


This is clearly related to — and in fact implies — the 
normed algebra inequality 


$$\|xy\| \le \|x\|\,\|y\| \qquad [2]$$
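
As a small numerical illustration of [1] and [2], the following sketch checks both relations for 2 × 2 complex matrices with the operator norm. It is only a sanity check on two sample matrices, not a proof; the helper names `adjoint`, `matmul`, and `op_norm` and the matrices `x`, `y` are chosen here for illustration and do not come from the text.

```python
import math

def adjoint(x):
    """Conjugate transpose of a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = x
    return ((a.conjugate(), c.conjugate()),
            (b.conjugate(), d.conjugate()))

def matmul(x, y):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def op_norm(x):
    """Operator norm: square root of the largest eigenvalue of x* x."""
    h = matmul(adjoint(x), x)            # Hermitian, positive semidefinite
    a, d = h[0][0].real, h[1][1].real
    b = abs(h[0][1])
    largest = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return math.sqrt(largest)

# Two arbitrary sample matrices (illustrative values only).
x = ((1 + 2j, 3), (0, -1j))
y = ((0.5, 1j), (2, 1 - 1j))

lhs = op_norm(matmul(adjoint(x), x))     # ||x* x||
rhs = op_norm(x) ** 2                    # ||x||^2
identity_holds = math.isclose(lhs, rhs, rel_tol=1e-9)
submultiplicative = op_norm(matmul(x, y)) <= op_norm(x) * op_norm(y) + 1e-12
```

The closed-form eigenvalue formula keeps the sketch free of external libraries; for larger matrices one would use a singular value routine instead.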


One reason that the Gelfand—Naimark axiomati- 
zation of C*-algebras is important is that it under- 
lines how natural it is to consider a C*-algebra 
abstractly, i.e., independently of any particular 
representation. Indeed, while one of the fundamen- 
tal phenomena of von Neumann algebra theory 
(discovered by Murray and von Neumann) is that, 
essentially — in rather a strong sense — there is only 
one way to represent a given von Neumann algebra 
on a Hilbert space (and there is even a canonical 
way, called the standard representation!), it is an 
equally fundamental phenomenon of C*-algebra 
theory that, except in extremely special cases, this 
is no longer true. 

For instance, although the C*-algebra of compact 
operators on a given Hilbert space has, up to unitary 
equivalence, only a single irreducible representation — 
this is what underlies the fact, proved by von 
Neumann, referred to as the uniqueness of the 
Heisenberg commutation relations for a quantum- 
mechanical system with finitely many degrees of 
freedom — as soon as one considers a physical system 
with infinitely many degrees of freedom, one finds that 
the naturally associated C*-algebra has infinitely 
many — indeed, uncountably many - unitary equiva- 
lence classes of irreducible representations, and it is 
impossible to parametrize these in any reasonable way. 

This striking dichotomy presents itself also in 
other contexts, more elementary perhaps than the 
physics of infinitely many degrees of freedom. 
Consider the dynamical system consisting of a circle 
and a fixed rotation acting on it. If the rotation is of 
finite order (i.e., if the angle is a rational multiple 
of $2\pi$), then the naturally associated C*-algebra is 
relatively easy to study. In the case of angle zero, it 
is the unital commutative C*-algebra with Gelfand- 
Naimark spectrum the torus. In the general case of a 
rational angle, the space of unitary equivalence 
classes of irreducible representations is still naturally 
parametrized by the torus. (And this is the same as 
the space of primitive ideals — the kernels of the 
irreducible representations — with the Jacobson 
topology.) 

In the irrational case, that of a rotation by an 
irrational multiple of $2\pi$ (still elementary from a 
geometrical point of view; note that the calendar is 
based on such a system!), the irreducible representations 
are no longer parametrized up to unitary 
equivalence by the torus, and the space of primitive 
ideals consists of a single point: the C*-algebra is 
simple. (But it is decidedly not simple to study!) 
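
The rational half of this dichotomy can be made concrete with exact arithmetic. The sketch below measures the angle in full turns (so an angle of $2\pi p/q$ is the fraction $p/q$) and computes the order of the rotation and the orbit of a point; the function names are hypothetical. An irrational angle cannot be represented by a `Fraction`, which mirrors the fact that in that case no orbit ever closes up.

```python
from fractions import Fraction

def rotation_order(turn):
    """Order of the rotation by `turn` full turns (angle 2*pi*turn), turn rational."""
    return Fraction(turn).denominator

def orbit(turn, start=Fraction(0)):
    """Orbit of `start` (a point of the circle R/Z) under repeated rotation."""
    turn = Fraction(turn)
    points, p = [], Fraction(start) % 1
    while p not in points:
        points.append(p)
        p = (p + turn) % 1
    return points

# Rotation by 2*pi*(2/5): finite order 5, every orbit has 5 points.
order = rotation_order(Fraction(2, 5))
points = orbit(Fraction(2, 5))
```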

This fundamental dichotomy in the classification 
of C*-algebras - conjectured by Gaarding and 
Wightman in the quantum-mechanical setting and 
by Mackey in the geometrical one — was established 
by Glimm. Glimm proved (in the setting of separ- 
ability; most of his results were generalized later 
to the nonseparable case) that a large number of 
a priori different ways that a C*-algebra could 
behave well were in fact one and the same behavior: 
either all present for a given C*-algebra, or all 
catastrophically absent! 

Some of the properties considered by Glimm, and 
shown to be equivalent (for a separable C*-algebra) 
were as follows. First of all, every representation of 
the C*-algebra on a Hilbert space should be of type 
I, i.e., should generate a von Neumann algebra of 
type I. (A von Neumann algebra was said by Murray 
and von Neumann to be of type I if it contained a 
minimal projection of central support one, i.e., a 
projection not contained in a proper direct sum- 
mand and minimal with this property.) Second, in 
every irreducible representation (not necessarily 
injective) on a Hilbert space, the image of the 
C*-algebra should contain the compact operators. 
Third, any two irreducible representations with the 
same kernel should be unitarily equivalent. Fourth, 
it should be possible to parametrize the unitary 
equivalence classes of irreducible representations by 
a real number in a natural way (respecting the 
natural Borel structure introduced by Mackey). 

The first of the equivalent properties listed above, 
that all representations of a C*-algebra should be of 
type I, suggested a name for the property - that the 
C*-algebra itself should be of type I. This property 
of a C*-algebra, identified by Glimm - or, rather, its 
opposite, which as mentioned above is much more 
common (just as irrational numbers are more 
common than rationals, or systems with infinitely 
many degrees of freedom are, at least in theory, 
much more common than those with finitely many 
degrees of freedom) - is a fundamental unifying 
principle of nature. 

Besides commutative C*-algebras — as mentioned 
above, just another way of looking at topological 
spaces (compact Hausdorff spaces, that is) — and 
besides the C*-algebra associated to a rotation or to 
a physical system with infinitely many degrees of 
freedom, what are some of the naturally occurring 
examples of C*-algebras (of type I or not)? 

First, let us take a closer look at what arises from 
a system with infinitely many degrees of freedom - 
in the fermion case. As shown by Jordan and 
Wigner, one obtains what, as a C*-algebra, is very 
easy to describe, namely, just the infinite tensor 
product in the category of unital C*-algebras of 
copies of the algebra of 2 × 2 matrices over the 
complex numbers. As it happens, in work earlier 
than that referred to above, Glimm had considered 
such infinite tensor product C*-algebras, also allow- 
ing the components to be matrix algebras of order 
different from two. This raised a problem of 
classification — for those C*-algebras, all of which 
were simple and not of type I. (The only simple 
unital C*-algebra of type I is a single matrix algebra, 
or a finite tensor product of matrix algebras!) 

In a pioneering classification paper (the first paper 
on the classification of C*-algebras being perhaps 
that of Gelfand and Naimark, in which the commu- 
tative case was described), Glimm obtained the 
classification of infinite tensor products of matrix 
algebras, showing that it was a direct extension of 
the classification of finite tensor products, i.e., just 
of the matrix algebras themselves. As described later 
by Dixmier, Glimm's classification was as follows. 
Given a sequence $n_1, n_2, \ldots$ of natural numbers 
(equal to one or more), form the infinite product in 
a natural way - just by keeping track of the total 
number of times each prime number appears in the 
finite products $n_1 \cdots n_k$ (a multiplicity which may be 
either finite or infinite). Call such a formal infinite 
product a generalized integer — or, perhaps, a 
supernatural number! Two (countably) infinite 
tensor products of matrix algebras are isomorphic 
(just as in the finite tensor product case) if and only 
if the corresponding supernatural numbers are 
equal. 
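
The bookkeeping behind a supernatural number can be sketched in a few lines. The code below is a minimal illustration under a simplifying assumption that is not in the text: the sequence $n_1, n_2, \ldots$ is taken to be a finite list `initial` followed by a finite list `repeating` that recurs forever, so that every prime multiplicity is either a finite number or infinite. The function names are hypothetical.

```python
import math
from collections import Counter

def prime_factors(n):
    """Prime factorization of n >= 1 as a Counter {prime: exponent}."""
    factors, p = Counter(), 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors

def supernatural(initial, repeating):
    """Prime multiplicities of the sequence `initial` followed by `repeating` forever."""
    exponents = dict(prime_factors(math.prod(initial)))
    for p in prime_factors(math.prod(repeating)):
        exponents[p] = math.inf      # the prime occurs infinitely often
    return exponents

# 2, 2, 2, ... and 4, 4, 4, ... give the same supernatural number 2^infinity,
# so by Glimm's theorem the two infinite tensor products are isomorphic.
same = supernatural([], [2]) == supernatural([], [4])
# 6, 6, 6, ... gives 2^infinity * 3^infinity, a different supernatural number.
different = supernatural([], [6]) != supernatural([], [2])
```

Comparing the resulting dictionaries is then exactly the comparison of supernatural numbers described above, within the restricted class of sequences the sketch handles.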

In formulating Glimm's classification of infinite 
tensor products of matrix algebras in this way, 
Dixmier pointed out that each supernatural number 
determines a subgroup of the rational numbers 
(those with denominator dividing the supernatural 
number) and that every subgroup of the rational 
numbers containing the integers arises in this way. 
He then gave an alternative derivation of Glimm's 
theorem by recovering this subgroup of the rational 
numbers as a natural invariant of the algebra, 
namely, as the subgroup generated by the values 
on projections of the unique normalized trace. (By a 
trace is meant here a unitarily invariant positive 
linear functional.) This could even be interpreted as 
an alternative statement of Glimm's theorem. 
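
Dixmier's subgroup of the rationals is easy to test for membership once a supernatural number is written as a table of prime multiplicities. The following sketch assumes such a table is given as a dictionary from primes to exponents (with `math.inf` for an infinite multiplicity); that representation and the function names are illustrative choices, not notation from the text.

```python
import math
from fractions import Fraction

def divides(q, exponents):
    """True if the positive integer q divides the supernatural number.

    `exponents` maps primes to their multiplicity (an int or math.inf).
    """
    for p, e in exponents.items():
        while e > 0 and q % p == 0:
            q //= p
            e -= 1                    # math.inf - 1 is still math.inf
    return q == 1

def in_subgroup(r, exponents):
    """True if the rational r has denominator dividing the supernatural number."""
    return divides(Fraction(r).denominator, exponents)

two_infinity = {2: math.inf}          # supernatural number 2^infinity
checks = (
    in_subgroup(Fraction(3, 8), two_infinity),   # dyadic rational: inside
    in_subgroup(Fraction(1, 3), two_infinity),   # denominator 3: outside
    in_subgroup(5, two_infinity),                # integers always belong
)
```

For $2^\infty$ the subgroup is the group of dyadic rationals, which is the group generated by the values of the normalized trace on projections in the infinite tensor product of 2 × 2 matrix algebras.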

Soon afterwards, Bratteli considered an extension 
of Glimm's class of C*-algebras, namely, the 
inductive limits of arbitrary sequences of finite- 
dimensional C*-algebras, and gave a classification of 
these algebras in terms of the embedding multiplicity 
data in the sequences. This was exactly analogous to 
the original classification of Glimm, but now vastly 
more complex, with the multiplicity data of the 
sequence encoded in what is now called a Bratteli 
diagram. (Note that a finite-dimensional C*-algebra 
is just a direct sum of matrix algebras over the 
complex numbers.) Bratteli diagrams have proved to 
be very important, and in particular have been shown 
by Putnam and others to be useful for the study of 
minimal homeomorphisms of the Cantor set. 
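
To make the embedding multiplicity data concrete, the sketch below propagates matrix sizes down a Bratteli diagram whose multiplicities are the same at every level. It assumes unital embeddings, so the size of each summand at the next level is the multiplicity-weighted sum of the sizes it receives. The stationary diagram with matrix [[1, 1], [1, 0]] is a standard illustrative example chosen here, not one discussed in the text, and the function names are hypothetical.

```python
def next_level(sizes, multiplicity):
    """Matrix sizes of the next finite-dimensional algebra in the sequence.

    sizes[j] is the order of the j-th matrix summand at the current level;
    multiplicity[i][j] counts how often summand j is embedded in summand i
    of the next level (unital embeddings assumed).
    """
    return [sum(m * s for m, s in zip(row, sizes)) for row in multiplicity]

def levels(start, multiplicity, steps):
    """Sizes at levels 0..steps for a diagram with constant multiplicities."""
    out = [list(start)]
    for _ in range(steps):
        out.append(next_level(out[-1], multiplicity))
    return out

# Stationary diagram [[1, 1], [1, 0]]: sizes follow the Fibonacci numbers.
fib_levels = levels([1, 1], [[1, 1], [1, 0]], 5)
# One-vertex diagram with multiplicity 2: the tensor product of 2 x 2 matrix algebras.
uhf_levels = levels([1], [[2]], 3)
```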

Bratteli’s extension of Glimm's tensor product 
classification was followed by a corresponding 
extension by the present author of Dixmier's 
approach to Glimm's result. It was no longer 
possible to express the appropriate data in terms of 
traces (even in the case of a unique normalized 
trace). Instead, the present author recalled the 
concept of equivalence of projections introduced 
by Murray and von Neumann forty years earlier, 
together with the fact, proved by Murray and von 
Neumann, that equivalence is compatible with 
addition of orthogonal projections. (Two projec- 
tions in a *-algebra are equivalent if they are equal 
to x*x and xx* for some element x.) The resulting 
elementary invariant — the set of equivalence classes 
of projections with the operation of addition 
whenever defined (whenever the equivalence classes 
to be added have orthogonal representatives), which one 
might refer to as a local abelian semigroup, and 
which was used by Murray and von Neumann to 
divide von Neumann algebras into what they called 
types I, II, and III - was shown by the author to 
determine Bratteli's algebras up to isomorphism. 

Bratteli called his algebras approximately finite- 
dimensional C*-algebras, or AF algebras. The author 
referred to his invariant simply as the range of the 
(abstract) dimension, and pointed out that this 
structure determined an enveloping ordered abelian 
group, which he called the dimension group. It was 
soon noticed that the dimension group was related 
to the K-group introduced by Grothendieck in 
algebraic geometry (see K-Theory), and by Atiyah 
and Hirzebruch (see K-Theory) in topology. 

Grothendieck's K-group was defined for an arbi- 
trary ring with unit, and Atiyah and Hirzebruch in 
effect considered the special case of the ring of 
continuous functions on a compact Hausdorff space — 
in other words, a commutative C*-algebra — in the 
process showing that the deep phenomenon of Bott 
periodicity could be expressed in terms of this 
invariant. The invariant itself (see below) is essen- 
tially the same as that of Murray and von Neumann. 
In the special case that the ring is an AF algebra, the 
K-group coincides with the dimension group. (The 
K-group has a natural ordered, or pre-ordered, 
structure, although this was often suppressed.) 

Let us consider the definition of the K-group of a 
not necessarily unital C*-algebra; it is in this setting 
that the statement of Bott periodicity attains its 
simplest form. 

First, in the unital case, one constructs the abelian 
local semigroup (addition just partially defined) of 
Murray-von Neumann equivalence classes of pro- 
jections, as described above in the case of an AF 
algebra. Let us call this the dimension range. As 
stated above, for AF algebras this is all that needs to 
be done - the enveloping group of the dimension 
range is already the K-group. In the general case, 
one must repeat the construction for the algebra of 
2 × 2 matrices over the given algebra, with the given 
algebra considered as embedded as the upper left- 
hand corner of the matrix algebra. The dimension 
range of the given algebra then maps naturally into 
(but not necessarily onto) the dimension range of the 
matrix algebra. One should then repeat this con- 
struction, doubling the order of the matrix algebra 
at every stage (or, alternatively, increasing it just by 
one). The enveloping group of the (algebraic) 
inductive limit of this sequence of local semigroups 
is then the K-group of the given algebra. (Alterna- 
tively, one may just consider immediately the 
*-algebra of all infinite matrices over the given 
C*-algebra with only finitely many nonzero entries, 
and form the dimension range of this *-algebra — and 
the enveloping group of this abelian local semi- 
group, now in fact a semigroup.) 

In the case of a nonunital C*-algebra, one adjoins 
a unit (as may be done, for instance, by representing 
the C*-algebra faithfully on a Hilbert space, and 
showing that the C*-algebra obtained by adjoining 
the identity operator is independent of the representa- 
tion — actually, one need only check that the *-algebra 
structure is unique, as the C*-algebra norm on a 
C*-algebra is always determined by the *-algebra 
structure). The K-group of the resulting unital 
C*-algebra then maps naturally into the K-group of 
the natural one-dimensional quotient, and the kernel 
of this map is, for reasons that will become clearer 
later, defined to be the K-group of the nonunital 
algebra. 

Atiyah and Hirzebruch in fact referred to the 
K-group of the C*-algebra as $K_0$ - the reason being 
that there is another very natural group to consider, 
namely, the K-group of the suspension of the 
C*-algebra. (The suspension, $SA$, of a C*-algebra $A$ 
is defined as the C*-algebra of all continuous 
functions from the real line $\mathbb{R}$ into $A$ which converge 
to zero at $\pm\infty$, with the pointwise *-algebra 
operations and the supremum norm. It may also be 
defined as the (unique) C*-algebra tensor product 
$A \otimes C_0(\mathbb{R})$, where $C_0(\mathbb{R})$ denotes the suspension of 
the C*-algebra $\mathbb{C}$ of complex numbers.) Denoting 
the $K_0$-group of the suspension of a given C*-algebra 
by $K_1$, one might expect this process to continue, 
but in fact it is periodic ($K_0, K_1, K_0, K_1, \ldots$). Bott 
periodicity states that there is a natural isomorphism 
of $K_2$ with $K_0$. (C*-algebras can also be defined with 
the field of real numbers as scalars, and in this case 
the period of Bott periodicity is eight.) 

Another way of stating Bott periodicity, or, more 
precisely, of embedding it into the K-theory of 
C*-algebras, is as follows. Given a short exact 
sequence of C*-algebras, 


$$0 \to J \to A \to A/J \to 0 \qquad [3]$$


i.e., given a C*-algebra A and a closed two-sided 
ideal J (the quotient *-algebra is then a C*-algebra 
with the quotient norm) — A is sometimes referred to 
as an extension of J by A/J — consider the natural 
short (not necessarily exact) sequences 


$$K_0(J) \to K_0(A) \to K_0(A/J) \qquad [4]$$
and 
$$K_1(J) \to K_1(A) \to K_1(A/J) \qquad [5]$$


($K_0$ and $K_1$ are functors!). There exist natural connecting 
maps $K_1(A/J) \to K_0(J)$ and $K_0(A/J) \to K_1(J)$ - the 
first referred to as the index map, and the second 
(sometimes referred to as the odd-order index map) 
obtained from this immediately from Bott periodicity 
(as stated above) — such that the periodic six-term 
sequence 


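$$\begin{array}{ccccc}
K_0(J) & \to & K_0(A) & \to & K_0(A/J) \\
\uparrow & & & & \downarrow \\
K_1(A/J) & \leftarrow & K_1(A) & \leftarrow & K_1(J)
\end{array}$$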


is exact. (The periodicity stated above can also be 
recovered from this.) 
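
As a worked illustration of the six-term sequence, consider the standard Toeplitz extension (an example chosen here, not one taken from the text): $J = \mathcal{K}$ is the algebra of compact operators on a separable infinite-dimensional Hilbert space, $A = \mathcal{T}$ is the C*-algebra generated by the unilateral shift, and $A/J \cong C(S^1)$. Inserting the known K-groups of these three algebras, exactness forces the maps to be as labelled below; the identification of the index map as an isomorphism holds up to a sign convention.

```latex
% Toeplitz extension 0 -> K -> T -> C(S^1) -> 0 (illustrative example, not from the text)
\[
\begin{array}{ccccc}
K_0(\mathcal{K}) = \mathbb{Z} & \xrightarrow{\;0\;} & K_0(\mathcal{T}) = \mathbb{Z}
  & \xrightarrow{\;\cong\;} & K_0(C(S^1)) = \mathbb{Z} \\
\uparrow {\scriptstyle \text{index},\ \cong} & & & & \downarrow \\
K_1(C(S^1)) = \mathbb{Z} & \leftarrow & K_1(\mathcal{T}) = 0
  & \leftarrow & K_1(\mathcal{K}) = 0
\end{array}
\]
```

Reading around the diagram: since $K_1(\mathcal{T}) = 0$, the index map is injective; it is also surjective, so the map $K_0(\mathcal{K}) \to K_0(\mathcal{T})$ is zero, and $K_0(\mathcal{T}) \to K_0(C(S^1))$ is then an isomorphism because $K_1(\mathcal{K}) = 0$.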

Given that the functor $K_0$ classifies AF algebras, 
one might expect the functor $K_1$ to be useful for 
classification purposes also. In fact, this is the case. 
(Indeed, as shown by Brown, the $K_1$-functor is 
already important for the theory of AF algebras - in 
spite of, or even because of (!), the fact that the 
$K_1$-group of an AF algebra is zero.) Using the six- 
term exact sequence of Bott periodicity described 
above, corresponding to an extension of C*-algebras, 
together with results of the present author, Brown 
showed that any extension of one AF algebra by 
another is again an AF algebra. 

A rather large class of simple unital C*-algebras 
has by now been classified by means of the 
invariants $K_0$ and $K_1$ (together with the class of 
the unit in $K_0$, and the order (or pre-order) structure 
on $K_0$), also taking into account the compact 
convex set of tracial states on the C*-algebra 
(a positive linear functional on a C*-algebra is called 
a trace if it has the same value on x*x and xx* for 
every element x, and a tracial state if it is a state, 
that is, has norm 1, or has value 1 on the unit in the 
case the algebra has a unit). In addition to the set of 
tracial states, together with its natural topology and 
convex structure, one should also keep track of the 
natural pairing between traces and $K_0$ (any trace on 
a unital C*-algebra has the same value on two 
equivalent projections, i.e., projections equal to x*x 
and xx* for some element x, and hence gives rise to 
an additive real-valued functional on $K_0$). 

In terms of these invariants (which might, broadly 
speaking, be called K-theoretical), it has been 
possible to classify the simple unital C*-algebras 
(not of type I) arising as inductive limits (i.e., as the 
completions of increasing unions) of sequences of 
finite direct sums of matrix algebras over separable 
commutative C*-algebras, these assumed to have 
spectra of dimension at most three, on the one hand 
(work of the present author together with Guihua 
Gong and Liangqing Li, a culmination of earlier 
work of these authors together with a number of 
others), and, on the other hand, it has been possible 
(work of Kirchberg and Phillips, also based on 
earlier work by a number of authors) to classify the 


C*-Algebras and their Classification 397 


C*-algebra tensor products (in a natural sense) of 
these C*-algebras with what is called the Cuntz 
C*-algebra $\mathcal{O}_\infty$ (see below). In the first of these two 
cases, the compact convex set of tracial states — 
always a Choquet simplex — is an arbitrary (metriz- 
able) such space. 

In the second case, this space is empty (as it is for 
$\mathcal{O}_\infty$ in particular). In both cases, $K_0$ and $K_1$ are 
arbitrary countable abelian groups, with the proviso 
that $K_0$ is not the sum of a torsion group and a 
cyclic group. In the first case, the order structure on 
$K_0$, the class of the unit element, and the pairing of 
$K_0$ with the space of traces have certain special 
properties; as it turns out, these can be expressed in 
a simple way. (The class of the unit need only be 
positive and nonzero.) In the second case, the order 
structure on $K_0$ is degenerate (every element is 
positive) and the class of the unit can be arbitrary 
(including zero!). 

Let us just note that the Cuntz C*-algebra $\mathcal{O}_\infty$ is 
the unital C*-algebra generated by an infinite 
sequence $s_1, s_2, \ldots$ of isometries with orthogonal 
ranges (in other words, elements $s_i$ such that $s_i^* s_i$ is 
the unit and $s_i^* s_j = 0$ if $j \neq i$). One need not require 
the C*-algebra to have the universal property with 
respect to these generators and relations as it is in 
fact unique (up to an isomorphism preserving these 
generators). In particular, this C*-algebra is simple. 
(If one considers a finite sequence of isometries with 
orthogonal ranges, and assumes in addition that the 
sum of the range projections $s_i s_i^*$ is the unit, one also 
obtains a simple C*-algebra, the Cuntz C*-algebra 
$\mathcal{O}_n$, $n = 2, 3, \ldots$.) 
The $K_0$-group and $K_1$-group of $\mathcal{O}_\infty$ are, respectively, 
$\mathbb{Z}$ and 0. (The $K_0$-group and $K_1$-group of $\mathcal{O}_n$ for 
$n = 2, 3, \ldots$ are, respectively, $\mathbb{Z}/(n-1)\mathbb{Z}$ and 0.) 
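
The torsion in $K_0(\mathcal{O}_n)$ can be seen directly from the defining relations. The short computation below is a sketch of one direction only: it shows that the class of the unit is annihilated by $n - 1$, using nothing beyond Murray-von Neumann equivalence. That $K_0(\mathcal{O}_n)$ is exactly $\mathbb{Z}/(n-1)\mathbb{Z}$, generated by this class, is Cuntz's theorem and is not derived here.

```latex
% Why (n-1)[1] = 0 in K_0(O_n): each range projection is equivalent to the unit.
\[
1 = \sum_{i=1}^{n} s_i s_i^{*}, \qquad
s_i s_i^{*} \sim s_i^{*} s_i = 1 \quad (i = 1, \ldots, n)
\]
\[
\Longrightarrow\quad [1] = \sum_{i=1}^{n} [s_i s_i^{*}] = n\,[1]
\quad\Longrightarrow\quad (n-1)\,[1] = 0 \ \text{ in } K_0(\mathcal{O}_n).
\]
```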

Both classes of C*-algebras considered in the 
classification result stated above, although des- 
cribed in rather a concrete way (in terms of 
inductive limits and tensor products), can also be 
characterized axiomatically, in a way that makes it 
clear that they are, in fact, much more general than 
they seem. (These axiomatizations are due to 
Lin and to Kirchberg and Phillips. Typically, the 
abstract axioms are easier to establish in a 
given case than the inductive limit form described 
above.) 

In view of this, and the fact that one of the axioms 
is a notion of amenability (the analogous property 
for C*-algebras of a notion that has also been 
considered for von Neumann algebras) and since 
amenable von Neumann algebras (on a separable 
Hilbert space) have been classified completely (in 
remarkable work of Connes, together with many 
others, starting with Murray and von Neumann — 
and, one must also mention, ending with Haagerup, 


who settled a particularly stubborn case), it is 
natural to ask whether the K-theoretical invariants 
described above might be sufficient to classify all 
amenable separable C*-algebras, say, those which 
are simple and unital. 

The work of Villadsen has shown that additional 
invariants must in fact be considered, if one is to 
deal with arbitrary amenable simple C*-algebras, 
and this has been confirmed in subsequent work of 
Rørdam and of Toms. (Villadsen's examples were 
obtained by removing the condition of low dimen- 
sion on the spectra of the commutative C*-algebras 
appearing in the inductive limit decomposition 
considered above.) The very nature of these authors' 
work, however, has been to introduce additional 
invariants, all of which it seems natural to consider 
as, broadly speaking, K-theoretical. (And all of 
which, as it happens, are already familiar.) 

The question of the classifiability, in terms of 
simple invariants (K-theoretical in nature, at least in 
the broad sense, and including the spectrum which is 
indispensable in the nonsimple case), of all (separable) 
amenable C*-algebras would therefore still 
appear to be on the agenda. 

Already, in any case, just like the analogous 
question for von Neumann algebras (now settled), 
this question would appear to have had a noticeable 
influence on the development of the subject — not 
least in underlining the importance of K-theoretical 
methods, which have proved to be pertinent both in 
connection with the index theory of differential 
operators on geometrical structures — from foliations 
to fractals — and in connection with questions in 
physics, related to quantum statistical mechanics 
(see e.g., Quantum Hall Effect), to quantum field 
theory (e.g., the standard model), and even to string 
theory and M-theory. 


See also: Axiomatic Quantum Field Theory; Bosons and 
Fermions in External Fields; The Jones Polynomial; 
K-Theory; Positive Maps on C*-Algebras; Quantum Hall 
Effect; von Neumann Algebras: Introduction, Modular 
Theory, and Classification Theory; von Neumann 
Algebras: Subfactor Theory. 
Calibrated Geometry 




“Calibrated geometry,” introduced by Harvey and 
Lawson (1982), is the study of special classes of 
“minimal submanifolds” $N$ of a Riemannian manifold 
$(M,g)$, defined using a closed form $\varphi$ on $M$ 
called a calibration. For example, if $(M,J,g)$ is a 
Kähler manifold with Kähler form $\omega$, then complex 
$k$-submanifolds of $M$ are calibrated with respect to 
$\varphi = \omega^k/k!$. Another important class of calibrated 
submanifolds are special Lagrangian submanifolds 
in Calabi-Yau manifolds, which is the focus of the 
section “Special Lagrangian geometry.” 


Calibrations and Calibrated Submanifolds 


We begin by defining “calibrations” and “calibrated 
submanifolds.” 


Definition 1 Let $(M,g)$ be a Riemannian manifold. 
An “oriented tangent $k$-plane” $V$ on $M$ is a vector 
subspace $V$ of some tangent space $T_xM$ to $M$ with 
$\dim V = k$, equipped with an orientation. If $V$ is an 
oriented tangent $k$-plane on $M$ then $g|_V$ is a 
Euclidean metric on $V$; so, combining $g|_V$ with the 
orientation on $V$ gives a natural volume form $\mathrm{vol}_V$ 
on $V$, which is a $k$-form on $V$. 


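Now let $\varphi$ be a closed $k$-form on $M$. We say that $\varphi$ 
is a calibration on $M$ if for every oriented $k$-plane 
$V$ on $M$ we have $\varphi|_V \le \mathrm{vol}_V$. Here, $\varphi|_V = \alpha \cdot \mathrm{vol}_V$ for some 
$\alpha \in \mathbb{R}$, and $\varphi|_V \le \mathrm{vol}_V$ if $\alpha \le 1$. Let $N$ be an 
oriented submanifold of $M$ with dimension $k$. Then 
each tangent space $T_xN$ for $x \in N$ is an oriented 
tangent $k$-plane. We say that $N$ is a calibrated 
submanifold if $\varphi|_{T_xN} = \mathrm{vol}_{T_xN}$ for all $x \in N$. 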
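
The calibration condition can be checked numerically in the simplest Kähler case mentioned in the introduction. The sketch below takes $\mathbb{R}^4 = \mathbb{C}^2$ with coordinates $(x_1, y_1, x_2, y_2)$, the flat metric, and the Kähler form $\omega = dx_1 \wedge dy_1 + dx_2 \wedge dy_2$, using the convention $\omega(u,v) = g(Ju,v)$ (an assumption about sign conventions). For an oriented orthonormal basis $(u,v)$ of a 2-plane $V$, the number $\alpha$ with $\omega|_V = \alpha \cdot \mathrm{vol}_V$ is just $\omega(u,v)$. Random sampling is only a sanity check, not a proof, and all function names are hypothetical.

```python
import math
import random

def J(u):
    """Standard complex structure on R^4 = C^2, coordinates (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = u
    return (-y1, x1, -y2, x2)

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def omega(u, v):
    """Kaehler form omega(u, v) = g(Ju, v) = (dx1^dy1 + dx2^dy2)(u, v)."""
    return dot(J(u), v)

def orthonormal_pair(rng):
    """Random oriented orthonormal basis (u, v) of a 2-plane, via Gram-Schmidt."""
    while True:
        u = [rng.gauss(0, 1) for _ in range(4)]
        v = [rng.gauss(0, 1) for _ in range(4)]
        nu = math.sqrt(dot(u, u))
        if nu < 1e-9:
            continue
        u = [a / nu for a in u]
        c = dot(u, v)
        v = [b - c * a for a, b in zip(u, v)]
        nv = math.sqrt(dot(v, v))
        if nv < 1e-9:
            continue
        return u, [b / nv for b in v]

rng = random.Random(0)
alphas = [omega(*orthonormal_pair(rng)) for _ in range(1000)]
max_alpha = max(alphas)               # never exceeds 1 (Cauchy-Schwarz)

e1 = (1, 0, 0, 0)
complex_line = omega(e1, J(e1))       # plane spanned by e1 and J e1: alpha = 1
real_plane = omega(e1, (0, 0, 1, 0))  # plane spanned by d/dx1, d/dx2: alpha = 0
```

The complex line attains $\alpha = 1$ and is therefore calibrated, while the real plane spanned by $\partial/\partial x_1$ and $\partial/\partial x_2$ is not.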

It is easy to show that calibrated submanifolds 
are automatically “minimal submanifolds.” We 
prove this in the compact case, but noncompact 
calibrated submanifolds are locally volume-minimizing 
as well. 
Proposition 2 Let $(M,g)$ be a Riemannian manifold, 
$\varphi$ a calibration on $M$, and $N$ a compact 
$\varphi$-submanifold in $M$. Then $N$ is volume-minimizing 
in its homology class. 


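Proof Let $\dim N = k$, and let $[N] \in H_k(M, \mathbb{R})$ and 
$[\varphi] \in H^k(M,\mathbb{R})$ be the homology and cohomology 
classes of $N$ and $\varphi$. Then 

$$[\varphi]\cdot[N] = \int_{x \in N} \varphi|_{T_xN} = \int_{x \in N} \mathrm{vol}_{T_xN} = \mathrm{Vol}(N),$$

since $\varphi|_{T_xN} = \mathrm{vol}_{T_xN}$ for each $x \in N$, as $N$ is a 
calibrated submanifold. If $N'$ is any other compact 
$k$-submanifold of $M$ with $[N'] = [N]$ in $H_k(M, \mathbb{R})$, 
then 

$$[\varphi]\cdot[N] = [\varphi]\cdot[N'] = \int_{x \in N'} \varphi|_{T_xN'} \le \int_{x \in N'} \mathrm{vol}_{T_xN'} = \mathrm{Vol}(N'),$$

since $\varphi|_{T_xN'} \le \mathrm{vol}_{T_xN'}$ because $\varphi$ is a calibration. The 
last two equations give $\mathrm{Vol}(N) \le \mathrm{Vol}(N')$. Thus, $N$ 
is volume-minimizing in its homology class. □ 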


Now let $(M,g)$ be a Riemannian manifold with a 
calibration $\varphi$, and let $\iota: N \to M$ be an immersed 
submanifold. Whether $N$ is a $\varphi$-submanifold 
depends upon the tangent spaces of $N$. That is, it 
depends on $\iota$ and its first derivative. So, for $N$ to be 
calibrated with respect to $\varphi$ is a first-order partial 
differential equation on $\iota$. But if $N$ is calibrated then 
$N$ is minimal, and for $N$ to be minimal is a second- 
order partial differential equation on $\iota$. 

One moral is that the calibrated equations, being 
first order, are often easier to solve than the minimal 
submanifold equations, which are second order. So 
calibrated geometry is a fertile source of examples of 
minimal submanifolds. 


Calibrated Submanifolds and Special Holonomy 


A calibration $\varphi$ on $(M,g)$ is only interesting if there 
exist plenty of $\varphi$-submanifolds $N$ in $M$, locally 
or globally. Since $\varphi|_{T_xN} = \mathrm{vol}_{T_xN}$ for each $x \in N$, 
$\varphi$-submanifolds will be abundant only if the family 
$\mathcal{F}_\varphi$ of calibrated tangent $k$-planes $V$ with $\varphi|_V = \mathrm{vol}_V$ 
is “reasonably large” (say, if $\mathcal{F}_\varphi$ has small 
codimension in the family of all tangent $k$-planes $V$ 
on $M$). A maximally boring example is the $k$-form 
$\varphi = 0$, which is a calibration but has no calibrated 
tangent $k$-planes, so no $\varphi$-submanifolds. 

Thus, most calibrations $\varphi$ will have few or no 
$\varphi$-submanifolds, and only special calibrations $\varphi$ with 
$\mathcal{F}_\varphi$ large will have interesting calibrated geometries. 
Now the field of Riemannian holonomy groups is a 
natural companion for calibrated geometry, because 
it gives a simple way to generate interesting 
calibrations $\varphi$ which automatically have $\mathcal{F}_\varphi$ large. 

Let $G \subset O(n)$ be a possible holonomy group of a 
Riemannian metric. In particular, we can take $G$ to be 
one of the holonomy groups $U(m)$, $SU(m)$, $Sp(m)$, $G_2$, 
or $\mathrm{Spin}(7)$ from Berger's classification. Then $G$ acts 
on the $k$-forms $\Lambda^k(\mathbb{R}^n)^*$ on $\mathbb{R}^n$, so we can look for 
$G$-invariant $k$-forms on $\mathbb{R}^n$. Suppose $\varphi_0$ is a nonzero, 
$G$-invariant $k$-form on $\mathbb{R}^n$. 

By rescaling $\varphi_0$ we can arrange that for each 
oriented $k$-plane $U \subset \mathbb{R}^n$, we have $\varphi_0|_U \le \mathrm{vol}_U$, and 
that $\varphi_0|_U = \mathrm{vol}_U$ for at least one such $U$. Let $H$ be the 
stabilizer subgroup of this $U$ in $G$. Then $\varphi_0|_{\gamma \cdot U} = 
\mathrm{vol}_{\gamma \cdot U}$ by $G$-invariance, so $\gamma \cdot U$ is a calibrated 
$k$-plane for all $\gamma \in G$. Thus, the family $\mathcal{F}_0$ of 
$\varphi_0$-calibrated $k$-planes in $\mathbb{R}^n$ contains $G/H$, so it is 
“reasonably large,” and it is likely that the calibrated 
submanifolds will have an interesting geometry. 

Now let $M$ be a manifold of dimension $n$, and $g$ 
a metric on $M$ with Levi-Civita connection $\nabla$ and 
holonomy group $G$. Then there is a $k$-form $\varphi$ on $M$ 
with $\nabla\varphi = 0$, corresponding to $\varphi_0$. Hence $d\varphi = 0$, 
and $\varphi$ is closed. Also, the condition $\varphi_0|_U \le \mathrm{vol}_U$ for 
all oriented $k$-planes $U$ in $\mathbb{R}^n$ implies that $\varphi|_V \le 
\mathrm{vol}_V$ for all oriented tangent $k$-planes $V$ in $M$. Thus, 
$\varphi$ is a calibration on $M$. The family $\mathcal{F}_\varphi$ of calibrated 
tangent $k$-planes on $M$ fibers over $M$ with fiber $\mathcal{F}_0$; 
so, it is “reasonably large.” 

This gives a general method for finding interesting 
calibrations on manifolds with reduced holonomy. 
Here are the most significant examples. 


e Let G=U(m) C O(2m). Then G preserves a 
2-form wo on R*”. If g is a metric on M with 
holonomy U(m), then g is Kahler with complex 
structure J, and the 2-form w on M associated to 
wo is the Kahler form of g. 

One can show that w is a calibration on (M, g), 
and the calibrated submanifolds are exactly the 
“holomorphic curves" in (M,/). More generally, 
w*/k! is a calibration on M for 1 € k < m, and 
the corresponding calibrated submanifolds are the 
complex k-dimensional submanifolds of (M, J). 

e Let G=SU(m) c O(2m). Then G preserves a 
complex volume form 0Q9=dz; ^::: ^ dz, on 


C". Thus, a Calabi-Yau m-fold (M,g) with 
Hol(g) =SU(m) has a holomorphic volume form 
Q. The real part Re is a calibration on M, and 
the corresponding calibrated submanifolds are 
called special Lagrangian submanifolds. 

e The group G2 C O(7) preserves a 3-form wo and a 
4-form «yp on R”. Thus, a Riemannian 7-manifold 
(M,g) with holonomy G2 comes with a 3-form q 
and 4-form *y, which are both calibrations. The 
corresponding calibrated submanifolds are called 
associative 3-folds and coassociative 4-folds. 

• The group $\mathrm{Spin}(7) \subset O(8)$ preserves a 4-form $\Omega_0$
on $\mathbb{R}^8$. Thus a Riemannian 8-manifold (M, g) with
holonomy Spin(7) has a 4-form $\Omega$, which is a
calibration. The $\Omega$-submanifolds are called Cayley
4-folds.
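The first example in the list can be checked numerically in the flat model. The following is a minimal sketch, assuming coordinates $(x_1, y_1, \ldots, x_m, y_m)$ on $\mathbb{R}^{2m}$ with $\omega_0 = \sum_j dx_j \wedge dy_j$; the helper names (`kahler_form`, `apply_J`, `random_orthonormal_pair`) are illustrative and not taken from the text. It samples random oriented 2-planes, evaluates $\omega_0$ on an oriented orthonormal basis (on which the volume form equals 1), and confirms that the bound $\omega_0|_U \le \mathrm{vol}_U$ holds, with equality on a complex line.

```python
import numpy as np

def kahler_form(u, v):
    """omega_0 = sum_j dx_j ^ dy_j on R^{2m}, coordinates (x_1, y_1, ..., x_m, y_m)."""
    u = u.reshape(-1, 2)
    v = v.reshape(-1, 2)
    return float(np.sum(u[:, 0] * v[:, 1] - u[:, 1] * v[:, 0]))

def apply_J(u):
    """Complex structure J: (x_j, y_j) -> (-y_j, x_j), i.e. multiplication by i."""
    w = u.reshape(-1, 2)
    return np.stack([-w[:, 1], w[:, 0]], axis=1).reshape(-1)

def random_orthonormal_pair(rng, dim):
    """Orthonormal basis (e1, e2) of a random 2-plane in R^dim."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, 2)))
    return q[:, 0], q[:, 1]

rng = np.random.default_rng(0)
m = 3

# omega_0 evaluated on orthonormal bases of 1000 random 2-planes.
values = [kahler_form(*random_orthonormal_pair(rng, 2 * m)) for _ in range(1000)]
max_value = max(values)

# A complex line spanned by (e, Je) is calibrated: omega_0(e, Je) = |e|^2 = 1.
e = rng.standard_normal(2 * m)
e /= np.linalg.norm(e)
calibrated_value = kahler_form(e, apply_J(e))
```

Random planes only probe the inequality; they do not prove it. The bound itself follows from the Cauchy-Schwarz inequality applied to $\omega_0(u, v) = g(Ju, v)$.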


It is an important general principle that to each
calibration $\varphi$ on an n-manifold (M, g) with special
holonomy constructed in this way, there corresponds
a constant calibration $\varphi_0$ on $\mathbb{R}^n$. Locally,
$\varphi$-submanifolds in M resemble the $\varphi_0$-submanifolds in
$\mathbb{R}^n$, and have many of the same properties. Thus, to
understand the calibrated submanifolds in a manifold
with special holonomy, it is often a good idea to
start by studying the corresponding calibrated
submanifolds of $\mathbb{R}^n$.

In particular, singularities of $\varphi$-submanifolds in M
will be locally modeled on singularities of $\varphi_0$-submanifolds
in $\mathbb{R}^n$. (In the sense of geometric
measure theory, the tangent cone at a singular point
of a $\varphi$-submanifold in M is a conical $\varphi_0$-submanifold
in $\mathbb{R}^n$.) So by studying singular $\varphi_0$-submanifolds in
$\mathbb{R}^n$, we may understand the singular behavior of
$\varphi$-submanifolds in M.


Special Lagrangian Geometry 


We now focus on one class of calibrated submanifolds,
special Lagrangian submanifolds in Calabi-Yau
manifolds. Calabi-Yau 3-folds are used to
make the spacetime vacuum in string theory, and
special Lagrangian 3-folds are the classical versions
of A-branes, or supersymmetric 3-cycles, in Calabi-Yau
3-folds. Special Lagrangian geometry aroused
great interest amongst string theorists because of its
role in the SYZ conjecture, providing a geometric
basis for "mirror symmetry" of Calabi-Yau 3-folds.


Calabi-Yau Manifolds 


Here is our definition of Calabi-Yau manifold.
Readers are warned that there are several different
definitions of Calabi-Yau manifolds in use in the
literature. Ours is unusual in regarding $\Omega$ as part of
the given structure.

Definition 3 Let $m \ge 2$. A Calabi-Yau m-fold is a
quadruple $(M, J, g, \Omega)$ such that (M, J) is a compact
m-dimensional complex manifold, g a Kähler metric
on (M, J) with Kähler form $\omega$, and $\Omega$ a holomorphic
(m, 0)-form on M called the holomorphic volume
form, which satisfies


$$\omega^m/m! = (-1)^{m(m-1)/2}\,(i/2)^m\,\Omega \wedge \bar{\Omega}$$   [1]


The constant factor in [1] is chosen to make $\mathrm{Re}\,\Omega$ a
calibration. It follows from [1] that g is Ricci-flat, $\Omega$
is constant under the Levi-Civita connection, and
the holonomy group of g has $\mathrm{Hol}(g) \subseteq SU(m)$.


Let (M, J) be a compact, complex manifold, and g
a Kähler metric on M, with Ricci curvature $R_{ab}$. Define
the Ricci form $\rho$ of g by $\rho_{ac} = J_a^b R_{bc}$. Then $\rho$ is a closed
real (1, 1)-form on M, with de Rham cohomology class
$[\rho] = 2\pi c_1(M) \in H^2(M, \mathbb{R})$, where $c_1(M)$ is the first
Chern class of M in $H^2(M, \mathbb{Z})$. The Calabi conjecture
specifies which closed (1, 1)-forms can be the Ricci
forms of a Kähler metric on M.


The Calabi conjecture Let (M, J) be a compact,
complex manifold, and $g'$ a Kähler metric on M,
with Kähler form $\omega'$. Suppose that $\rho$ is a real, closed
(1, 1)-form on M with $[\rho] = 2\pi c_1(M)$. Then there
exists a unique Kähler metric g on M with Kähler
form $\omega$, such that $[\omega] = [\omega'] \in H^2(M, \mathbb{R})$, and the
Ricci form of g is $\rho$.


Note that $[\omega] = [\omega']$ says that g and $g'$ are in the
same Kähler class. The conjecture was posed by Calabi
in 1954, and was eventually proved by Yau in 1976.
Its importance to us is that when the canonical bundle
$K_M$ is trivial, so that $c_1(M) = 0$, we can take $\rho = 0$, and
then g is Ricci-flat. Since $K_M$ is trivial, it has a nonzero
holomorphic section, a holomorphic (m, 0)-form $\Omega$. As
g is Ricci-flat, it follows that $\nabla\Omega = 0$, where $\nabla$ is the
Levi-Civita connection of g. Rescaling $\Omega$ by a complex
constant makes [1] hold, and then $(M, J, g, \Omega)$ is a
Calabi-Yau m-fold. This proves:


Theorem 4 Let (M, J) be a compact complex m-manifold
with $K_M$ trivial. Then every Kähler class
on M contains a unique Ricci-flat Kähler metric g.
There exists a holomorphic (m, 0)-form $\Omega$, unique
up to change of phase $\Omega \mapsto e^{i\theta}\Omega$, such that
$(M, J, g, \Omega)$ is a Calabi-Yau m-fold.


Using algebraic geometry, one can produce many
examples of complex m-folds (M, J) satisfying these
conditions, such as the Fermat (m + 2)-tic


$$\{[z_0, \ldots, z_{m+1}] \in \mathbb{CP}^{m+1} : z_0^{m+2} + \cdots + z_{m+1}^{m+2} = 0\}$$   [2]


Therefore, Calabi-Yau m-folds are very abundant.


Special Lagrangian Submanifolds 


Definition 5 Let $(M, J, g, \Omega)$ be a Calabi-Yau m-fold.
Then $\mathrm{Re}\,\Omega$ is a calibration on the Riemannian
manifold (M, g). An oriented real m-dimensional
submanifold N in M is called a special Lagrangian
submanifold (SL m-fold) if it is calibrated with respect
to $\mathrm{Re}\,\Omega$.


Here is an alternative definition of SL m-folds. It 
is often more useful than Definition 5. 


Proposition 6 Let $(M, J, g, \Omega)$ be a Calabi-Yau
m-fold, with Kähler form $\omega$, and N a real m-dimensional
submanifold in M. Then N admits an
orientation making it into an SL m-fold in M if
and only if $\omega|_N = 0$ and $\mathrm{Im}\,\Omega|_N = 0$.
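The criterion of Proposition 6 can be illustrated on linear subspaces of flat $\mathbb{C}^m$. The sketch below assumes the standard structures $\omega_0(u, v) = \mathrm{Im}\langle u, v\rangle$ and $\Omega_0 = dz_1 \wedge \cdots \wedge dz_m$ on $\mathbb{C}^m$, and uses the fact that $\Omega_0$ evaluated on the columns of a matrix is its determinant; the function names are illustrative. For a unitary matrix A the real plane $A \cdot \mathbb{R}^m$ is Lagrangian, and it is special Lagrangian (with phase 1) exactly when $\det A = 1$.

```python
import numpy as np

def omega(u, v):
    """Kaehler form of flat C^m: omega_0(u, v) = Im sum_j conj(u_j) v_j."""
    return float(np.imag(np.vdot(u, v)))

def holomorphic_volume(frame):
    """Omega_0 = dz_1 ^ ... ^ dz_m evaluated on the m columns of `frame`."""
    return complex(np.linalg.det(frame))

def random_unitary(rng, m):
    """A random unitary matrix, obtained from a QR factorization."""
    z = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    q, _ = np.linalg.qr(z)
    return q

rng = np.random.default_rng(1)
m = 3

A = random_unitary(rng, m)             # A . R^m is a Lagrangian plane
B = A.copy()
B[:, 0] *= np.conj(np.linalg.det(A))   # rescale one column so that det B = 1

# omega_0 vanishes on every pair of basis vectors of the plane A . R^m.
max_omega_A = max(abs(omega(A[:, j], A[:, k])) for j in range(m) for k in range(m))

vol_A = holomorphic_volume(A)   # a unit complex number, generically not real
vol_B = holomorphic_volume(B)   # equals 1, so Im Omega_0 vanishes on B . R^m
```

This only treats the linear model; for a curved submanifold the same two conditions are imposed on each tangent plane.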


Regard N as an immersed submanifold, with
immersion $\iota: N \to M$. Then $[\omega|_N]$ and $[\mathrm{Im}\,\Omega|_N]$ are
unchanged under continuous variations of the
immersion $\iota$. Thus, $[\omega|_N] = [\mathrm{Im}\,\Omega|_N] = 0$ is a necessary
condition not just for N to be special
Lagrangian, but also for any isotopic submanifold
$N'$ in M to be special Lagrangian. This proves:


Corollary 7 Let $(M, J, g, \Omega)$ be a Calabi-Yau m-fold,
and N a compact real m-submanifold in M.
Then a necessary condition for N to be isotopic
to a special Lagrangian submanifold $N'$ in M
is that $[\omega|_N] = 0$ in $H^2(N, \mathbb{R})$ and $[\mathrm{Im}\,\Omega|_N] = 0$ in
$H^m(N, \mathbb{R})$.


Deformations of Compact SL m-Folds 


The deformation theory of compact special Lagran- 
gian manifolds was studied by McLean (1998), who 
proved the following result: 


Theorem 8 Let $(M, J, g, \Omega)$ be a Calabi-Yau
m-fold, and N a compact special Lagrangian
m-fold in M. Then the moduli space $\mathcal{M}_N$ of special
Lagrangian deformations of N is a smooth manifold
of dimension $b^1(N)$, the first Betti number of N.


Sketch proof. Suppose for simplicity that N is an
embedded submanifold. There is a natural orthogonal
decomposition $TM|_N = TN \oplus \nu$, where $\nu \to N$ is
the normal bundle of N in M. As N is Lagrangian,
the complex structure $J: TM \to TM$ gives an isomorphism
$J: \nu \to TN$. But the metric g gives an
isomorphism $TN \cong T^*N$. Composing these two
gives an isomorphism $\nu \cong T^*N$.

Let T be a small tubular neighborhood of N in M.
Then we can identify T with a neighborhood of the
zero section in $\nu$. Using the isomorphism $\nu \cong T^*N$, we
have an identification between T and a neighborhood of
the zero section in $T^*N$. This can be chosen to identify
the Kähler form $\omega$ on T with the natural symplectic
structure on $T^*N$. Let $\pi: T \to N$ be the obvious
projection.

Under this identification, submanifolds $N'$ in $T \subset M$
which are $C^1$ close to N are identified with the
graphs of small smooth sections $\alpha$ of $T^*N$. That is,
submanifolds $N'$ of M close to N are identified with
1-forms $\alpha$ on N. We need to know: which 1-forms $\alpha$
are identified with SL m-folds $N'$?

Now, $N'$ is special Lagrangian if $\omega|_{N'} = \mathrm{Im}\,\Omega|_{N'} = 0$.
But $\pi|_{N'}: N' \to N$ is a diffeomorphism, so we can
push $\omega|_{N'}$ and $\mathrm{Im}\,\Omega|_{N'}$ down to N, and regard them
as functions of $\alpha$. Calculation shows that


$$\pi_*(\omega|_{N'}) = d\alpha \quad\text{and}\quad \pi_*(\mathrm{Im}\,\Omega|_{N'}) = F(\alpha, \nabla\alpha)$$


where F is a nonlinear function of its arguments.
Thus, the moduli space $\mathcal{M}_N$ is locally isomorphic to
the set of small 1-forms $\alpha$ on N such that $d\alpha = 0$
and $F(\alpha, \nabla\alpha) = 0$.

Now it turns out that F satisfies $F(\alpha, \nabla\alpha) \approx
d(*\alpha)$ when $\alpha$ is small. Therefore, $\mathcal{M}_N$ is locally
approximately isomorphic to the vector space of
1-forms $\alpha$ with $d\alpha = d(*\alpha) = 0$. But by Hodge theory,
this is isomorphic to the de Rham cohomology
group $H^1(N, \mathbb{R})$, and is a manifold with dimension
$b^1(N)$.
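The Hodge-theoretic step, that closed and coclosed 1-forms are counted by $b^1(N)$, has a finite-dimensional analog that can be checked directly. The following sketch is only a discrete illustration, not part of McLean's argument: it assumes a cell decomposition of the 2-torus by an n x n periodic square grid, with cochains in place of differential forms and the transpose of the coboundary in place of $d^*$. All names are illustrative. The space of "harmonic" 1-cochains then has the same dimension as the first cohomology, namely 2.

```python
import numpy as np

def torus_coboundaries(n):
    """Coboundary matrices d0 (vertices -> edges) and d1 (edges -> faces)
    for an n x n periodic square grid, a cell decomposition of the 2-torus."""
    def vert(i, j):
        return (i % n) * n + (j % n)

    def horiz(i, j):
        return vert(i, j)              # edge from (i, j) to (i+1, j)

    def vertical(i, j):
        return n * n + vert(i, j)      # edge from (i, j) to (i, j+1)

    d0 = np.zeros((2 * n * n, n * n))
    d1 = np.zeros((n * n, 2 * n * n))
    for i in range(n):
        for j in range(n):
            # (d0 f)(edge) = f(head) - f(tail)
            d0[horiz(i, j), vert(i + 1, j)] += 1
            d0[horiz(i, j), vert(i, j)] -= 1
            d0[vertical(i, j), vert(i, j + 1)] += 1
            d0[vertical(i, j), vert(i, j)] -= 1
            # (d1 a)(face) = oriented sum of a over the boundary of the square
            face = vert(i, j)          # faces are labelled by their corner (i, j)
            d1[face, horiz(i, j)] += 1
            d1[face, vertical(i + 1, j)] += 1
            d1[face, horiz(i, j + 1)] -= 1
            d1[face, vertical(i, j)] -= 1
    return d0, d1

d0, d1 = torus_coboundaries(4)
dd = d1 @ d0                            # should vanish identically (d o d = 0)

# Discrete analog of d(alpha) = d(*alpha) = 0: kernel of d1 stacked on d0^T.
K = np.vstack([d1, d0.T])
harmonic_dim = K.shape[1] - np.linalg.matrix_rank(K)

# First Betti number from cohomology: dim ker d1 - rank d0.
betti_1 = (d1.shape[1] - np.linalg.matrix_rank(d1)) - np.linalg.matrix_rank(d0)
```

For an SL 3-torus fiber, as in the SYZ picture discussed later, the same count would give a 3-dimensional moduli space.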

To carry out this last step rigorously requires
some technical machinery: one must work with
certain Banach spaces of sections of $T^*N$, $\Lambda^2T^*N$
and $\Lambda^mT^*N$, use elliptic regularity results to prove
that the map $\alpha \mapsto (d\alpha, F(\alpha, \nabla\alpha))$ has closed image in
these Banach spaces, and then use the implicit
function theorem for Banach spaces to show that
the kernel of the map is what is expected.


Obstructions to Existence of Compact SL m-Folds 


Let $\{(M, J_t, g_t, \Omega_t) : t \in (-\epsilon, \epsilon)\}$ be a smooth one-parameter
family of Calabi-Yau m-folds. Suppose
$N_0$ is an SL m-fold in $(M, J_0, g_0, \Omega_0)$. When can we
extend $N_0$ to a smooth family of SL m-folds $N_t$ in
$(M, J_t, g_t, \Omega_t)$ for $t \in (-\epsilon, \epsilon)$?

By Corollary 7, a necessary condition is that
$[\omega_t|_{N_0}] = [\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ for all t. Our next result
shows that locally, this is also a sufficient condition.


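Theorem 9 Let $\{(M, J_t, g_t, \Omega_t) : t \in (-\epsilon, \epsilon)\}$ be a
smooth one-parameter family of Calabi-Yau m-folds,
with Kähler forms $\omega_t$. Let $N_0$ be a compact SL m-fold
in $(M, J_0, g_0, \Omega_0)$, and suppose that $[\omega_t|_{N_0}] = 0$
in $H^2(N_0, \mathbb{R})$ and $[\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ in $H^m(N_0, \mathbb{R})$ for all
$t \in (-\epsilon, \epsilon)$. Then $N_0$ extends to a smooth one-parameter
family $\{N_t : t \in (-\delta, \delta)\}$, where $0 < \delta \le \epsilon$
and $N_t$ is a compact SL m-fold in $(M, J_t, g_t, \Omega_t)$.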


This can be proved using similar techniques to
Theorem 8. Note that the condition $[\mathrm{Im}\,\Omega_t|_{N_0}] = 0$
for all t can be satisfied by choosing the phases of
the $\Omega_t$ appropriately, and if the image of $H_2(N, \mathbb{Z})$ in
$H_2(M, \mathbb{R})$ is zero, then the condition $[\omega|_N] = 0$ holds
automatically.

Thus, the obstructions $[\omega_t|_{N_0}] = [\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ in
Theorem 9 are actually fairly mild restrictions, and
SL m-folds should be considered as pretty stable
under small deformations of the Calabi-Yau
structure.


Remark The deformation and obstruction theory 
of compact SL m-folds are extremely well behaved 
compared to many other moduli space problems in 
differential geometry. In other geometric problems 
(such as the deformations of complex structures on a 
complex manifold, or pseudoholomorphic curves in 
an almost-complex manifold, or instantons on a 
Riemannian 4-manifold), the deformation theory 
often has the following general structure. 


There are vector bundles E, F over a compact
manifold M, and an elliptic operator $P: C^\infty(E) \to
C^\infty(F)$, usually first order. The kernel Ker P is the
set of infinitesimal deformations, and the cokernel
Coker P the set of obstructions. The actual moduli
space $\mathcal{M}$ is locally the zeros of a nonlinear map
$\Psi : \mathrm{Ker}\,P \to \mathrm{Coker}\,P$.

In a generic case, Coker P = 0, and then the
moduli space $\mathcal{M}$ is locally isomorphic to Ker P,
and so is locally a manifold with dimension ind(P).
However, in nongeneric situations Coker P may be
nonzero, and then the moduli space $\mathcal{M}$ may be
singular, or have an unexpected dimension.

However, SL m-folds do not follow this pattern. 
Instead, the obstructions are topologically determined, 
and the moduli space is always smooth, with dimen- 
sion given by a topological formula. This should be 
regarded as a minor mathematical miracle. 


Mirror Symmetry and the SYZ Conjecture 


Mirror symmetry is a mysterious relationship
between pairs of Calabi-Yau 3-folds $M, \hat{M}$, arising
from a branch of physics known as string theory,
and leading to some very strange and exciting
conjectures about Calabi-Yau 3-folds, many of
which have been proved in special cases.

In the beginning (the 1980s), mirror symmetry 
seemed mathematically completely mysterious. But 
there are now two complementary conjectural 
theories, due to Kontsevich and Strominger-Yau- 
Zaslow, which explain mirror symmetry in a fairly 
mathematical way. Probably both are true, at some 
level. The second proposal, due to Strominger, Yau, 
and Zaslow (1996), is known as the SYZ conjecture. 
Here is an attempt to state it. 

The SYZ conjecture Suppose M and $\hat{M}$ are mirror
Calabi-Yau 3-folds. Then (under some additional
conditions), there should exist a compact topological
3-manifold B and surjective, continuous maps
$f: M \to B$ and $\hat{f}: \hat{M} \to B$, such that


(i) There exists a dense open set $B_0 \subset B$, such that
for each $b \in B_0$, the fibers $f^{-1}(b)$ and $\hat{f}^{-1}(b)$ are
nonsingular special Lagrangian 3-tori $T^3$ in M
and $\hat{M}$. Furthermore, $f^{-1}(b)$ and $\hat{f}^{-1}(b)$ are in
some sense dual to one another.

(ii) For each $b \in \Delta = B \setminus B_0$, the fibers $f^{-1}(b)$ and
$\hat{f}^{-1}(b)$ are expected to be singular special
Lagrangian 3-folds in M and $\hat{M}$.


The fibrations f and $\hat{f}$ are called special Lagrangian
fibrations, and the set of singular fibers $\Delta$ is
called the discriminant. In part (i), the nonsingular
fibers of f and $\hat{f}$ are supposed to be dual tori. What
does this mean?

On the topological level, we can define duality
between two tori $T, \hat{T}$ to be a choice of isomorphism
$H^1(T, \mathbb{Z}) \cong H_1(\hat{T}, \mathbb{Z})$. We can also define
duality between tori equipped with flat Riemannian
metrics. Write $T = V/\Lambda$, where V is a Euclidean
vector space and $\Lambda$ a lattice in V. Then the dual
torus $\hat{T}$ is defined to be $V^*/\Lambda^*$, where $V^*$ is the
dual vector space and $\Lambda^*$ the dual lattice. However,
there is no notion of duality between nonflat
metrics on dual tori.
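The flat duality $T = V/\Lambda \mapsto \hat{T} = V^*/\Lambda^*$ is easy to compute in coordinates. The sketch below assumes $V = \mathbb{R}^3$ with its standard inner product (used to identify $V^*$ with V), a lattice given by the columns of an invertible matrix, and the convention that $\Lambda^*$ consists of the covectors taking integer values on $\Lambda$; the function name and sample basis are illustrative. A dual basis is then given by the inverse transpose.

```python
import numpy as np

def dual_lattice_basis(basis):
    """Columns of `basis` generate a lattice Lambda in R^n.
    Returns a matrix whose columns generate the dual lattice
    Lambda* = {xi : <xi, lam> is an integer for every lam in Lambda}."""
    return np.linalg.inv(basis).T

# An arbitrary sample lattice in R^3 (columns are the generators).
basis = np.array([[2.0, 0.5, 0.0],
                  [0.0, 1.0, 0.3],
                  [0.0, 0.0, 0.5]])

dual = dual_lattice_basis(basis)
pairing = dual.T @ basis                     # pairings of dual with original generators
volume_product = abs(np.linalg.det(basis)) * abs(np.linalg.det(dual))
double_dual = dual_lattice_basis(dual)       # dualizing twice
```

The pairing matrix is the identity, the volumes of T and $\hat{T}$ multiply to 1, and dualizing twice returns the original lattice. In particular, a small torus fiber corresponds to a large dual fiber.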

Strominger, Yau, and Zaslow argue only that
their conjecture holds when $M, \hat{M}$ are close to the
"large complex structure limit." In this case, the
diameters of the fibers $f^{-1}(b)$, $\hat{f}^{-1}(b)$ are expected to
be small compared to the diameter of the base space
B, and away from singularities of $f, \hat{f}$, the metrics on
the nonsingular fibers are expected to be approximately
flat. So, part (i) of the SYZ conjecture says
that for $b \in B_0$, $f^{-1}(b)$ is approximately a flat
Riemannian 3-torus, and $\hat{f}^{-1}(b)$ is approximately the
dual flat Riemannian torus.

Mathematical research on the SYZ conjecture has
followed two broad approaches. The first could be
described as symplectic topological. For this, we
treat $M, \hat{M}$ just as symplectic manifolds and $f, \hat{f}$ just
as Lagrangian fibrations. We also suppose B is a
smooth 3-manifold and $f, \hat{f}$ are smooth maps. Under
these simplifying assumptions, Mark Gross, Wei-Dong
Ruan, and others have built up a beautiful,
detailed picture of how dual SYZ fibrations work at
the global topological level.

The second approach could be described as local
geometric. Here, we try to take the special Lagrangian
condition seriously from the outset, and focus
on the local behavior of special Lagrangian
submanifolds, and especially their singularities,
rather than on global topological questions. In
addition, we are interested in what fibrations of
generic Calabi-Yau 3-folds might look like.

There is now a well-developed theory of SL
m-folds with isolated singularities modeled on
cones (Joyce 2003a). This is applied to SL
fibrations and the SYZ conjecture in Joyce
(2003a, b), leading to the tentative conclusions
that for generic Calabi-Yau 3-folds M, special
Lagrangian fibrations $f : M \to B$ will be only piecewise
smooth, and have discriminants $\Delta$ of real
codimension 1 in B, in contrast to smooth fibrations,
which have $\Delta$ of codimension 2. We also
argue that for generic mirrors $M, \hat{M}$ and $f, \hat{f}$,
the discriminants $\Delta, \hat{\Delta}$ cannot be homeomorphic
and so do not coincide. This contradicts part (ii)
above.

A better way to formulate the SYZ conjecture
may be in terms of families of mirror Calabi-Yau
3-folds $M_t, \hat{M}_t$ and fibrations $f_t: M_t \to B$, $\hat{f}_t: \hat{M}_t \to B$
for $t \in (0, \epsilon)$ which approach the "large complex
structure limit" as $t \to 0$. Then we could require the
discriminants $\Delta_t, \hat{\Delta}_t$ of $f_t, \hat{f}_t$ to converge to some
common, codimension 2 limit $\Delta_0$ as $t \to 0$.

It is an important, and difficult, open problem to 
construct examples of special Lagrangian fibrations 
of compact, holonomy SU(3) Calabi-Yau 3-folds. 
None are currently known. 


See also: Minimal submanifolds; Mirror Symmetry: 
A Geometric Survey; Moduli Spaces: An Introduction; 
Riemannian Holonomy Groups and Exceptional Holonomy. 
Introduction


Systems of Calogero-Moser-Sutherland (CMS) type 
form a class of finite-dimensional dynamical systems 
that are integrable both at the classical and at the 
quantum level. The CMS systems describe N point 
particles moving on a line or on a ring, interacting 
via pair potentials that are specific functions of four 
types, namely rational (I), hyperbolic (II), trigono- 
metric (III), and elliptic (IV). They occur not only in 
a nonrelativistic (Galilei-invariant), but also in a 
relativistic (Poincaré-invariant) setting. Thus, one 
can distinguish a hierarchy of 16 physically distinct 
versions (classical/quantum, nonrelativistic/relativis- 
tic, type I-IV), the most general one being the 
quantum relativistic type IV system. 

The nonrelativistic systems date back to pioneering
work by Calogero, Sutherland, and Moser in the
early 1970s. The pair potential structure of the
interaction can be encoded in the root system $A_{N-1}$,
and there also exist integrable versions for all of the
remaining root systems. The classical systems are
given by N Poisson commuting Hamiltonians with a
polynomial dependence on the particle momenta
$p_1, \ldots, p_N$. Accordingly, the quantum versions are
described by N commuting Hamiltonians that are
partial differential operators.

The relativistic systems were introduced in the
mid-1980s, at the classical level by Ruijsenaars and
Schneider, and at the quantum level by Ruijsenaars.
They converge to the nonrelativistic systems in the
limit $c \to \infty$, where c is the speed of light. Again, the
systems can be related to the root system $A_{N-1}$, and
they admit integrable versions for other root
systems. All of the commuting classical Hamiltonians
depend exponentially on generalized momenta
$p_1, \ldots, p_N$. Hence, the associated commuting quantum
Hamiltonians are analytic difference operators.

The above integrable systems can be further
generalized by allowing supersymmetry or internal
degrees of freedom ("spins"), coupled in quite
special ways to retain integrability. In this article,
however, the focus is on the 16 versions of the
$A_{N-1}$-symmetric CMS systems without internal
degrees of freedom. The primary aim is to acquaint
the reader with their definition and integrability,
and with their most prominent features and interrelationships.
Second, we intend to give a rough
sketch of the state of the art concerning explicit
solutions for the various versions. This involves a
concretization of the action-angle maps and eigenfunction
transforms that simultaneously diagonalize
the commuting dynamics, paying special attention to
their remarkable duality properties.

It is beyond the scope of this article to review the 
hundreds of papers specifically dealing with CMS 
type systems, let alone the much larger literature 
where they play some role. Indeed, the systems have 
been encountered in a great many different contexts 
and they are related to a host of other integrable 
systems in various ways. Accordingly, they can be 
studied from the perspective of various subfields of 
mathematics and theoretical physics. First some of 
these perspectives and relations to seemingly quite 
different topics will be mentioned before embarking 
on the far more focused survey. 

Staying first within the confines of the CMS type
systems, some nonobvious limits yielding other
familiar finite-dimensional integrable systems will
be mentioned. To begin with, all of the $A_{N-1}$ type
systems give rise to systems with a Toda type
(exponential "nearest neighbor") interaction via a
suitable limiting transition (basically a strong-coupling
limit). This leads to integrable N-particle
systems with a classical/quantum, nonrelativistic/
relativistic, nonperiodic/periodic version; starting
from the quantum relativistic periodic Toda system,
the remaining seven versions can be obtained by
suitable limits.

Next, we recall that the quantum system of N
nonrelativistic bosons on the line or ring interacting
via a pair potential of $\delta$-function type is soluble via a
Bethe ansatz, with the "line version" exhibiting
quantum soliton behavior (factorized scattering). It
has been shown that there exist scaling limits of
eigenfunctions for suitable CMS systems that give
rise to the latter Bethe type eigenfunctions for N = 2,
while convergence for N > 2 is plausible, but has
not been demonstrated thus far.

Via suitable analytic continuations preserving
reality/formal self-adjointness, one can arrive at
CMS systems with more than one species of particle
(particles and "antiparticles"). Likewise, analytic
continuations and appropriate limits of CMS systems
associated with root systems other than $A_{N-1}$
lead to a further proliferation of N-dimensional
integrable systems. Typically, such limits refer either
to the commuting Hamiltonians (the Toda limit
being a case in point) or to the joint eigenfunctions
(as exemplified by the $\delta$-function system limit); it
seems difficult to control both sets of quantities at
once.

Starting from the spin type CMS systems, another
kind of limit can be taken. Specifically, by "freezing"
the particles at equilibrium positions, it is
possible to arrive at integrable spin chains of
Haldane-Shastry and Inozemtsev type.

At this point, it is expedient to insert a brief
remark on finite-dimensional integrable systems. As
the term suggests, one may expect that, with due
effort, such systems can be "integrated," or, equivalently,
"solved." But it should be noted that the
latter terms (let alone the qualifier "due effort")
have no unambiguous mathematical meaning. Certainly,
"solving" involves obtaining explicit information
on the action-angle map and joint
eigenfunction transform at the classical and quantum
level, resp., but a priori it is not at all clear how
far one can proceed.

Focusing again on the CMS systems and their 
relatives, it should be stressed that, in many cases, 
one is still far removed from a complete solution, 
especially for the elliptic CMS systems. In this 
regard the previous remark serves not only as a 
caveat, but also to make clear why the various 
vantage points provided by different subfields in 
mathematics and physics are crucial: typically, they 
yield complementary insights and distinct represen- 
tations for solutions, serving different purposes. 

To be sure, in first approximation the mathe- 
matics involved at the classical and quantum level is 
symplectic geometry and Hilbert space theory, resp. 
In point of fact, however, far more ingredients have 
turned out to be quite natural and useful. On the 
classical level, these include the theory of groups, Lie 
algebras and symmetric spaces, linear algebra and 
spectral theory, Riemann surface theory, and more 
generally algebraic geometry. 

On the quantum level, the viewpoint of harmonic
analysis on symmetric spaces is particularly natural
and fruitful for the nonrelativistic CMS systems and
their arbitrary root-system versions, whereas quantum
groups/algebras/symmetric spaces can be tied in
with the relativistic systems and their versions for
other root systems. (The $c \to \infty$ limit amounts to the
$q \to 1$ limit in the quantum group picture.) As a
matter of fact, the whole area of special functions
and their q-analogs is intimately related to the
quantum CMS type systems (cf. also the last section
of this article). Finally, the occurrence of commuting
analytic difference operators in the relativistic
($q \ne 1$) systems leads to largely uncharted territory
in the intersection of the theory of Hilbert space
eigenfunction expansions and the theory of linear
analytic difference equations.

The study of the thermodynamics ($N \to \infty$ limit
with temperature > 0 and density > 0 fixed) associated
with the trigonometric and elliptic CMS
systems and their spin cousins yields its own circle
of problems. It was initiated by Sutherland three
decades ago, and even though a host of results on
partition functions, correlation functions, fractional
statistics, strong-weak coupling duality, relations to
Yangians, etc., have meanwhile been obtained,
many questions are still open. This area also has
links with random-matrix theory, but the input from
this field is thus far limited to certain discrete
couplings.

The above N-dimensional integrable systems are
related to a great many infinite-dimensional integrable
systems, both at the classical and at the
quantum level. On the one hand, there are structural
analogs that have been used to advantage in the
study of CMS systems, including Lax pair and R-matrix
formulations, zero-curvature representations,
bi-Hamiltonian formalism, Bäcklund transformations,
time discretizations, and tools such as Baker-Akhiezer
functions, Bethe ansatz, separation of
variables, and Baxter-type Q-operators.

On the other hand, there are striking physical 
similarities between various soliton field theories 
(a prominent one being the sine-Gordon field 
theory) and infinite soliton lattices (in particular 
several Toda type lattices), and the CMS systems for 
special parameter values. Particularly conspicuous 
are the ties between the classical CMS systems and 
the KP and two-dimensional Toda hierarchies. The 
latter relations actually extend beyond the solitons, 
including rational and theta function solutions. 

CMS systems are relevant in various other 
contexts not yet mentioned. A prominent one 
among these is a class of supersymmetric gauge 
field theories. In this quantum context, the classical 
CMS systems have surfaced in the description 
of moduli spaces encoding the vacuum structure 
(Seiberg-Witten theory). Equally surprising, certain 
classical CMS systems (with internal degrees 
of freedom) have found a second application in a 
quantum context, namely in the description of 
quantum chaos (level repulsion). 

We conclude this introduction by listing additional
disparate subjects where connections with
CMS type systems have been found. These include
the theory of Sklyanin, affine Hecke, Kac-Moody,
Virasoro and W-algebras, equations of Knizhnik-Zamolodchikov,
Yang-Baxter, Witten-Dijkgraaf-Verlinde-Verlinde,
and Painlevé type, Gaudin,
Hitchin, Wess-Zumino, matrix and quasi-exactly
solvable models, Dunkl-Cherednik and Polychronakos
operators, the quantum Hall effect and quantum
transport, two-dimensional Yang-Mills theory,
functional equations, integrable mappings, Huygens'
principle, and the bispectral problem.


Classical Nonrelativistic CMS Systems 


A system of N nonrelativistic particles of equal mass m
on the line interacting via pair potentials can be
described by a Hamiltonian


$$H = \frac{1}{2m}\sum_{j=1}^{N} p_j^2 + \sum_{1 \le j < k \le N} V(x_j - x_k)$$   [1]


The CMS systems are defined by four distinct
choices of pair potential. The simplest choice reads


$$V(x) = g^2/mx^2, \quad g > 0 \quad \text{(I)}$$   [2]

Hence, the coupling constant g has dimension
[action] (the product of [position] and [momentum]).
This potential is clearly repulsive. Thus, each
initial state in the phase space


$$\Omega = \{(x, p) \in \mathbb{R}^{2N} \mid x \in G\}$$   [3]


where G is the configuration space


$$G = \{x \in \mathbb{R}^N \mid x_N < \cdots < x_1\}$$   [4]


is a scattering state.
The next level is given by the hyperbolic choice


$$V(x) = g^2\nu^2/m\sinh^2(\nu x), \quad \nu > 0 \quad \text{(II)}$$   [5]


Hence, $\nu$ has dimension [position]$^{-1}$, and the
previous system arises by taking $\nu$ to 0. It is clear
that [5] yields again a repulsive particle system, so
that each state in $\Omega$ given by [3] is a scattering state.
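The statement that the rational system arises from the hyperbolic one as $\nu \to 0$ can be seen at the level of the pair potentials, since $\nu^2/\sinh^2(\nu x) = 1/x^2 - \nu^2/3 + O(\nu^4 x^2)$. A small numerical sketch follows; the default values g = m = 1 and the sample point x = 0.7 are arbitrary choices made for illustration.

```python
import math

def v_rational(x, g=1.0, m=1.0):
    """Type I pair potential [2]: g^2 / (m x^2)."""
    return g**2 / (m * x**2)

def v_hyperbolic(x, nu, g=1.0, m=1.0):
    """Type II pair potential [5]: g^2 nu^2 / (m sinh^2(nu x))."""
    return g**2 * nu**2 / (m * math.sinh(nu * x) ** 2)

x = 0.7
nus = (1.0, 0.1, 0.01)

# Distance between the hyperbolic and rational potentials at x, for shrinking nu.
errors = [abs(v_hyperbolic(x, nu) - v_rational(x)) for nu in nus]
```

The differences shrink roughly like $\nu^2/3$, in line with the expansion above.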

The highest level in the hierarchy is the elliptic
level, where


$$V(x) = g^2\wp(x; \omega, \omega')/m, \quad \omega, -i\omega' > 0 \quad \text{(IV)}$$   [6]


and $\wp(x; \omega, \omega')$ denotes the Weierstrass $\wp$-function
with periods $2\omega$ and $2\omega'$. It is beyond the scope of
this article to elaborate on the elliptic regime, even
though it is of considerable interest. It reappears in
later sections as the most general regime in which
integrability holds true. Indeed, a prominent feature
of the elliptic case [6] is that it can be specialized
both to the hyperbolic case [5] and to the trigonometric
case, given by


$$V(x) = g^2\nu^2/m\sin^2(\nu x) \quad \text{(III)}$$   [7]


To obtain the hyperbolic specialization, one
should take $\omega' = i\pi/2\nu$ and send $\omega$ to $\infty$; then [6]
reduces to [5] (up to an additive constant). Likewise,
[7] results from [6] by choosing $\omega = \pi/2\nu$ and
taking $-i\omega'$ to $\infty$.

The physical picture associated with the trigonometric
and elliptic systems is quite different from
that of the rational and hyperbolic ones. Of course,
the potentials [7] and [6] are again repulsive, but
now the internal motion is confined and oscillatory.
More specifically, due to energy conservation the
phase spaces


$$\Omega_{\rm III} = G_{\rm III} \times \mathbb{R}^N, \quad
G_{\rm III} = \{x_N < \cdots < x_1,\ x_1 - x_N < \pi/\nu\}$$   [8]


$$\Omega_{\rm IV} = G_{\rm IV} \times \mathbb{R}^N, \quad
G_{\rm IV} = \{x_N < \cdots < x_1,\ x_1 - x_N < 2\omega\}$$   [9]


are left invariant by the flow generated by the
trigonometric and elliptic N-particle Hamiltonian, resp.

Alternatively, one may interpret the trigonometric
Hamiltonian as describing particles constrained to
move on a circle and interacting via the inverse
square potential [2]. In this picture, the quantities
$2\nu x_1, \ldots, 2\nu x_N$ are viewed as angular positions on
the circle, and one needs a suitable quotient of the
phase space [8] by a discrete group action to
describe a state of the system.

Turning to integrability aspects, we begin by
noting that the total momentum Hamiltonian


$$P = \sum_{j=1}^{N} p_j$$   [10]


obviously Poisson commutes with the above defining
Hamiltonians of the systems. For N = 2, therefore,
integrability is plain. It is possible to write
down explicitly the higher commuting Hamiltonians
for N > 2 as well but, in the nonrelativistic setting,
it is more illuminating to characterize them as the
power traces or (equivalently) the symmetric functions
of a so-called Lax matrix.

The Lax matrix is an N x N matrix-valued
function on the phase space of the system. It plays
a pivotal role not only for understanding integrability,
but also for setting up an action-angle
transformation. The latter issue is discussed again
later. Here the more conspicuous features of the Lax
matrix will be explained, focusing on the type II
system for expository ease. Then one can choose


$$L_{jj} = p_j, \quad L_{jk} = ig\nu/\sinh\nu(x_j - x_k), \quad
j, k = 1, \ldots, N, \ j \ne k$$   [11]

Thus, L is Hermitean and we have


$$\mathrm{tr}\,L = P, \quad \mathrm{tr}\,L^2 = 2mH$$   [12]
(The rational Lax matrix results from [11] by taking
$\nu \to 0$, and the trigonometric one by taking $\nu \to i\nu$.
The elliptic Lax matrix has a similar structure, but it
involves an extra "spectral" parameter.)
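The relations [12] are straightforward to confirm numerically. The sketch below builds the type II Lax matrix [11] and the Hamiltonian [1] with potential [5] for an arbitrary sample state; the function names and parameter values are chosen here for illustration only.

```python
import numpy as np

def lax_matrix(x, p, g=1.0, nu=1.0):
    """Type II Lax matrix [11]: L_jj = p_j, L_jk = i g nu / sinh(nu (x_j - x_k))."""
    n = len(x)
    diff = x[:, None] - x[None, :]
    np.fill_diagonal(diff, 1.0)          # placeholder; the diagonal is overwritten below
    L = 1j * g * nu / np.sinh(nu * diff)
    L[np.arange(n), np.arange(n)] = p
    return L

def hamiltonian(x, p, g=1.0, nu=1.0, m=1.0):
    """H from [1] with the hyperbolic pair potential [5]."""
    j, k = np.triu_indices(len(x), k=1)  # all pairs j < k
    pot = g**2 * nu**2 / (m * np.sinh(nu * (x[j] - x[k])) ** 2)
    return float(np.sum(p**2) / (2 * m) + np.sum(pot))

# Sample state in the phase space [3]: x_3 < x_2 < x_1.
x = np.array([1.3, 0.2, -0.9])
p = np.array([0.4, -0.7, 0.1])
g, nu, m = 0.8, 1.1, 2.0

L = lax_matrix(x, p, g, nu)
H = hamiltonian(x, p, g, nu, m)

hermitean_defect = np.max(np.abs(L - L.conj().T))
trace_L = np.trace(L)          # should equal P = sum of the p_j
trace_L2 = np.trace(L @ L)     # should equal 2 m H
```

The second relation holds because the off-diagonal products $L_{jk}L_{kj} = g^2\nu^2/\sinh^2\nu(x_j - x_k)$ reproduce the pair potential, each pair being counted twice in the trace.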

Although not obvious, it is true that all of the
power traces


$$H_k = \frac{1}{k}\,\mathrm{tr}\,L^k, \quad k = 1, \ldots, N$$   [13]

are in involution (i.e., Poisson commute). One way to
understand this involves the so-called Lax pair
equation associated with the Hamiltonian flow generated
by $H = H_2/m$. This involves a second N x N
matrix function given by


$$M_{jj} = \sum_{l \ne j} \frac{-ig\nu^2}{m\sinh^2\nu(x_j - x_l)}, \quad
M_{jk} = \frac{ig\nu^2\cosh\nu(x_j - x_k)}{m\sinh^2\nu(x_j - x_k)}, \quad j \ne k$$   [14]
When the positions and momenta in L and M evolve
according to the H-flow, one has


$$\dot{L}_t = [M_t, L_t]$$   [15]


where $[\cdot, \cdot]$ is the matrix commutator. (Indeed, [15]
amounts to the Hamilton equations, as is readily
checked.) Since M is anti-Hermitean, it is not
difficult to derive from this Lax pair equation that
the flow is isospectral: $L_t$ is related to $L_0$ by a
unitary transformation $L_t = U_t L_0 U_t^*$ obtained from
$M_t$, so that the spectrum of $L_t$ is time independent.
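The claim that the Lax pair equation [15] amounts to the Hamilton equations can be tested at a sample phase-space point. In the sketch below, the time derivative of L is computed from Hamilton's equations for [1] with potential [5], namely $\dot{x}_j = p_j/m$ and $\dot{p}_j = (2g^2\nu^3/m)\sum_{l \ne j}\cosh\nu(x_j - x_l)/\sinh^3\nu(x_j - x_l)$, and compared with the commutator [M, L] built from [11] and [14]. Function names and parameter values are illustrative assumptions.

```python
import numpy as np

def lax_pair(x, p, g, nu, m):
    """Matrices L from [11] and M from [14] for the hyperbolic (type II) system."""
    n = len(x)
    idx = np.arange(n)
    diff = x[:, None] - x[None, :]
    np.fill_diagonal(diff, 1.0)              # placeholder; diagonals are overwritten
    s, c = np.sinh(nu * diff), np.cosh(nu * diff)
    L = 1j * g * nu / s
    M = 1j * g * nu**2 * c / (m * s**2)
    inv_s2 = 1.0 / s**2
    np.fill_diagonal(inv_s2, 0.0)            # exclude l = j from the diagonal sum
    L[idx, idx] = p
    M[idx, idx] = -1j * g * nu**2 / m * inv_s2.sum(axis=1)
    return L, M

def lax_time_derivative(x, p, g, nu, m):
    """dL/dt along the H-flow, using Hamilton's equations directly."""
    n = len(x)
    idx = np.arange(n)
    diff = x[:, None] - x[None, :]
    np.fill_diagonal(diff, 1.0)
    s, c = np.sinh(nu * diff), np.cosh(nu * diff)
    xdot = p / m
    pair_force = 2 * g**2 * nu**3 * c / (m * s**3)
    np.fill_diagonal(pair_force, 0.0)
    pdot = pair_force.sum(axis=1)
    # d/dt [i g nu / sinh(nu q)] = -i g nu^2 cosh(nu q) / sinh^2(nu q) * dq/dt
    Ldot = -1j * g * nu**2 * c / s**2 * (xdot[:, None] - xdot[None, :])
    Ldot[idx, idx] = pdot
    return Ldot

x = np.array([1.5, 0.4, -0.3, -1.7])         # x_4 < ... < x_1
p = np.array([0.2, -0.5, 0.9, 0.1])
g, nu, m = 0.7, 1.3, 1.5

L, M = lax_pair(x, p, g, nu, m)
commutator = M @ L - L @ M
Ldot = lax_time_derivative(x, p, g, nu, m)

lax_defect = np.max(np.abs(Ldot - commutator))
anti_hermitean_defect = np.max(np.abs(M + M.conj().T))
```

Both defects vanish up to rounding. The cancellation of the terms not proportional to $p_k - p_j$ rests on the identity $\sinh(b - a)\sinh(a + b) = \sinh^2 b - \sinh^2 a$.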
This argument already shows the existence of N
conserved quantities under the H-flow, namely the
N eigenvalues of L. It is, however, simpler to work
with either the power traces $H_k$ given by [13] or
with the symmetric functions $S_k$ of L, given by


$$\sum_{k=0}^{N} \lambda^k S_k = \det(\mathbf{1}_N + \lambda L)$$   [16]


These Hamiltonians depend only on the eigenvalues
of L, so they are also conserved under the flow.
Note that


$$S_1 = P, \quad S_2 = P^2/2 - mH$$   [17]
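As a consistency check on [16] and [17], the symmetric functions can be computed as the elementary symmetric polynomials of the eigenvalues of L. The sketch below assumes the type II Lax matrix [11] and uses `numpy.poly`, which returns the coefficients of $\prod_i (z - \lambda_i)$; the sample state and names are illustrative.

```python
import numpy as np

def lax_matrix(x, p, g, nu):
    """Type II Lax matrix [11]."""
    n = len(x)
    diff = x[:, None] - x[None, :]
    np.fill_diagonal(diff, 1.0)              # placeholder; overwritten below
    L = 1j * g * nu / np.sinh(nu * diff)
    L[np.arange(n), np.arange(n)] = p
    return L

def hamiltonian(x, p, g, nu, m):
    """H from [1] with the hyperbolic potential [5]."""
    j, k = np.triu_indices(len(x), k=1)
    pot = g**2 * nu**2 / (m * np.sinh(nu * (x[j] - x[k])) ** 2)
    return float(np.sum(p**2) / (2 * m) + np.sum(pot))

x = np.array([1.3, 0.2, -0.9])
p = np.array([0.4, -0.7, 0.1])
g, nu, m = 0.8, 1.1, 2.0
N = len(x)

L = lax_matrix(x, p, g, nu)
H = hamiltonian(x, p, g, nu, m)
P = float(p.sum())

# np.poly gives [1, -e_1, e_2, -e_3, ...]; hence S_k = (-1)^k * coefficient k.
eigenvalues = np.linalg.eigvalsh(L)
coeffs = np.poly(eigenvalues)
S = [(-1) ** k * coeffs[k] for k in range(N + 1)]

# Compare both sides of [16] at an arbitrary value of lambda.
lam = 0.37
det_value = np.linalg.det(np.eye(N) + lam * L)
series_value = sum(lam**k * S[k] for k in range(N + 1))
```

The second relation in [17] is the identity $e_2 = (e_1^2 - \mathrm{tr}\,L^2)/2$ combined with [12].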


To see why these Hamiltonians are in involution,
one can invoke the long-time asymptotics of the
H-flow. It reads


$$p_j(t) \sim \hat{p}_j, \quad \hat{p}_N < \cdots < \hat{p}_1, \quad
x_j(t) \sim \hat{x}_j + t\hat{p}_j/m, \quad j = 1, \ldots, N, \quad t \to \infty$$   [18]


Accordingly, one gets


$$L_t \sim \mathrm{diag}(\hat{p}_1, \ldots, \hat{p}_N) = L_\infty, \quad t \to \infty$$   [19]

Since the time evolution is a canonical transformation
and the Poisson brackets $\{H_k, H_l\}$ are time
independent (by the Jacobi identity), it now readily
follows from [19] that they vanish. (Indeed, $H_k$ and
$H_l$ reduce to power traces of $L_\infty$, and the asymptotic
momenta $\hat{p}_1, \ldots, \hat{p}_N$ Poisson commute.)


Quantum Nonrelativistic CMS Systems 


The canonical quantization prescription
$$p_j \to -i\hbar\,\partial/\partial x_j, \qquad j = 1,\ldots,N$$  [20]


($\hbar$ being the Planck constant) gives rise to an
unambiguous quantum Hamiltonian


$$H = -\frac{\hbar^2}{2m} \sum_{j=1}^{N} \partial_{x_j}^2 + \frac{g^2}{m} \sum_{1 \le j < k \le N} V(x_j - x_k)$$  [21]


for any classical Hamiltonian [1]. Thus, the defining
Hamiltonians of the above systems give rise to
well-defined partial differential operators (PDOs),
which act on suitable dense subspaces of the
Hilbert space $L^2(G_\kappa, dx)$, $\kappa = \mathrm{I},\ldots,\mathrm{IV}$, with $G_{\mathrm{I}}$ and
$G_{\mathrm{II}}$ given by G in [4], and $G_{\mathrm{III}}$, $G_{\mathrm{IV}}$ by [8] and [9],
respectively.

We recall that there is no general result ensuring that 
a classically integrable system admits an integrable 
quantum version. More precisely, when one substi- 
tutes [20] in N Poisson commuting Hamiltonians, it 
need not be true that they commute as quantum 
operators, even when no ordering ambiguities are 
present. For the power trace Hamiltonians such 
ambiguities do occur. (For example, [11] gives rise
to a term in $H_3$ proportional to $p_1/\sinh^2 \nu(x_1 - x_2)$.)
On the other hand, no noncommuting factors occur
in the quantization of $S_1,\ldots,S_N$. To verify this, one
need only note that $S_k$ equals the sum of all $k \times k$
principal minors of L, cf. [16]; choosing a diagonal
element $p_j$ in a summand, one therefore has no
dependence on $x_j$ in the remaining factors, hence no
ordering ambiguity.

As a result, the prescription [20] yields N
unambiguous operators $S_k(x, -i\hbar\nabla)$, which are
moreover formally self-adjoint on $L^2(G_\kappa, dx)$ for
each of the four cases $\kappa = \mathrm{I},\ldots,\mathrm{IV}$. Although by no
means obvious, it is true that these operators do
commute. Thus, integrability is preserved under
quantization of the above systems. Now the power
traces of a matrix can be expressed as polynomials
in the symmetric functions (via the Newton
identities), so this yields an ordering ensuring that
the quantized power traces commute as well.
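
The Newton identities invoked here can be illustrated for commuting numbers (the classical situation). The sketch below takes a list of "eigenvalues", computes the symmetric functions $S_k$ of [16] directly, and then recovers the power sums $\operatorname{tr} L^k = k H_k$ from them by the recursion $p_k = \sum_{i=1}^{k-1} (-1)^{i-1} S_i\, p_{k-i} + (-1)^{k-1} k\, S_k$. The function names are illustrative, and the example says nothing about operator ordering, which is the quantum subtlety discussed in the text.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(values):
    """S_0, ..., S_N for the given numbers (S_0 = 1)."""
    n = len(values)
    return [1.0] + [sum(prod(c) for c in combinations(values, k))
                    for k in range(1, n + 1)]

def power_sums_from_symmetric(s, kmax):
    """Power sums p_1..p_kmax obtained from S_1..S_kmax via the Newton identities."""
    p = [None]  # index 0 is unused so that p[k] is the k-th power sum
    for k in range(1, kmax + 1):
        total = (-1) ** (k - 1) * k * s[k]
        for i in range(1, k):
            total += (-1) ** (i - 1) * s[i] * p[k - i]
        p.append(total)
    return p

eigenvalues = [2.0, 3.0, 5.0]
S = elementary_symmetric(eigenvalues)          # [1, 10, 31, 30]
p_newton = power_sums_from_symmetric(S, 3)     # [None, 10, 38, 160]
p_direct = [None] + [sum(v ** k for v in eigenvalues) for k in range(1, 4)]
```

Dividing `p_newton[k]` by k gives the power trace Hamiltonian $H_k$ of [13] as a polynomial in $S_1,\ldots,S_k$.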

Just as the action-angle transformation for a 
classically integrable system “diagonalizes” all of 
the Poisson commuting Hamiltonians at once (in the 
sense that the transformed Hamiltonians depend 
only on the action variables), one expects that there 
exists a unitary operator that transforms all of the 
commuting Hamiltonians to diagonal form. In the 
classical setting, the existence of this diagonalizing 
map follows (under suitable technical restrictions) 
from the Liouville-Arnold theorem, whereas in the 
quantum context the existence of such a joint 
eigenfunction transformation is a far more delicate 
issue. This problem is briefly discussed again later;
we note here that the solutions obtained to date vary
considerably in completeness and “explicitness” for
the four regimes.


Classical Relativistic CMS Systems 


The nonrelativistic spacetime symmetry group is the 
Galilei group. Its Lie algebra is represented by the 
time translation generator H given by [1], space 
translation generator P given by [10], and the Galilei 
boost generator 


$$B = -m \sum_{j=1}^{N} x_j$$  [22]


More precisely, the Poisson brackets are given by


$$\{H, P\} = 0, \qquad \{H, B\} = P, \qquad \{P, B\} = Nm$$  [23]
so that the last bracket does not vanish (as is
the case for the Galilei Lie algebra). This deviation
is inconsequential, however, since the constant
Nm (central extension) yields trivial Hamilton
equations.

The relativistic spacetime symmetry group (Poincaré
group) yields a Lie algebra that differs from
[23] only in Nm being replaced by $H/c^2$, where c is
the speed of light. Clearly, the functions


$$H = mc^2 \sum_{j=1}^{N} \cosh\left(\frac{p_j}{mc}\right), \qquad P = mc \sum_{j=1}^{N} \sinh\left(\frac{p_j}{mc}\right)$$  [24]


together with B given by [22] give rise to these
altered Poisson brackets. Physically, these three
generators describe a system of N relativistic free
mass-m particles in terms of their rapidities $p_j/mc$.
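
The altered brackets can be confirmed numerically for the free generators [24] and the boost [22]. The sketch below approximates the Poisson bracket $\{F, G\} = \sum_j (\partial_{x_j}F\,\partial_{p_j}G - \partial_{p_j}F\,\partial_{x_j}G)$ by central differences; this sign convention is the one for which [23] holds as written, and the helper names, step size, and sample values are illustrative assumptions.

```python
import math

def poisson_bracket(F, G, x, p, h=1e-5):
    """{F, G} = sum_j (dF/dx_j dG/dp_j - dF/dp_j dG/dx_j), by central differences."""
    def partial(func, which, j):
        x_plus, p_plus = list(x), list(p)
        x_minus, p_minus = list(x), list(p)
        if which == "x":
            x_plus[j] += h
            x_minus[j] -= h
        else:
            p_plus[j] += h
            p_minus[j] -= h
        return (func(x_plus, p_plus) - func(x_minus, p_minus)) / (2.0 * h)

    return sum(partial(F, "x", j) * partial(G, "p", j)
               - partial(F, "p", j) * partial(G, "x", j)
               for j in range(len(x)))

m, c = 1.2, 3.0  # illustrative mass and speed of light

def H(x, p):
    return m * c * c * sum(math.cosh(pj / (m * c)) for pj in p)

def P(x, p):
    return m * c * sum(math.sinh(pj / (m * c)) for pj in p)

def B(x, p):
    return -m * sum(x)

x = [0.7, -0.2, 1.9]
p = [0.5, -1.3, 2.1]
bracket_HB = poisson_bracket(H, B, x, p)   # expected: P
bracket_PB = poisson_bracket(P, B, x, p)   # expected: H / c^2
bracket_HP = poisson_bracket(H, P, x, p)   # expected: 0
```

Because H and P are independent of x in the free case, $\{H, P\} = 0$ is automatic here; it becomes a genuine constraint only once interaction is introduced.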


A natural ansatz to take interaction into account 
now reads 


$$H = mc^2 \sum_{j=1}^{N} \cosh\left(\frac{p_j}{mc}\right) V_j(x), \qquad P = mc \sum_{j=1}^{N} \sinh\left(\frac{p_j}{mc}\right) V_j(x), \qquad V_j(x) = \prod_{k \neq j} f(x_j - x_k)$$  [25]


Indeed, it is plain that this still entails


$$\{H, B\} = P, \qquad \{P, B\} = H/c^2$$  [26]


But to obtain a relativistic particle system, the time
and space translations must also commute. The
corresponding requirement $\{H, P\} = 0$ yields a severe
constraint on the “pair potential” function f(x) in
[25] whenever N > 2. (For N = 2, one gets
$\{H, P\} = 0$ irrespective of the choice of f.)

As it turns out, the vanishing requirement is 
satisfied when 


$$f^2(x) = a + b\,\wp(x)$$  [27]


where a, b are constants and $\wp(x)$ is the Weierstrass
function already encountered. Taking, for example,
a, b > 0, one can take the positive square root of the
right-hand side of [27]. This choice of f(x) yields the
defining Hamiltonian of the relativistic elliptic
system (type IV). In the three degenerate cases, it is
convenient to choose


$$f(x) = \begin{cases} \left(1 + g^2/m^2c^2x^2\right)^{1/2} & \text{(I)} \\ \left(1 + \sin^2(\nu g/mc)/\sinh^2(\nu x)\right)^{1/2} & \text{(II)} \\ \left(1 + \sinh^2(\nu g/mc)/\sin^2(\nu x)\right)^{1/2} & \text{(III)} \end{cases}$$  [28]


It is an elementary exercise to check that this 
implies 


$$\lim_{c \to \infty} (H - Nmc^2) = H_{\mathrm{nr}}, \qquad \lim_{c \to \infty} P = P_{\mathrm{nr}}$$  [29]


where $H_{\mathrm{nr}}$ and $P_{\mathrm{nr}}$ are the above nonrelativistic time
and space translation generators. Hence, the defining
Hamiltonians of the relativistic systems reduce
to their nonrelativistic counterparts in the limit
$c \to \infty$.
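
Both the commutation requirement $\{H, P\} = 0$ and the limit [29] can be probed numerically for N = 3 with the type II function of [28]. The sketch below is illustrative only: the finite-difference Poisson bracket, the helper names, and the sample values are assumptions, and the nonrelativistic Hamiltonian is taken in the form $\sum_j p_j^2/2m + (g^2/m)\sum_{j<k}\nu^2/\sinh^2\nu(x_j - x_k)$.

```python
import math

def f_type2(y, g, nu, m, c):
    """Type II pair function of [28]."""
    return math.sqrt(1.0 + math.sin(nu * g / (m * c)) ** 2 / math.sinh(nu * y) ** 2)

def V(j, x, g, nu, m, c):
    """V_j(x) of [25]: product of f(x_j - x_k) over k != j."""
    out = 1.0
    for k in range(len(x)):
        if k != j:
            out *= f_type2(x[j] - x[k], g, nu, m, c)
    return out

def H_rel(x, p, g, nu, m, c):
    return m * c * c * sum(math.cosh(p[j] / (m * c)) * V(j, x, g, nu, m, c)
                           for j in range(len(x)))

def P_rel(x, p, g, nu, m, c):
    return m * c * sum(math.sinh(p[j] / (m * c)) * V(j, x, g, nu, m, c)
                       for j in range(len(x)))

def H_nr(x, p, g, nu, m):
    """Assumed nonrelativistic type II Hamiltonian."""
    n = len(x)
    kinetic = sum(pj * pj for pj in p) / (2.0 * m)
    potential = sum(nu ** 2 / math.sinh(nu * (x[j] - x[k])) ** 2
                    for j in range(n) for k in range(j + 1, n))
    return kinetic + g * g * potential / m

def poisson_bracket(F, G, x, p, h=1e-5):
    """{F, G} = sum_j (dF/dx_j dG/dp_j - dF/dp_j dG/dx_j), by central differences."""
    def partial(func, which, j):
        x_plus, p_plus = list(x), list(p)
        x_minus, p_minus = list(x), list(p)
        if which == "x":
            x_plus[j] += h
            x_minus[j] -= h
        else:
            p_plus[j] += h
            p_minus[j] -= h
        return (func(x_plus, p_plus) - func(x_minus, p_minus)) / (2.0 * h)

    return sum(partial(F, "x", j) * partial(G, "p", j)
               - partial(F, "p", j) * partial(G, "x", j)
               for j in range(len(x)))

x = [1.5, 0.2, -1.1]
p = [0.3, -0.4, 0.8]
g, nu, m = 0.7, 0.5, 1.0

# Time and space translations commute for the type II pair function (N = 3).
c = 2.0
bracket_HP = poisson_bracket(lambda xs, ps: H_rel(xs, ps, g, nu, m, c),
                             lambda xs, ps: P_rel(xs, ps, g, nu, m, c), x, p)

# Nonrelativistic limit [29]: subtract the rest energy N m c^2 at large c.
c_big = 1.0e3
limit_gap = H_rel(x, p, g, nu, m, c_big) - 3 * m * c_big ** 2 - H_nr(x, p, g, nu, m)
```

The residual `limit_gap` is of order $1/c^2$, so it shrinks further as `c_big` grows, until rounding in the large rest-energy subtraction takes over.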

The special character of the function [27] makes 
itself felt not only in ensuring Poincaré invariance, 
but also in entailing integrability. To begin with, 
note that the functions 


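$$S_{\pm N} = \exp\left(\pm\beta \sum_{j=1}^{N} p_j\right), \qquad \beta = 1/mc$$  [30]
commute with H and P, so that integrability for
N = 3 is plain. More generally, the Hamiltonians


$$S_{\pm k} = \sum_{\substack{I \subset \{1,\ldots,N\} \\ |I| = k}} \exp\left(\pm\beta \sum_{j \in I} p_j\right) \prod_{\substack{j \in I \\ l \notin I}} f(x_j - x_l), \qquad k = 1,\ldots,N$$  [31]
can be shown to mutually commute. Clearly, one has
$$S_{\pm k} = S_{\pm N}\, S_{\mp(N-k)}, \qquad k = 1,\ldots,N-1$$  [32]
and


$$H = (S_1 + S_{-1})/2m\beta^2, \qquad P = (S_1 - S_{-1})/2\beta$$  [33]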


As anticipated by the notation, the functions
$S_1,\ldots,S_N$ may be viewed as the symmetric functions
of a Lax matrix. More precisely, in the elliptic case 
this is true up to multiplicative constants that 
depend on a spectral parameter occurring in the 
Lax matrix. As before, only the Lax matrix for the 
type II system is specified here. In this case, one can 
dispense with the spectral parameter and choose 


$$L_{jk} = e_j C_{jk} e_k, \qquad j,k = 1,\ldots,N$$  [34]
where
$$e_j = \exp(\nu x_j + \beta p_j/2) \prod_{l \neq j} f(x_j - x_l)^{1/2}$$  [35]


$$C_{jk} = \exp(-\nu(x_j + x_k))\, \frac{\sinh(i\beta\nu g)}{\sinh \nu(x_j - x_k + i\beta g)}$$  [36]


In [35], f(x) is the type II function given by [28]. The
matrix C arises from Cauchy's matrix $1/(w_j - z_k)$
via a suitable substitution, and Cauchy's identity


$$\det\left(\frac{1}{w_j - z_k}\right)_{j,k=1}^{N} = \prod_{j=1}^{N} \frac{1}{w_j - z_j} \prod_{1 \le j < k \le N} \frac{(w_j - w_k)(z_j - z_k)}{(w_j - z_k)(z_j - w_k)}$$  [37]


ensures that [34] yields the Hamiltonians $S_k$ of [31].
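
Cauchy's identity [37] is easy to test numerically. The following sketch compares a determinant computed by Gaussian elimination with the product formula for an arbitrary choice of distinct points; the helper names and the sample points are illustrative.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for cc in range(col, n):
                a[r][cc] -= factor * a[col][cc]
    return det

def cauchy_product(w, z):
    """Right-hand side of [37]."""
    n = len(w)
    out = 1.0
    for j in range(n):
        out /= (w[j] - z[j])
    for j in range(n):
        for k in range(j + 1, n):
            out *= ((w[j] - w[k]) * (z[j] - z[k])
                    / ((w[j] - z[k]) * (z[j] - w[k])))
    return out

# Arbitrary sample points with w_j != z_k for all j, k.
w = [2.0, 3.5, -1.0, 0.4]
z = [0.5, -2.0, 1.7, 4.2]

lhs = determinant([[1.0 / (wj - zk) for zk in z] for wj in w])
rhs = cauchy_product(w, z)
```

Applied to the principal minors of C, the same product formula is what turns the symmetric functions of the Lax matrix [34] into sums over index sets I as in [31].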
To conclude this section, we point out that the
relation


$$L_{\mathrm{rel}} = \mathbf{1}_N + \beta L_{\mathrm{nr}} + O(\beta^2), \qquad \beta \to 0$$  [38]


where $L_{\mathrm{nr}}$ denotes the nonrelativistic Lax matrix
[11], can be used to deduce the involutivity of the
nonrelativistic Hamiltonians from that of their
relativistic counterparts.


Quantum Relativistic CMS Systems 


When the canonical quantization prescription [20] is
applied to the classical Hamiltonians [31] with
f(x) = 1, one obtains commuting quantum operators
whose action is exemplified by


$$\left(\exp(\mp i\hbar\beta\,\partial_{x_1})\Psi\right)(x_1,\ldots,x_N) = \Psi(x_1 \mp i\hbar\beta, x_2,\ldots,x_N)$$  [39]


That is, the operators act on functions that have an
analytic continuation in $x_1,\ldots,x_N$ from the real line
R to a strip around R in the complex plane C,
whose width is at least $2\hbar/mc$.

Operators of this type are called analytic difference
operators (henceforth AΔOs). The choice
f(x) = 1 amounts to the free case g = 0 in [28].
For g ≠ 0, however, the canonical quantization
exemplified by [39] yields noncommuting AΔOs.
Thus, the factor ordering following from [31]
would entail that integrability breaks down at the
quantum level.

As mentioned before, there is no general result 
guaranteeing that a different ordering that preserves 
integrability exists. Even so, this is true in the 
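present case. Specifically, the function f(x) can be
factorized as $f_+(x) f_-(x)$, and then the AΔOs


$$S_{\pm k} = \sum_{\substack{I \subset \{1,\ldots,N\} \\ |I| = k}} \prod_{\substack{j \in I \\ l \notin I}} f_\mp(x_j - x_l) \cdot \exp\left(\mp i\hbar\beta \sum_{j \in I} \partial_{x_j}\right) \cdot \prod_{\substack{j \in I \\ l \notin I}} f_\pm(x_j - x_l)$$  [40]


do commute. In the elliptic case [27], this factorization
involves the Weierstrass $\sigma$-function, and commutativity
can be encoded in a sequence of
functional equations satisfied by the $\sigma$-function.
For the type I-III systems the pertinent factorization
of [28] is given by


$$f_\pm(x) = \begin{cases} (1 \pm i\beta g/x)^{1/2} & \text{(I)} \\ (\sinh \nu(x \pm i\beta g)/\sinh \nu x)^{1/2} & \text{(II)} \\ (\sin \nu(x \pm i\beta g)/\sin \nu x)^{1/2} & \text{(III)} \end{cases}$$  [41]


(Here one has g > 0, and the choice of square root is
such that $f_\pm(x) \to 1$ for $g \downarrow 0$.)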

The nonrelativistic limit $c \to \infty$ of the quantum
Hamiltonians [33] can be determined by expanding
$S_1$ and $S_{-1}$ in a power series in $\beta = 1/mc$. In this
way, one obtains once more [29], except for a small,
but crucial change in $H_{\mathrm{nr}}$: instead of the coupling
constant dependence $g^2$ in the potential energy, one
gets $g(g - \hbar)$. The extra term arises from the action
of the term linear in $\beta$ in the expansion of the
exponential on the term linear in $\beta$ in the expansion
of the functions $f_\pm(x)$.

From the perspective of the nonrelativistic quantum
CMS systems, the change $g^2 \to g(g - \hbar)$ appears
ad hoc. As it transpires, however, the different
dependence on g ensures that the eigenfunctions of
$H_{\mathrm{nr}}$ depend on g in a far simpler way. This will
become clear shortly.
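
The origin of the shift $g^2 \to g(g - \hbar)$ can be made explicit in the simplest case. The following is a sketch for N = 2 and type I, assuming the ordering in [40] with $f_\mp$ to the left of the shift operator and $f_\pm$ to its right, and the factorization $f_\pm(x) = (1 \pm i\beta g/x)^{1/2}$ of [41]; here $x = x_1 - x_2$.

```latex
% The I = {1} summand of S_1 is f_-(x) e^{-i\hbar\beta\partial_{x_1}} f_+(x).
% Moving the shift operator to the right shifts the argument of f_+.
\begin{align*}
f_-(x)\, e^{-i\hbar\beta\partial_{x_1}} f_+(x)
  &= f_-(x)\, f_+(x - i\hbar\beta)\, e^{-i\hbar\beta\partial_{x_1}}, \\
\bigl[f_-(x)\, f_+(x - i\hbar\beta)\bigr]^2
  &= \frac{(x - i\beta g)\,(x - i\hbar\beta + i\beta g)}{x\,(x - i\hbar\beta)}
   = 1 + \frac{\beta^2 g(g - \hbar)}{x\,(x - i\hbar\beta)}, \\
f_-(x)\, f_+(x - i\hbar\beta)
  &= 1 + \frac{\beta^2 g(g - \hbar)}{2x^2} + O(\beta^3).
\end{align*}
% The other three summands of S_1 + S_{-1} have the same coefficient to this order, so
\begin{align*}
H = \frac{S_1 + S_{-1}}{2m\beta^2}
  = 2mc^2 - \frac{\hbar^2}{2m}\bigl(\partial_{x_1}^2 + \partial_{x_2}^2\bigr)
    + \frac{g(g - \hbar)}{m}\,\frac{1}{(x_1 - x_2)^2} + O(\beta).
\end{align*}
```

The cross term $-\beta^2 g\hbar$ in the numerator is exactly the product of the $O(\beta)$ shift with the $O(\beta)$ part of $f_\pm$, in line with the explanation given above.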


Action-Angle Transforms and Duality 


Under certain technical assumptions, any integrable
system given by N independent Poisson commuting
Hamiltonians $S_1(x,p),\ldots,S_N(x,p)$ on a 2N-dimensional
phase space admits local canonical
transformations to action-angle variables. Like the
spectral theorem on the quantum level, this 
structural result is of limited practical value. Indeed, 
just as the spectral theorem yields no concrete 
information concerning eigenfunctions, bound-state 
energies, scattering, etc., associated with a given 
self-adjoint Hamiltonian, the Liouville-Arnold
theorem only yields general insight into the type of
motion that can occur and the geometric character 
of the local maps (in terms of invariant tori). 

To fully comprehend (“solve”) a given integrable 
system, one should render the associated action- 
angle map as concrete as possible. For the CMS type 
systems, a complete solution to this problem has 
only been achieved for the systems of type I-III. The 
motion in the trigonometric systems is oscillatory, so 
that a closeup via the action-angle transform 
involves extensive geometric constructions. By con- 
trast, the type I and II systems are scattering systems, 
and here the action-angle map can be tied in with 
the classical wave maps (Moller transformations). 

We now sketch some salient features of the
action-angle maps for systems of type I and II. In
all cases the map (denoted $\Phi$) is a canonical
transformation from the phase space $\Omega$ (eqn [3])
with 2-form $dx \wedge dp$ to the phase space


$$\hat{\Omega} = \{(\hat{x}, \hat{p}) \in \mathbb{R}^{2N} \mid \hat{p} \in G\}$$  [42]


with 2-form $d\hat{x} \wedge d\hat{p}$. Thus, the actions $\hat{p}_1,\ldots,\hat{p}_N$
vary over G given by [4] and the “angles” $\hat{x}_1,\ldots,\hat{x}_N$
over R. Consequently, $\hat{\Omega}$ amounts to $\Omega$ with x and p
interchanged.

As should be the case, the transformed commuting
Hamiltonians


$$\hat{S}_k = S_k \circ \Phi^{-1}, \qquad k = 1,\ldots,N$$  [43]


depend only on the action vector $\hat{p}$. To be specific,
they arise from $S_k(x,p)$ by taking g = 0 (no interaction,
hence no x dependence) and substituting $p \to \hat{p}$.
Indeed, the actions $\hat{p}_j$ are the $t \to \infty$ limits of the
momenta $p_j(t)$, where the t dependence refers to the
defining Hamiltonian of the system.

As it happens, the Lax matrix L is of decisive
importance to concretize the action-angle map $\Phi$,
and in particular to reveal its hidden duality
properties. The starting point is a commutation
relation of L(x,p) with a diagonal matrix A(x)
given by


$$A(x) = \operatorname{diag}(d(x_1),\ldots,d(x_N)), \qquad d(y) = \begin{cases} y & \text{(I)} \\ \exp(2\nu y) & \text{(II)} \end{cases}$$  [44]


Obviously, the symmetric functions $D_k(x)$ of A(x)
yield an integrable system on $\Omega$, so the Hamiltonians


$$\hat{D}_k(\hat{x}, \hat{p}) = (D_k \circ \Phi^{-1})(\hat{x}, \hat{p}), \qquad k = 1,\ldots,N$$  [45]


yield an integrable system on the action-angle phase
space $\hat{\Omega}$. The crux of the matter is now that these
systems are familiar: they are also systems of type I
and II!

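To be specific, let us denote the dual systems just
described by a caret, and the nonrelativistic/relativistic
systems by a suffix nr/rel, resp. Then the
duality properties alluded to are given by


$$\hat{\mathrm{I}}_{\mathrm{nr}} \simeq \mathrm{I}_{\mathrm{nr}}, \qquad \hat{\mathrm{II}}_{\mathrm{nr}} \simeq \mathrm{I}_{\mathrm{rel}}, \qquad \hat{\mathrm{I}}_{\mathrm{rel}} \simeq \mathrm{II}_{\mathrm{nr}}, \qquad \hat{\mathrm{II}}_{\mathrm{rel}} \simeq \mathrm{II}_{\mathrm{rel}}$$  [46]


and $\Phi^{-1}$ serves as the action-angle map for the dual
systems.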

In order to sketch why this state of affairs holds
true for the $\mathrm{II}_{\mathrm{rel}}$ system, recall that its Lax matrix is
given by [34]. From this, one readily checks the
commutation relation


$$\coth(i\beta\nu g)\,[A, L] = 2\,e \otimes e - (AL + LA)$$  [47]


Since L is Hermitean, there exists a unitary U
diagonalizing L. It can now be shown that the
spectrum of L is positive and nondegenerate, and
that $U^*e$ has nonzero components. The gauge
ambiguity in U (given by a permutation matrix and
diagonal phase matrix) can, therefore, be fixed by
requiring


$$U^*LU = \operatorname{diag}(\exp(\beta\hat{p}_1),\ldots,\exp(\beta\hat{p}_N)), \qquad \hat{p}_N < \cdots < \hat{p}_1$$  [48]
$$(U^*e)_j > 0, \qquad j = 1,\ldots,N$$  [49]


A suitable reparametrization of $U^*e$ then yields the
“angle” vector $\hat{x}$.

As a consequence, $U^*AU$ becomes a function of $\hat{x}$
and $\hat{p}$. In detail, one finds


$$(U^*AU)(\hat{x}, \hat{p}) = L(\beta/2, 2\nu; \hat{p}, \hat{x})^T$$  [50]


where $L(\nu, \beta; x, p)$ is given by [34] and T denotes the
transpose. Therefore, the “dual Lax matrix”
$\hat{A} = U^*AU$ is essentially equal to L, explaining the
self-duality $\hat{\mathrm{II}}_{\mathrm{rel}} \simeq \mathrm{II}_{\mathrm{rel}}$ announced above.
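
The commutation relation [47] can be verified entrywise for the Lax matrix [34]-[36] with $A = \operatorname{diag}(\exp(2\nu x_j))$ as in [44]. The sketch below does this for N = 3, reading $e \otimes e$ as the matrix with entries $e_j e_k$; the helper names and sample values are illustrative assumptions.

```python
import cmath
import math

def f_type2(y, g, nu, beta):
    """Type II pair function of [28], written with beta = 1/mc."""
    return math.sqrt(1.0 + math.sin(nu * g * beta) ** 2 / math.sinh(nu * y) ** 2)

def relativistic_lax(x, p, g, nu, beta):
    """Return (L, e) with L_jk = e_j C_jk e_k as in [34]-[36]."""
    n = len(x)
    e = []
    for j in range(n):
        factor = 1.0
        for l in range(n):
            if l != j:
                factor *= math.sqrt(f_type2(x[j] - x[l], g, nu, beta))
        e.append(math.exp(nu * x[j] + beta * p[j] / 2.0) * factor)
    a = 1j * beta * nu * g
    L = [[e[j] * math.exp(-nu * (x[j] + x[k]))
          * cmath.sinh(a) / cmath.sinh(nu * (x[j] - x[k]) + a) * e[k]
          for k in range(n)] for j in range(n)]
    return L, e

x = [0.9, 0.1, -0.6]
p = [0.4, -0.3, 1.0]
g, nu, beta = 0.7, 0.8, 0.5

L, e = relativistic_lax(x, p, g, nu, beta)
d = [math.exp(2.0 * nu * xj) for xj in x]          # diagonal of A(x), type II
a = 1j * beta * nu * g
coth_a = cmath.cosh(a) / cmath.sinh(a)

# Entrywise: [A, L]_jk = (d_j - d_k) L_jk and (AL + LA)_jk = (d_j + d_k) L_jk.
max_deviation = max(
    abs(coth_a * (d[j] - d[k]) * L[j][k]
        - (2.0 * e[j] * e[k] - (d[j] + d[k]) * L[j][k]))
    for j in range(3) for k in range(3))

# L should also be Hermitean.
hermiticity_defect = max(abs(L[j][k] - L[k][j].conjugate())
                         for j in range(3) for k in range(3))
```

The relation depends only on the Cauchy-type matrix C and on A, not on the particular form of the vector e, which is why the same structure reappears for the dual Lax matrix.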
With the action-angle transform under explicit
control, much more can be said about the solutions 
to Hamilton's equations for each of the commuting 
Hamiltonians, both as regards finite times and as 
regards long-time asymptotics (scattering). It is 
beyond the scope of this article to enlarge on this, 
but it is worth mentioning that the scattering reveals 
the solitonic character of the particles. Indeed, the
set of asymptotic momenta $\hat{p}_1,\ldots,\hat{p}_N$ is conserved
under the scattering and the asymptotic position
shifts are factorized in terms of pair shifts. A quite
remarkable feature of the type I systems is that the
shifts actually vanish (“billiard ball” scattering).


Eigenfunction Transforms and Duality 


Both at the relativistic and at the nonrelativistic level
the commuting quantum Hamiltonians $S_1,\ldots,S_N$
are formally self-adjoint on the Hilbert space
$L^2(G_\kappa, dx)$, $\kappa = \mathrm{I},\ldots,\mathrm{IV}$. Thus, it may be expected
that it is possible to construct a unitary eigenfunction
transform


$$\Phi_\kappa : L^2(G_\kappa, dx) \to L^2(\hat{G}_\kappa, d\mu_\kappa(p)), \qquad \kappa = \mathrm{I},\ldots,\mathrm{IV}$$  [51]


diagonalizing $S_k$ as multiplication by a real-valued
function $M_k(p)$. Here $\hat{G}_\kappa$ encodes the joint spectrum
and $d\mu_\kappa(p)$ is a suitable measure on $\hat{G}_\kappa$.

Obviously, this expectation is borne out in the
free case g = 0. Then, $\Phi_\kappa$ is basically Fourier
transformation, its kernel consisting of a sum of
joint eigenfunctions


$$\exp(-i x \cdot \sigma(p)/\hbar), \qquad \sigma \in S_N$$  [52]
with $\sigma$ ranging over the permutation group $S_N$. For
$\kappa = \mathrm{I}, \mathrm{II}$, one can take $\hat{G}_\kappa = G_\kappa = G$ (eqn [4]) and
$d\mu_\kappa(p) = dp$. Here one gets


$$M_k(p) = \sum_{1 \le i_1 < \cdots < i_k \le N} \begin{cases} p_{i_1} \cdots p_{i_k} \\ \exp(\beta p_{i_1}) \cdots \exp(\beta p_{i_k}) \end{cases}$$  [53]


in the nonrelativistic and relativistic case, resp. For
$\kappa = \mathrm{III}, \mathrm{IV}$, one needs to take into account periodic
boundary conditions on the walls of $G_\kappa$, yielding a
discrete joint spectrum after the center-of-mass
motion is omitted. (With the above choices of $G_{\mathrm{III}}$
and $G_{\mathrm{IV}}$, cf. [8] and [9], the center-of-mass motion is
a free motion along the line, so the total momentum
still varies continuously.) Of course, the diagonalized
$S_k$ are once more given by [53], since the kernel
of $\Phi_\kappa$ consists of free boson states.

Taking next g > 0, the above expectation has not
been confirmed for all of the eight regimes involved.
This is not only because in some cases not even the
existence of joint eigenfunctions has been shown,
but also because in the relativistic case the unitarity
of $\Phi_{\mathrm{II}}$ and $\Phi_{\mathrm{IV}}$ already breaks down for N = 2 when
g increases beyond a critical value, cf. [57] below. It
is quite likely that this happens for N > 2 as well,
but this is not readily apparent from the current
fragmentary knowledge on joint eigenfunctions for
N > 2.

The only two cases where the g > 0 joint
eigenfunction transform is of an elementary nature
are the $\mathrm{III}_{\mathrm{nr}}$ and $\mathrm{III}_{\mathrm{rel}}$ cases. Indeed, the joint
eigenfunctions describing the internal motion are of
the form


$$\Psi_n(x) = W(x)^{1/2} P_n(x), \qquad n \in \mathbb{N}^{N-1}$$  [54]


Here,


$$W(x) = \prod_{1 \le j < k \le N} w(x_j - x_k)$$  [55]


is a positive weight function on $G_{\mathrm{III}}$ and the $P_n(x)$
are multivariable orthogonal polynomials. Thus,
$P_n(x)$ is a finite linear combination of the above
free boson states, with p in [52] a linear function of
n. For the $\mathrm{III}_{\mathrm{nr}}$ case, these eigenfunctions were
already found by Sutherland. (Here, the functions
$P_n(x)$ amount to polynomials, often called the Jack
polynomials, which arose in a statistics context.)
The $\mathrm{III}_{\mathrm{rel}}$ polynomials may be viewed as the special
$A_{N-1}$ case of Macdonald's orthogonal q-polynomials
for arbitrary root systems, with


$$q = \exp(-2\beta\hbar\nu)$$  [56]


(Note that q converges to 1 both in the nonrelativistic
limit $c \to \infty$ and in the classical limit $\hbar \to 0$.)

For the $\mathrm{II}_{\mathrm{nr}}$ case, the joint eigenfunctions were
found and studied a couple of decades ago by
Heckman and Opdam, yielding a multivariable
hypergeometric transform. Indeed, for N = 2, the
eigenfunctions can be expressed in terms of the
hypergeometric function ${}_2F_1$, as has been known
since the early days of quantum mechanics. Likewise,
the arbitrary-N $\mathrm{I}_{\mathrm{nr}}$ joint eigenfunction transform
(studied in detail by de Jeu) can be viewed as a
multivariable Hankel transform, the N = 2 kernel
being essentially a Hankel function.

Much less is known concerning $\mathrm{IV}_{\mathrm{nr}}$ eigenfunctions,
and a fortiori for the associated transform
$\Phi_{\mathrm{IV}}$. For N = 2 the time-independent Schrödinger
equation amounts to the Lamé equation. Hence,
solutions are Lamé functions that can be studied in
particular via Fuchs theory (regular singularities). A
far more explicit form of the eigenfunctions dates
back to work by Hermite in the nineteenth century.
More precisely, provided the g dependence of the
defining Hamiltonian is changed from $g^2$ to $g(g - \hbar)$
(a change already encountered above), Hermite's
results apply to couplings $g = l\hbar$, $l = 2, 3, 4, \ldots$ His
eigenfunctions have a structure that is nowadays
referred to as the Bethe ansatz. For the same g values
and arbitrary N, $\mathrm{IV}_{\mathrm{nr}}$ eigenfunctions of Bethe ansatz
type were found and studied by Felder and
Varchenko, but even for these g values much
remains to be done to achieve a complete understanding
of the $\Phi_{\mathrm{IV}}$ transform.

A quite different approach, due to Komori and
Takemura, does yield rather detailed information on
$\Phi_{\mathrm{IV}}$ for arbitrary g > 0. The key feature of their
strategy is to view the $\mathrm{IV}_{\mathrm{nr}}$ case as a perturbation of
the $\mathrm{III}_{\mathrm{nr}}$ case. This entails, however, that the validity
of their results is restricted to large imaginary period
of the $\wp$-function.

For the $\mathrm{IV}_{\mathrm{rel}}$ system, there are only rather
complete results on $\Phi_{\mathrm{IV}}$ for N = 2. More specifically,
the eigenfunction transform is known to be unitary
for


$$g \in [0, \hbar + \pi/\beta\nu]$$  [57]


and a dense set in a corresponding parameter space.
(For g outside this interval, unitarity is violated.)
The kernel of $\Phi_{\mathrm{IV}}$ involves eigenfunctions of Bethe
ansatz structure. For $g = l\hbar$, $l = 2, 3, \ldots$ and arbitrary
N, Bethe ansatz type $\mathrm{IV}_{\mathrm{rel}}$ eigenfunctions were found
by Billey, generalizing the Felder-Varchenko results
mentioned above.

It remains to discuss the $\mathrm{I}_{\mathrm{rel}}$ and $\mathrm{II}_{\mathrm{rel}}$ systems. To
this end, we first recall the classical dualities [46]. It
is natural to expect that these dualities are still
present at the quantum level. For the $\mathrm{I}_{\mathrm{nr}}$ case, this is
readily confirmed: the transform is indeed invariant
under interchange of x and p. In fact, the N = 2
center-of-mass Hankel transform even depends only
on $(x_1 - x_2)(p_1 - p_2)$, so that self-duality is manifest
in this case.

More generally, for N = 2 the expected dualities
[46] are indeed present. The $\mathrm{II}_{\mathrm{nr}}$ ${}_2F_1$ transform
satisfies the $\mathrm{I}_{\mathrm{rel}}$ analytic difference equation in $p_1 - p_2$
due to the contiguous relations obeyed by ${}_2F_1$. The
$\mathrm{II}_{\mathrm{rel}}$ transform is only unitary when g is restricted by
[57], and it is indeed self-dual in the same sense as the
action-angle map (Ruijsenaars).

Turning finally to the case N > 2, the multivariable
hypergeometric transform $\Phi_{\mathrm{II}}$ does have the expected
duality property. More specifically, its inverse
diagonalizes the commuting $\mathrm{I}_{\mathrm{rel}}$ AΔOs (Chalykh). For $\mathrm{II}_{\mathrm{rel}}$
with N > 2 and $g = l\hbar$, $l = 2, 3, \ldots$, Chalykh also
finds elementary joint eigenfunctions with the
expected self-duality. To date, no Hilbert space results
for the N > 2 $\mathrm{II}_{\mathrm{rel}}$ case have been obtained.


To conclude, we mention that the soliton scatter- 
ing behavior at the classical level is preserved under 
quantization in all cases where this can be checked. 
That is, no new momenta are created in the 
scattering process and the S-matrix is factorized as 
a product of pair S-matrices. Moreover, for the type
I cases, the S-matrix is a momentum-independent
(but g-dependent) phase, as a quantum analog of the
classical billiard ball scattering.


See also: Bethe Ansatz; Classical r-Matrices, Lie 
Bialgebras, and Poisson Lie Groups; Functional 
Equations and Integrable Systems; Integrable Discrete 
Systems; Integrable Systems and Algebraic Geometry; 
Integrable Systems in Random Matrix Theory; Integrable 
Systems: Overview; Isochronous Systems; Ordinary 
Special Functions; q-Special Functions; Quantum
Calogero-Moser Systems; Seiberg-Witten Theory; 
Separation of Variables for Differential Equations; 
Sine-Gordon Equation; Toda Lattices. 
Introduction


Lagrangian formulations of general relativity (GR) 
were found by Hilbert and by Einstein himself, 
almost immediately after the discovery of the theory. 
The construction of Hamiltonian formulations of 
GR, on the other hand, has taken much longer, and 
has required decades of theoretical research. 

The first such formulations were developed by 
Dirac and by Bergmann and his collaborators, in the 
1950s. Their cumbersome formalism was simplified 
by the introduction of new variables: first by 
Arnowitt, Deser, and Misner in the 1960s and then
by Ashtekar in the 1980s. A large number of 
variants and improvements of these formalisms 
have been developed by many other authors. Most 
likely the process is not over, and there is still much 
to learn about the canonical formulation of GR. 

A number of reasons motivate the study of 
canonical GR. In general, the canonical formalism 
can be an important step towards quantum theory; 
it allows the identification of the physical degrees of 
freedom, and the gauge-invariant states and observables
of the theory; and it is an important tool for
analyzing formal aspects of the theory such as its 
Cauchy problem. All these issues are highly non- 
trivial, and present open problems, in GR. 

In turn, the structural peculiarity and the con- 
ceptual novelty of GR have motivated re-analyses 
and extensions of the canonical formalism itself. 

The following sections discuss the source of the 
peculiar difficulty of canonical GR, and summarize 
the formulations of the theory that are most 
commonly used. 


The Origin of the Difficulties 


The reason for the complexity of the Hamiltonian 
formulation of GR is not so much in the intricacy of 
its nonlinear field equations; rather, it must be found 
in the conceptual novelty introduced by GR at the 
very foundation of the structure of mechanics. 

The dynamical systems considered before GR can 
be formulated in terms of states evolving in time. One 
assumes that a time variable t can be measured by a 
physical clock, and that certain observable quantities 
A of the system can be measured at every instant of 
time. If we know the state s of the system at some 


initial time, the theory predicts the value A(t) of 
these quantities for any given later instant of time t. 
The space of the possible initial states s is the phase
space $\Gamma_0$. Observables are real functions on $\Gamma_0$.
Infinitesimal time evolution can be represented as a
vector field in $\Gamma_0$. This vector field is determined by
the Hamiltonian, which is also a function on $\Gamma_0$. The
integral lines s(t) of this vector field determine
the time evolution A(t) = A(s(t)) of the observables.

This conceptual structure is very general. It can be 
easily adapted to special-relativistic systems. How- 
ever, it is not general enough for general-relativistic 
systems. GR is not formulated as the evolution of 
states and observables in a preferred time variable 
which can be measured by a physical clock. Rather, 
it is formulated as the relative (common) evolution 
of many observable quantities. Accordingly, in GR 
there is no quantity playing the same role as the 
conventional Hamiltonian. In fact, the canonical 
Hamiltonian density that one obtains from a 
Legendre transformation from a  Lagrangian 
vanishes identically in GR. 

The origin of this peculiar behavior of the theory is 
the following. The field equations are written as 
evolution equations in a time coordinate t. However, 
they are invariant under arbitrary changes of t. That is,
if we replace t with an arbitrary function $t' = t'(t)$ in a
solution of the field equations, we obtain another
solution. This underdetermination does not lead to a
lack of predictivity in GR, because we do not interpret
the variable t as the measurable reading of a physical
clock, as we do in non-general-relativistic theories.
Rather, we interpret it as a nonobservable mathematical
parameter, void of physical significance. Accordingly,
the notions of “state at a given time” and “value of
an observable at a given time” are very unnatural in GR.
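
A finite-dimensional analog may help to see how reparametrization invariance forces the canonical Hamiltonian to vanish. The following sketch uses the free relativistic particle, which is not discussed in the article and is introduced here only as an illustration; it assumes the metric signature $(-,+,+,+)$, units with $c = 1$, and an arbitrary curve parameter $\tau$.

```latex
% Reparametrization-invariant action of a free particle with world line x^\mu(\tau):
\begin{align*}
L &= -m\sqrt{-\dot{x}^\mu \dot{x}_\mu}, \qquad \dot{x}^\mu = \frac{dx^\mu}{d\tau}, \\
p_\mu &= \frac{\partial L}{\partial \dot{x}^\mu}
       = \frac{m\,\dot{x}_\mu}{\sqrt{-\dot{x}^\nu \dot{x}_\nu}}, \\
H &= p_\mu \dot{x}^\mu - L
   = \frac{m\,\dot{x}_\mu \dot{x}^\mu}{\sqrt{-\dot{x}^\nu \dot{x}_\nu}}
     + m\sqrt{-\dot{x}^\nu \dot{x}_\nu} = 0, \\
p_\mu p^\mu + m^2 &= 0 .
\end{align*}
% The momenta are homogeneous of degree zero in the velocities, so the map
% \dot{x}^\mu \mapsto p_\mu is not invertible: its image is the mass shell
% p^2 + m^2 = 0, and the dynamics is carried entirely by this constraint.
```

As in GR, replacing $\tau$ by any monotonic function $\tau'(\tau)$ maps solutions to solutions, and $\tau$ itself carries no physical meaning.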

A Hamiltonian formulation of GR requires a 
version of the canonical formalism sufficiently 
general to deal with this broader notion of evolu- 
tion. Generalizations of the Hamiltonian formalism 
have been developed by many authors, such as Dirac 
(see below), Souriau, Arnold, Witten, and many 
others. The first step in this direction was taken by
Lagrange himself: Lagrange gave a time-independent
interpretation of the phase space as the space $\Gamma$ of
the solutions of the equations of motion (modulo
gauges). As we shall see, however, consensus is still
lacking on a fully satisfactory formalism.


Dirac Theory of Constrained Systems 


Dirac has developed a Hamiltonian theory for 
mechanical systems with constraints, precisely in 


view of its application to GR. Dirac's theory is 
beautiful, finds vast applications, and it is still 
commonly taken as the basis to discuss Hamiltonian 
GR, although GR does not fit very naturally into 
Dirac's scheme. In the following, only the part of 
Dirac's theory relevant for GR is summarized. 

Consider a Lagrangian system with Lagrangian
variables $q^i$, with $i = 1,\ldots,n$. Call $v^i$ the corresponding
velocities. Let the system be defined by the Lagrangian
$L(q^i, v^i)$. The momenta are defined as functions of $q^i$
and $v^i$ by $p_i(q^i, v^i) = \partial L(q^i, v^i)/\partial v^i$. The canonical
Hamiltonian $H(q^i, p_i) = v^i(q^i, p_i)\,p_i - L(q^i, v^i(q^i, p_i))$
(summation over repeated indices is understood) is
obtained by inverting the function $p_i(q^i, v^i)$ and expressing
the velocities as functions of the momenta, $v^i(q^i, p_i)$.
The phase space $\Gamma_0$ is the space of the variables $(q^i, p_i)$.
Infinitesimal time evolution is given by the vector field
$V = v^i(q^i, p_i)\,\partial/\partial q^i + f_i(q^i, p_i)\,\partial/\partial p_i$, where velocities
and forces are given by the Hamilton equations
$v^i = \partial H/\partial p_i$ and $f_i = -\partial H/\partial q^i$.

More formally, the 2-form $\omega = dp_i \wedge dq^i$ endows
$\Gamma_0$ with a symplectic structure. In the presence of
such a structure, every function A determines a
vector field $V_A$, defined by $i_{V_A}\omega = -dA$. By integrating
this field, we have a flow in $\Gamma_0$, called the
flow generated by A. Time evolution is the flow
generated by the Hamiltonian. Given two functions
A and B, their Poisson brackets are defined by the
function $\{A, B\} = -V_A(B) = V_B(A)$. Therefore, the
time evolution of an observable A satisfies
$dA/dt = \{A, H\}$. A dynamical system is completely
characterized by the set $(\Gamma_0, \omega, \mathcal{A}, H)$, where
$\mathcal{A} = (A_1,\ldots,A_n)$ is the ensemble of the observables.

A constrained system, in the sense of Dirac, is
a system for which the image of the function
$v^i \to p_i(q^i, v^i)$ is smaller than $\mathbb{R}^n$. We can characterize
the image $\Sigma$ of the map $(q^i, v^i) \to (q^i, p_i)$ with a set
of equations on $\Gamma_0$


$$C_a(q^i, p_i) = 0$$  [1]


where $a = 1,\ldots,m'$. These are called the primary
constraints.

The “constraint surface” C is the largest subspace
of $\Sigma$ which is preserved by time evolution. It can be
characterized by adding additional constraints, still
of the form [1], with $a = m'+1,\ldots,m$. These
additional constraints, called secondary constraints,
can be computed as the Poisson brackets of the
primary constraints with the Hamiltonian (plus the
Poisson brackets of these secondary constraints with
the Hamiltonian, and so on, until the Poisson
brackets of all the constraints with the Hamiltonian
vanish on C). We say that an equation holds
weakly if it holds on C.
A constrained system is “first class” if the Poisson
brackets of the constraints among themselves
vanish weakly. Maxwell theory and GR are first-class
constrained systems. In a first-class constrained
system, the constraints generate flows that preserve
C and foliate it into “orbits.” The space of these
orbits is called the physical phase space (see
Figure 1).

This flow is interpreted as a “gauge” transformation,
namely as a change of mathematical description
of the same physical state. As first observed by
Dirac, such interpretation is necessary if we demand
a deterministic physical evolution, for the following 
reason. A first-class constrained system is a system 
in which the time evolution q'(t) of the Lagrangian 
variables is not completely determined by the 
equations of motion. (The relation between con- 
straints and underdetermination of the evolution is 
simple to understand. In a Lagrangian system, the 
number of equations of motion is equal to the 
number of Lagrangian variables. If one of these 
equations is a constraint (between the initial 
velocities and initial coordinates), then one evolu- 
tion equation is missing.) To recover a deterministic 
physical evolution, we must interpret two “mathe- 
matical" states that can evolve from the same initial 
data, as describing the same “physical” state. As 
shown by Dirac, the transformations generated by 
the constraints are precisely the ones that implement 
such an identification. 

It follows that the physical states must be identified
with the equivalence classes of the points of C under
the gauge transformations generated by the constraints,
namely with the orbits of their flow. It is
easy to show that (locally) there is a unique
symplectic 2-form $\omega_{\mathrm{ph}}$ on $\Gamma_{\mathrm{ph}}$ such that its pullback
to C is equal to the pullback of $\omega$ to C ($i^*\omega = \pi^*\omega_{\mathrm{ph}}$;
see Figure 1). Physical observables $A_{\mathrm{ph}}$ are functions
on C that are gauge invariant, namely constant on
the orbits. That is, they are functions on $\Gamma_{\mathrm{ph}}$. The
Hamiltonian is a physical observable. The dynamical
system $(\Gamma_{\mathrm{ph}}, \omega_{\mathrm{ph}}, \mathcal{A}_{\mathrm{ph}}, H)$, where $\mathcal{A}_{\mathrm{ph}}$ is the ensemble
of the physical observables, is a complete description
of the physical system, called the gauge-invariant
formulation, with no more constraints or gauges.


Figure 1 The structure of a first-class constrained system.
$\Gamma_0$: phase space; C: constraint surface; $\Gamma_{\mathrm{ph}}$: physical phase
space; i: imbedding of C in $\Gamma_0$; $\pi$: projection to orbit space
(sending each point into its orbit).

For instance, the phase space of Maxwell theory is 
coordinatized by the Maxwell potential
$A_\mu(x)$, $\mu = 0, 1, 2, 3$, and its conjugate momentum
$E^\mu(x)$. Since the time derivative of $A_0$ does not
appear in the Maxwell action, the primary con- 
straint is 


$E^0(x) = 0$ [2]


The secondary constraint turns out to be the Gauss 
law, 


$\partial_a E^a(x) = 0$ [3]


where $a = 1, 2, 3$. The first generates arbitrary
transformations of $A_0$, while the second generates
the time-independent gauge transformations
$\delta A_a(x) = \partial_a \lambda(x)$. The pair $(A_0, E^0)$ can be dropped
altogether, since it is formed by a pure gauge
variable and a variable constrained to vanish.
The (gauge-invariant) Hamiltonian is
$H = \frac{1}{8\pi} \int d^3x\, (E^a E_a + B^a B_a)$, where $B^a = \epsilon^{abc}\partial_b A_c$ is the
magnetic field and $E^a$ is easily recognized as the
electric field. $E^a$ and $B^a$ are the physical
observables.


General Structure of GR Constraints 


GR fits into Dirac theory with a certain difficulty. 
Since the constraints are the generators of the gauge 
invariances, it is easy to determine their structure in 
GR. The gauge invariances of GR are given by the 
coordinate transformations $x^\mu \to x'^\mu = f^\mu(x)$, where
$x = (\vec{x}, t)$. Accordingly, we have four primary constraints
$\pi^\mu = 0$, analogous to [2], and four secondary
constraints $C_\mu(x) = 0$, analogous to [3]. These are
usually separated into the three “momentum” 
constraints 


$C_a(x) = 0$ [4]


which generate fixed-time spatial coordinate trans- 
formations and the “Hamiltonian” constraint 


C(x) = 0 [5] 


which generates changes in the t coordinate. 

The metric $g_{\mu\nu}(x)$ that represents the gravitational
field in Einstein's original formulation has ten 
independent components per point. Each first-class 
constraint indicates that one Lagrangian variable is 
a gauge degree of freedom. The physical degrees of 


freedom of GR are therefore $(10 - 4 - 4) = 2$ per
point. In the linearized theory, these are the two 
degrees of freedom that describe the two polariza- 
tions of a gravitational wave of given momentum. 
Formulations of GR in which there are additional 
gauge invariances (such as Cartan's tetrad formula- 
tion, see below) have, accordingly, more constraints. 
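
The counting above can be checked with a few lines of code. The following is a minimal sketch, assuming only the rule stated in the text (each first-class constraint removes one Lagrangian variable); the function name `physical_dof` is hypothetical and introduced here for illustration.

```python
def physical_dof(n_lagrangian_variables: int, n_first_class_constraints: int) -> int:
    """Physical degrees of freedom per point.

    Rule taken from the text: each first-class constraint indicates that
    one Lagrangian variable is a gauge degree of freedom.
    """
    return n_lagrangian_variables - n_first_class_constraints


# Maxwell theory: A_mu has 4 components; constraints [2] and [3].
maxwell = physical_dof(4, 1 + 1)

# GR in metric variables: 10 components; 4 primary + 4 secondary constraints.
gr = physical_dof(10, 4 + 4)

print(maxwell, gr)
```

Both cases give two degrees of freedom per point, matching the two polarizations of an electromagnetic or a gravitational wave.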

Since the Hamiltonian generates evolution in the 
Lagrangian evolution parameter $t$, and since such
evolution can be obtained as a gauge transforma- 
tion, it follows that the Hamiltonian is a constraint 
in GR. The vanishing of the Hamiltonian is a 
characteristic feature of general-relativistic systems. 
The Hamiltonian structure of GR is therefore 
determined by its phase space and its constraints. 
The gauge-invariant formulation of the theory is
given just by the set $(\Gamma_{ph}, \omega_{ph}, A_{ph})$ and no Hamiltonian.
The physical interpretation of this structure is
discussed in the last section. 


ADM Formalism 


In Einstein's formulation, the Lagrangian variable of 
GR is the metric field $g_{\mu\nu}(\vec{x}, t)$ (here we use the
signature $[-, +, +, +]$). Arnowitt, Deser, and
Misner introduced the following change of
variables: 


$q_{ab} = g_{ab}, \quad N = 1/\sqrt{-g^{00}}, \quad N^a = q^{ab} g_{b0}$ [6]


where $q^{ab}$ is the inverse of the three-dimensional
metric $q_{ab}$, used henceforth to raise and lower space
indices $a, b = 1, 2, 3$. This is equivalent to writing the
invariant interval in the form 


$ds^2 = -N^2 dt^2 + q_{ab}\,(dx^a + N^a dt)(dx^b + N^b dt)$


These variables have an interesting geometric interpretation.
Consider a family of spacelike (“ADM”)
surfaces $\Sigma_t$ defined by $t$ = constant. $q_{ab}$ is the 3-metric
induced on the surface. $N$ is called the “lapse” function
and $N^a$ is called the “shift” function. Their geometrical
interpretation is illustrated in Figure 2.
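
The change of variables [6] can be illustrated numerically. The sketch below assumes NumPy and a 4-metric stored as a 4x4 array with index 0 as the time index; the function names `adm_split` and `adm_assemble` and the sample numbers are hypothetical, chosen only for illustration.

```python
import numpy as np


def adm_split(g):
    """Split a 4-metric g_{mu nu} (signature -+++) into (q_ab, N, N^a) as in eqn [6]."""
    g = np.asarray(g, dtype=float)
    q = g[1:, 1:]                            # q_ab = g_ab
    g_inv = np.linalg.inv(g)
    lapse = 1.0 / np.sqrt(-g_inv[0, 0])      # N = 1 / sqrt(-g^{00})
    shift = np.linalg.solve(q, g[1:, 0])     # N^a = q^{ab} g_{b0}
    return q, lapse, shift


def adm_assemble(q, lapse, shift):
    """Rebuild g_{mu nu} from the interval ds^2 = -N^2 dt^2 + q_ab (dx^a + N^a dt)(dx^b + N^b dt)."""
    q = np.asarray(q, dtype=float)
    shift = np.asarray(shift, dtype=float)
    shift_low = q @ shift                    # N_a = q_ab N^b
    g = np.empty((4, 4))
    g[0, 0] = -lapse**2 + shift @ shift_low  # g_00 = -N^2 + N_a N^a
    g[0, 1:] = shift_low                     # g_0a = N_a
    g[1:, 0] = shift_low
    g[1:, 1:] = q
    return g


# Illustrative values: a positive-definite 3-metric, a lapse and a shift.
q0 = np.array([[2.0, 0.3, 0.0],
               [0.3, 1.5, 0.1],
               [0.0, 0.1, 1.0]])
g = adm_assemble(q0, 1.7, np.array([0.2, -0.1, 0.4]))
q, lapse, shift = adm_split(g)
print(lapse, shift)
```

Assembling the metric from the interval and then applying [6] returns the lapse and shift that were put in, which is the sense in which the two descriptions are equivalent.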

When written in terms of these variables, the 
action of GR takes the form 


$S[q_{ab}, N, N^a] = \int d^4x\, \sqrt{q}\, N\, [R + k_{ab}k^{ab} - k^2]$


where $q = \det q_{ab}$ and $R$ are the determinant and the
Ricci scalar of the metric $q_{ab}$;


$k_{ab} = \frac{1}{2N}\,(\partial_t q_{ab} - D_a N_b - D_b N_a)$


is the extrinsic curvature of the constant time
surface; and $D_a$ is the covariant derivative of $q_{ab}$.
This action is independent of the time derivatives of




Figure 2 The geometrical interpretation of the lapse $N(x, t)$
and shift $N^a(x, t)$ fields. Two ADM surfaces, defined by the
values $t$ and $t + dt$, are displayed. $N(x, t)\,dt$ is the proper length
of the vector joining the two surfaces, normal to the first surface
at $(x, t)$. This is the proper time lapsed between the two surfaces
for an observer at rest on the first surface at $(x, t)$. The quantity
$dx^a = N^a(x, t)\,dt$ is the shift (the displacement) between the
endpoint of this vector and the point $(x, t + dt)$ having the same
spatial coordinates as $(x, t)$.


$N$ and $N^a$. The conjugate momenta $\pi$ and $\pi_a$ of these
quantities therefore vanish; these are the primary constraints, and
the pairs $(\pi, N)$ and $(\pi_a, N^a)$ can be taken out of the
phase space as for the pair $(E^0, A_0)$ in the Maxwell
example. We can therefore take the 3-metric $q_{ab}(x)$
and its conjugate momentum $p^{ab}(x)$ as the canonical
variables of GR. The momentum is related to the
“velocity” $\partial_t q_{ab}$ by


$p^{ab} = \sqrt{q}\,(k^{ab} - k\,q^{ab})$

where $k = k_{ab}q^{ab}$.


The secondary constraints [4] and [5] turn out to be


$C_a = \sqrt{q}\, D_b\left(\frac{1}{\sqrt{q}}\, p^b{}_a\right) = 0$ [7]


and


$C = \frac{1}{\sqrt{q}}\left(p_{ab}p^{ab} - \frac{1}{2}p^2\right) - \sqrt{q}\,R = 0$ [8]


where $p = p^{ab}q_{ab}$.
If the two fields $q_{ab}(x, t)$ and $p^{ab}(x, t)$ satisfy the
Hamilton equations


$\partial_t q_{ab}(x, t) = \{q_{ab}(x, t), H(t)\}$ [9]
$\partial_t p^{ab}(x, t) = \{p^{ab}(x, t), H(t)\}$ [10]


where


$H(t) = \int d^3x\, \left[N(x, t)\, C(q_{ab}(x, t), p^{ab}(x, t)) + N^a(x, t)\, C_a(q_{ab}(x, t), p^{ab}(x, t))\right]$


with arbitrary functions $N(x, t)$, $N^a(x, t)$, then the
metric $g_{\mu\nu}(x, t)$, defined from $q_{ab}$, $N$, $N^a$ by eqn [6], is
the general solution of the vacuum Einstein equation
Ricci[g] = 0. Therefore, these equations provide a
Hamiltonian form of the Einstein field equation.


Tetrad Formalism


The tetrad formalism, developed by Cartan, Weyl, 
and Schwinger, has definite advantages with respect 
to the metric formalism. It allows the coupling of 
fermion fields to GR and is, therefore, needed to 
couple the standard model to GR. In the tetrad 
formalism, the gravitational field is represented by
four covariant fields $e^I_\mu(x)$, where $I, J, \ldots = 0, 1, 2, 3$
are flat Lorentz indices raised and lowered with the
Minkowski metric $\eta_{IJ} = \mathrm{diag}[-1, +1, +1, +1]$. The
relation with the metric formalism is given by


$g_{\mu\nu} = \eta_{IJ}\, e^I_\mu e^J_\nu$


In this formulation, GR has an additional local
SO(3,1) gauge invariance, given by local Lorentz
transformations on the $I$ indices. The corresponding
canonical formalism is usually defined in a gauge
in which $e^0_a = 0$, where $i, j, \ldots = 1, 2, 3$ are flat
three-dimensional indices raised and lowered with
$\delta_{ij} = \mathrm{diag}[+1, +1, +1]$. In this gauge, the
Lorentz group is reduced to the local SO(3) group
of spatial transformations, and the ADM variables
are defined by


$e^I_\mu = \begin{pmatrix} N & N^i \\ 0 & e^i_a \end{pmatrix}$ [11]


that is, $e^0_0 = N$, $e^i_0 = N^i$, and $e^0_a = 0$,
where $N^i = e^i_a N^a$. This is equivalent to writing the
invariant interval in the form


$ds^2 = -N^2 dt^2 + (e_{ia}\,dx^a + N_i\,dt)(e^i_b\,dx^b + N^i\,dt)$


The reduced canonical variables can be taken to be
the field $e^i_a(x)$ that represents the “triad” of the
ADM surface, and its conjugate momentum $p^a_i(x)$.
Their relation with the three-dimensional metric
variables is given by transforming internal indices
into tangent indices with the triad field $e^i_a$ and its
inverse $e^a_i$. In particular,


$q_{ab} = \delta_{ij}\, e^i_a e^j_b$ [12]
p” = ep; [13] 
Also, for later reference, 
pm h So gi | 
] ab - ; l1. 
a= e" Rab dete (ps jeb) [14] 


where $p = e^i_a p^a_i$.

The momentum and Hamiltonian constraints are
the same as in the ADM formulation, with $q_{ab}$ and
$p^{ab}$ expressed in terms of the triad variables. The
additional constraint that generates the internal
rotations is


$G_i = \epsilon_{ijk}\, e^j_a\, p^{ak} = 0$ [15]


Ashtekar Formalism


The Ashtekar formalism simplifies the form of the 
constraints and casts GR in a form having the same 
kinematics as Yang-Mills theory. With its variants, it 
is widely used in nonperturbative quantum gravity, in 
particular in the loop formulation (see Loop Quan- 
tum Gravity). It can be obtained from the tetrad 
canonical formalism by the canonical transformation 


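$A^i_a = \frac{1}{2}\epsilon^i{}_{jk}\,\omega^{jk}_a + i\,k^i_a$ [16]
$E^a_i = \det e\; e^a_i$ [17]


where $\omega^{ij} = \omega^{ij}_a dx^a$ is the (torsion-free) spin connection
of the triad 1-form field $e^i = e^i_a dx^a$, determined
by the Cartan equation


$de^i + \omega^i{}_j \wedge e^j = 0$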
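
The relation between the triad, the 3-metric [12], and the densitized triad [17] can be made concrete with a small numerical sketch. It assumes NumPy and stores the triad as an array `e[i, a]` holding $e^i_a$; the function names and the sample triad are hypothetical and serve only as illustration.

```python
import numpy as np


def metric_from_triad(e):
    """q_ab = delta_ij e^i_a e^j_b, eqn [12]; e[i, a] holds e^i_a."""
    return e.T @ e


def densitized_triad(e):
    """E^a_i = det(e) e^a_i, eqn [17]; the result is indexed [a, i].

    The inverse triad e^a_i is the matrix inverse of e^i_a.
    """
    return np.linalg.det(e) * np.linalg.inv(e)


# An arbitrary invertible triad at one point (illustrative numbers).
e = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.5, 0.1],
              [0.3, 0.0, 2.0]])
q = metric_from_triad(e)
E = densitized_triad(e)

# A local SO(3) rotation of the internal index leaves the metric unchanged.
theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
q_rotated = metric_from_triad(rot @ e)
print(np.allclose(q, q_rotated))
```

The check at the end illustrates the internal SO(3) gauge invariance: rotating the triad changes $e^i_a$ but not $q_{ab}$. The densitized triad satisfies $E^a_i E^{bi} = q\, q^{ab}$, with $q = \det q_{ab}$.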


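The “electric” field $E^a_i$ is real, while the Sen-Ashtekar
connection $A^i = A^i_a dx^a$ is complex and satisfies the
reality condition


$A^i_a + \bar{A}^i_a = 2\Gamma^i_a[e]$ [18]

where $\Gamma^i_a[e] = \frac{1}{2}\epsilon^i{}_{jk}\,\omega^{jk}_a$ is the real part of the connection [16].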


The connection $A^i$ has a simple geometrical interpretation.
It is the pullback $A^i_a = \omega^{+0i}_a$ on the $t = 0$
ADM surface of the self-dual part


$\omega^{+IJ} = \frac{1}{2}\left(\omega^{IJ} - \frac{i}{2}\,\epsilon^{IJ}{}_{KL}\,\omega^{KL}\right)$


of the four-dimensional torsion-free spin connection
$\omega^{IJ}$ determined by the tetrad field $e$.

In terms of these fields, the constraint equations 
can be written in the form 


$G_i = D_a E^a_i = 0$ [19]
$C_a = F^i_{ab} E^b_i = 0$ [20]
$C = \epsilon_{ijk}\, F^i_{ab}\, E^{aj} E^{bk} = 0$ [21]


where $D_a$ is the covariant derivative and $F_{ab}$ is the
curvature defined by the connection $A$. The first of these
constraints is the nonabelian version of the Gauss law
[3]: it is the gauge constraint of Yang-Mills theory. The
constraints are polynomial in the canonical variables.

These equations are often written using a basis $\tau_i$
in the su(2) Lie algebra, and defining the su(2)
connection $A = A^i\tau_i$ and the su(2)-valued vector
field $E^a = E^{ai}\tau_i$. In terms of these fields the constraints
can be written in the form


$G = D_a E^a = 0$
$C_a = \mathrm{tr}[F_{ab}E^b] = 0$
$C = \mathrm{tr}[F_{ab}E^a E^b] = 0$


where the trace is on su(2).


A variant of this formalism commonly used in 
quantum gravity is obtained by replacing [16] with 
the Barbero connection 

$A^i_a = \frac{1}{2}\epsilon^i{}_{jk}\,\omega^{jk}_a + \gamma\,k^i_a$ [22]

where $\gamma$ is an arbitrary complex number, called the
Immirzi parameter. In terms of this connection, [21]
is replaced by


nM 1 2 
G= ci Fi, EPE 十 —- det e(k i k^" A k^) = ( 


where $e^i_a$ and $k_{ab}$ are given as functions of $E$ and $A$ by
[22] and [17]. The choice $\gamma = 1$, with the constraints
[19]-[21], gives the canonical formulation of Euclidean
GR.

All the formulations described extend readily to 
matter couplings. The structure of the constraints 
remains the same — with additional constraints corre- 
sponding to matter gauge invariances, if any. The GR 
constraints are modified by the addition of matter terms. 
In particular, the Hamiltonian constraint C and the 
momentum constraint $C_a$ are modified by the addition
of terms determined by the energy density and the 
momentum density of the matter, respectively. In the 
Ashtekar formulation, a fermion field modifies the 
Gauss law constraint by the addition of a torsion term. 


Evolution 


In the gauge-invariant canonical structure of GR, there 
is no explicit time flow generated by a Hamiltonian. If 
the formalism is utilized just in order to express the 
Einstein equation in first-order canonical form, this is 
not a difficulty, because evolution in the coordinate 
time is generated by the constraints. On the other 
hand, if we are interested in understanding the 
structure of states, observables, and evolution of GR, 
the situation appears to be puzzling. An additional 
complication arises from the fact that virtually no 
gauge-invariant observable $A_{ph}$ is known explicitly as
a function on the phase space. These issues become 
especially relevant when the canonical formalism is 
taken as a starting point for quantization. How is 
physical evolution represented in canonical GR? 

The first relevant observation is that the gauge- 
invariant phase space $\Gamma_{ph}$ is better understood as a
phase space in the sense of Lagrange: namely as the
space $\Gamma$ of the solutions of the equations of motion
modulo gauges, rather than a space of instantaneous 
states. Recall that in GR the notion of “instanta- 
neous state" is rather unnatural. 

In the ADM formulation, for instance, an orbit on 
the constraint surface of GR can be understood as 
the ensemble of all possible values that the variables 


$(q_{ab}(x), p^{ab}(x))$ can take on arbitrary spacelike ADM
surfaces embedded in a given solution of the
Einstein equation. Motion along the orbit (which
has dimension $4 \times \infty^3$) corresponds to arbitrary
deformations of the surface. 

Physical applications of classical GR deal with 
relations between “partial observables.” A partial 
observable is any variable physical quantity that can 
be measured, even if its value cannot be determined 
from the knowledge of the physical state. An example 
of partial observable in nonrelativistic mechanics is 
given precisely by the nonrelativistic time $t$. Partial
observables are represented in GR as functions on $\Gamma_0$.
A physical state in $\Gamma_{ph}$ determines an orbit in $C$, and
therefore a set of relations between partial observables 
(see Figure 1). That is, it determines the possible values 
that the partial observables can take “when” and 
“where” other partial observables have given values. 
All physical predictions of classical GR can be 
expressed in this form. 

One of the partial observables can be selected to 
play the role of a physical clock time, and evolution 
can be expressed in terms of such clock time. In 
general, it is difficult — if not impossible — to find a 
clock time observable in terms of which evolution is 
a proper conventional Hamiltonian evolution. Mat- 
ter couplings partially simplify the task. For 
instance, if the motion of planet Earth is coupled 
to GR, then proper time along this motion from a 
significant event on Earth, which is a partial
observable, can be a convenient clock time. In pure 
gravity, the “York time” defined as the trace of the 
extrinsic curvature $T_Y = k$, on ADM surfaces where
k is spatially constant, has been extensively and 
effectively used as a clock time in formal analysis of 
the theory. A Hamiltonian that generates evolution 
in a given clock time T can be formally obtained by 
solving the Hamiltonian constraint with respect to a 
momentum $P_T$ conjugate to $T$. Such “reparametrizations”
of the relative evolution of the partial
observables can be useful to analyze equations and 
to help intuition, but they are by no means necessary 
to have a well-defined interpretation of the theory. 

Another possibility to introduce a preferred time 
flow is to consider asymptotically flat solutions of 
the field equations. In this case, one can define a 
nonvanishing Hamiltonian, given by a boundary 
integral at spatial infinity. This Hamiltonian generates
evolution in an asymptotic Minkowski time.
This choice is convenient for describing observations 
performed from a large distance on isolated gravita- 
tional systems. Many general-relativistic physical 
observations do not belong to this category. 

Various other techniques to define a fully generally
covariant canonical formalism have been
explored. Among these: definitions of the physical
symplectic structure directly on the space of the
solutions of the field equations; generalization of the
initial and final surfaces to boundaries of compact
spacetime regions; construction of “evolving constants
of motion,” namely families of gauge-invariant
observables depending on a clock time
parameter; multisymplectic formalisms that treat
space and time derivatives on a more equal footing;
and others. Many of these techniques are attempts
to overcome the unequal way in which time and
space dependence are treated in the conventional
Hamiltonian formalism.

GR has deeply modified our understanding of 
space and time. An extension of the canonical 
formalism of mechanics, compatible with such a 
modification, is needed, but consensus on the way 
(or even the possibility) of formulating a fully 
satisfactory general-relativistic extension of Hamil- 
tonian mechanics is still lacking. 


See also: Asymptotic Structure and Conformal Infinity; 
Constrained Systems; General Relativity: Overview; 
Loop Quantum Gravity; Quantum Cosmology; Quantum 
Geometry and its Applications; Spin Foams; 
Wheeler-De Witt Theory.


Introduction


Shared entanglement between a sender and receiver 
can significantly improve the usefulness of a 
quantum channel for the communication of either 
classical or quantum data. Superdense coding and 
teleportation provide the most well-known examples 
of this improvement; free entanglement doubles the 
classical capacity of a noiseless quantum channel 
and makes it possible for a noiseless classical channel 
to send quantum data. In fact, the entanglement- 
assisted classical and quantum capacities of a 
quantum channel are in many senses simpler and 
better behaved than their unassisted counterparts 
(Holevo 1998, Schumacher and Westmoreland 
1997, Devetak 2005). Most importantly, these 
capacities can be calculated using simple formulas 
and finite optimization procedures (Bennett et al.
1999, 2002). No such finite procedure is known for 
either of the unassisted capacities. Moreover, the 
entanglement-assisted classical and quantum capa- 
cities are related by a simple factor of 2. The 
unassisted capacities, in contrast, have completely 
different formulas. In fact, the simple factor of 2 
generalizes to a statement known as the quantum 
reverse Shannon theorem, which governs the rate at 
which one quantum channel can simulate another 
(Bennett et al. 2005). The answer is given by the 
ratio of the entanglement-assisted capacities. 


Notation 


Quantum systems will be denoted by $A$, $B$, and so
on as well as their variants such as $A'$ and $\tilde{A}$. The
choice of letter will generally indicate which party
holds a given system, with $A$ reserved for the sender,
Alice, and $B$ for the receiver, Bob. Given a quantum
system $C$, $C^{\otimes n}$ will often be written as $C^n$. These
symbols will be used to denote both the Hilbert
space of the quantum system and the set of density
operators on that system. Thus, a quantum channel
$\mathcal{N}: A' \to B$ refers to a trace-preserving, completely
positive (TPCP) map from the operators on the
Hilbert space of $A'$ to those of $B$. $\mathrm{id}^C$ refers to the
identity channel on $C$. The map $\mathcal{N} \otimes \mathrm{id}^C$ will
frequently be abbreviated to $\mathcal{N}$ in order to simplify
long expressions. Likewise, the density operator
$|\varphi\rangle\langle\varphi|$ of a pure quantum state $|\varphi\rangle$ will be
abbreviated to $\varphi$. $\pi^C$ will refer to the maximally
mixed state on $C$ and $\pi_d$ to the maximally mixed
state on a specified $d$-dimensional quantum system.

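For a given quantum state $\varphi^{AB}$ on the composite
system $AB$, $\varphi^A = \mathrm{tr}_B\, \varphi^{AB}$ and


$H(A)_\varphi = H(\varphi^A) = -\mathrm{tr}(\varphi^A \log_2 \varphi^A)$ [1]

is the von Neumann entropy of $\varphi^A$, while

$H(A|B)_\varphi = -I_c(A\rangle B)_\varphi = H(AB)_\varphi - H(B)_\varphi$ [2]

is its conditional entropy and

$I(A; B)_\varphi = H(A)_\varphi + H(B)_\varphi - H(AB)_\varphi$ [3]


its mutual information.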
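
The three quantities [1]-[3] are easy to evaluate numerically for small systems. The following is a minimal sketch, assuming NumPy and a bipartite density matrix stored as a $(d_A d_B) \times (d_A d_B)$ array with the $A$ index varying slowest; the function names are hypothetical.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits, eqn [1]; zero eigenvalues contribute nothing."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))


def reduced_states(rho_ab, d_a, d_b):
    """Return (rho_A, rho_B) by partial trace of a state on A (x) B."""
    r = rho_ab.reshape(d_a, d_b, d_a, d_b)
    rho_a = np.einsum('ibjb->ij', r)   # trace over B
    rho_b = np.einsum('aiaj->ij', r)   # trace over A
    return rho_a, rho_b


def conditional_entropy(rho_ab, d_a, d_b):
    """H(A|B) = H(AB) - H(B), eqn [2]."""
    _, rho_b = reduced_states(rho_ab, d_a, d_b)
    return entropy(rho_ab) - entropy(rho_b)


def mutual_information(rho_ab, d_a, d_b):
    """I(A;B) = H(A) + H(B) - H(AB), eqn [3]."""
    rho_a, rho_b = reduced_states(rho_ab, d_a, d_b)
    return entropy(rho_a) + entropy(rho_b) - entropy(rho_ab)


# Example: one ebit, the state (|00> + |11>)/sqrt(2).
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(bell, bell)
print(conditional_entropy(rho, 2, 2), mutual_information(rho, 2, 2))
```

For a maximally entangled pair of qubits the conditional entropy is negative ($-1$ bit) and the mutual information is 2 bits, twice the largest value possible for classically correlated qubits.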


Entanglement-Assisted Classical 
and Quantum Capacities 


The entanglement-assisted classical capacity of a 
quantum channel $\mathcal{N}: A' \to B$ is the optimal rate at
which classical information can be communicated 
through the channel while in addition making use of 
an unlimited number of maximally entangled states. 

The formal definition proceeds as follows. Alice
and Bob are assumed to share $nS$ ebits in the form of
a maximally entangled state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank
$2^{nS}$. Conditioned on her message $m \in \{1, 2, \ldots, 2^{nR}\}$,
Alice will apply an encoding operation $\mathcal{E}_m: \tilde{A} \to A'^n$.
Bob's decoding is given by a POVM $\{\Lambda_m\}$ on the
composite system $B^n\tilde{B}$. The procedure is said to have
maximum probability of error $\epsilon$ if


$\min_m\, \mathrm{tr}\left[\Lambda_m\, (\mathcal{N}^{\otimes n} \circ \mathcal{E}_m)(\Phi)\right] \ge 1 - \epsilon$ [4]


These elements, illustrated in Figure 1, consisting of
the shared entanglement, as well as the encoding and
decoding operations meeting the criterion of eqn [4],
are called a $(2^{nR}, 2^{nS}, n, \epsilon)$ entanglement-assisted classical
code for the channel $\mathcal{N}$. A rate $R$ is said to be
achievable if there exists a choice of $S \ge 0$ and a
sequence of entanglement-assisted classical codes
$(2^{nR}, 2^{nS}, n, \epsilon_n)$ with $\epsilon_n \to 0$. The entanglement-assisted
classical capacity $C_E(\mathcal{N})$ of $\mathcal{N}$ is defined to be the
supremum over all achievable rates.


Figure 1 Circuit representation of the elements of an
entanglement-assisted classical code for the channel $\mathcal{N}$. Alice
encodes message $m$ by applying the operation $\mathcal{E}_m$ to her half
of the shared entanglement. Bob decodes by applying the
POVM $\{\Lambda_m\}$ on the output of the channel and his half of the
shared entanglement.


Theorem 1 (Bennett et al. 1999, 2002). The
entanglement-assisted classical capacity $C_E$ of a
quantum channel $\mathcal{N}: A' \to B$ is given by


$C_E(\mathcal{N}) = \max I(A; B)_\sigma$ [5]

where the maximization is over states $\sigma^{AB} = \mathcal{N}(\varphi^{AA'})$
arising from the channel by acting on the $A'$ half of
any pure state $|\varphi\rangle^{AA'}$.


The theorem bears a strong formal resemblance to 
Shannon's noisy coding theorem for the classical 
capacity of a classical noisy channel. There the 
capacity formula is also given by an optimization of 
the mutual information, but over joint distributions 
between the input and output alphabets arising from 
the action of the channel. Such a joint distribution 
cannot exist in general for a quantum channel 
because the no-cloning theorem excludes the possi- 
bility of the input and output existing simulta- 
neously. Equation [5] instead refers to a natural 
substitute for the joint input-output distribution: a 
quantum state arising from the quantum channel 
acting on half of an entangled pure state. 

Another point worth stressing is that, unlike the 
known formulas for the unassisted classical and 
quantum capacities of a quantum channel, eqn [5] 
refers to only a single use of $\mathcal{N}$ instead of the limit
of many uses, $\mathcal{N}^{\otimes n}$. The formula can therefore
readily be used to evaluate $C_E$ for any channel of
interest. 

Consider, for example, the $d$-dimensional depolarizing
channel


$\mathcal{D}_p(\rho) = (1 - p)\rho + p\,\pi_d$ [6]


that with probability $p$ completely randomizes the
input but otherwise leaves the input invariant. For
such channels, the maximum is achieved by choosing
a maximally entangled state for $|\varphi\rangle^{AA'}$, yielding

$C_E(\mathcal{D}_p) = 2\log_2 d - h_{d^2}\left(1 - p\,\frac{d^2 - 1}{d^2}\right)$ [7]

where for any $0 \le q \le 1$ and integer $r > 1$,


$h_r(q) = -q\log_2 q - (1 - q)\log_2\left(\frac{1 - q}{r - 1}\right)$ [8]


is the Shannon entropy of the distribution
$(q, (1 - q)/(r - 1), \ldots, (1 - q)/(r - 1))$.
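
As a sanity check, formula [7] can be compared with a direct evaluation of the mutual information of the state obtained by sending half of a maximally entangled state through the depolarizing channel. The sketch below assumes NumPy; the function names are hypothetical, and the numerical route uses the fact that both marginals of the output state are maximally mixed.

```python
import numpy as np


def h(r, q):
    """Shannon entropy (bits) of (q, (1-q)/(r-1), ..., (1-q)/(r-1)), eqn [8]."""
    probs = np.array([q] + [(1.0 - q) / (r - 1)] * (r - 1))
    probs = probs[probs > 0]           # convention 0 log 0 = 0
    return float(-np.sum(probs * np.log2(probs)))


def ce_depolarizing(d, p):
    """Closed form [7] for the entanglement-assisted capacity of D_p."""
    return 2 * np.log2(d) - h(d * d, 1.0 - p * (d * d - 1) / (d * d))


def ce_numeric(d, p):
    """I(A;B) of the state (1-p) Phi + p pi_d (x) pi_d, from its spectrum."""
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)   # |Phi> = sum_k |kk> / sqrt(d)
    rho = (1 - p) * np.outer(phi, phi) + p * np.eye(d * d) / d**2
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    h_ab = float(-np.sum(ev * np.log2(ev)))
    # Both marginals are maximally mixed, so H(A) = H(B) = log2 d.
    return 2 * np.log2(d) - h_ab


for p in (0.0, 0.3, 1.0):
    print(p, ce_depolarizing(2, p), ce_numeric(2, p))
```

At $p = 0$ the channel is noiseless and the capacity is $2\log_2 d$ (superdense coding); at $p = 1$ the output is independent of the input and the capacity vanishes.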

Entanglement assistance also simplifies the relationship
between the classical and quantum
capacities of a channel. Proceeding as before to
formally define the quantum capacity, Alice and Bob
are again assumed to share a maximally entangled
state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank $2^{nS}$. Alice's encoding
operation will be a TPCP map $\mathcal{E}: \hat{A}\tilde{A} \to A'^n$ acting
on an input system $\hat{A}$ and her half of the shared
entanglement, $\tilde{A}$. Bob's decoding will likewise be a
TPCP map $\mathcal{D}: B^n\tilde{B} \to \hat{B}$ acting on the output of the
channel, $B^n$, and his half of the shared entanglement,
$\tilde{B}$. $\hat{A}$ and $\hat{B}$ are assumed to be isomorphic
quantum systems of some fixed dimension $2^{nQ}$. The
procedure is said to have subspace fidelity $1 - \epsilon$ if


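$\langle\varphi|\, (\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E})(\varphi^{\hat{A}} \otimes \Phi^{\tilde{A}\tilde{B}})\, |\varphi\rangle \ge 1 - \epsilon$ [9]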


for all $|\varphi\rangle \in \hat{A}$. These elements, illustrated in
Figure 2, are together called a $(2^{nQ}, 2^{nS}, n, \epsilon)$
entanglement-assisted quantum code for the channel
$\mathcal{N}$. A rate $Q$ is said to be achievable if there exists a
choice of $S \ge 0$ and a sequence of entanglement-assisted
quantum codes $(2^{nQ}, 2^{nS}, n, \epsilon_n)$ with $\epsilon_n \to 0$.
The entanglement-assisted quantum capacity $Q_E(\mathcal{N})$
of $\mathcal{N}$ is defined to be the supremum over all
achievable rates.

There is considerable freedom in the definition of 
the entanglement-assisted quantum capacity. It 
could, for example, be defined as the largest amount 
of maximal entanglement that can be generated 
using the channel, minus the entanglement con- 
sumed during the protocol itself. Alternatively, the 
fidelity criterion eqn [9] could be strengthened to
require that $\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E}$ preserve not only pure
states on $\hat{A}$ but any entanglement between $\hat{A}$ and a
reference system. All of these variants yield the same
capacity formula:


$Q_E(\mathcal{N}) = \frac{1}{2}\, C_E(\mathcal{N})$ [10]


This equivalence is a direct consequence of the
existence of the teleportation and superdense coding
protocols. When maximal entanglement is available,
teleportation converts the ability to send classical
data into the ability to send quantum data at half
the classical rate. Conversely, by consuming
maximal entanglement, superdense coding converts
the ability to send quantum data into the ability to
send classical data at double the quantum rate.


Figure 2 Circuit representation of the elements of an
entanglement-assisted quantum code for the channel $\mathcal{N}$. $\mathcal{E}$ is
Alice's encoding operation, which acts on both her input state
and her half of the shared entanglement. Bob decodes using a
quantum operation $\mathcal{D}$ acting on the output of the channel and his
half of the shared entanglement.


Sketch of Proof 


The proof of a capacity theorem can usually be 
broken into two parts, achievability and optimality. 
The achievability part demonstrates the existence of 
a sequence of codes reaching the prescribed rate 
while the optimality part shows that it is impossible 
to do better. 
The main idea in the achievability proof can be
illustrated by studying the special case where
$\varphi^{A'} = \pi^{A'}$. Let $d = \dim A'$ and $\{U_j\}_{j=1}^{d^2}$ be a set of
unitary operators for $A'$. The key property of
these operators is that averaging over them implements
the constant map: for all density operators $\rho$,


$\frac{1}{d^2} \sum_{j=1}^{d^2} U_j\, \rho\, U_j^\dagger = \pi^{A'}$ [11]
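
Equation [11] can be verified numerically for one concrete choice of the unitaries. The text does not specify the set $\{U_j\}$; the sketch below assumes the $d^2$ generalized Pauli (Weyl) operators $X^a Z^b$, which is one standard choice with this property. NumPy is assumed and the function names are hypothetical.

```python
import numpy as np


def weyl_operators(d):
    """The d^2 operators X^a Z^b (an assumed choice of the set {U_j})."""
    omega = np.exp(2j * np.pi / d)
    shift = np.roll(np.eye(d), 1, axis=0)      # X|k> = |k+1 mod d>
    clock = np.diag(omega ** np.arange(d))     # Z|k> = omega^k |k>
    return [np.linalg.matrix_power(shift, a) @ np.linalg.matrix_power(clock, b)
            for a in range(d) for b in range(d)]


def twirl(rho):
    """Left-hand side of eqn [11]: the uniform average of U_j rho U_j^dagger."""
    d = rho.shape[0]
    return sum(u @ rho @ u.conj().T for u in weyl_operators(d)) / d**2


# A random density operator on a 3-dimensional system.
rng = np.random.default_rng(0)
m = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = m @ m.conj().T
rho /= np.trace(rho)

print(np.allclose(twirl(rho), np.eye(3) / 3))
```

Whatever density operator is supplied, the average is the maximally mixed state, which is the "constant map" referred to above.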


Consider the state $\sigma_j$ that arises if Alice acts with $U_j$
on the $A'$ half of a rank-$d$, maximally entangled
state $|\varphi\rangle^{AA'}$ and then sends the $A'$ half of the
resulting state through $\mathcal{N}$. (Note that here $A'$ also
plays the role of $\tilde{A}$.) The entropy of the resulting
state is


$H(\sigma_j) = H\left(\mathcal{N}\left((U_j \otimes 1^A)\, \varphi\, (U_j^\dagger \otimes 1^A)\right)\right)$ [12]


$= H(\mathcal{N}(\varphi))$ [13]


since $U_j$ does not change the local density operator
on $A'$.

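On the other hand, if Alice selects a value of $j$
from the uniform distribution, then the resulting
average input state to the channel will be


$\frac{1}{d^2} \sum_{j=1}^{d^2} (U_j \otimes 1^A)\, \varphi\, (U_j^\dagger \otimes 1^A) = \pi^{A'} \otimes \varphi^A$ [14]


and the corresponding average output state will be
$\mathcal{N}(\pi^{A'}) \otimes \varphi^A$, which has entropy


$H(\mathcal{N}(\pi^{A'})) + H(\varphi^A)$ [15]


Therefore, the Holevo quantity of the ensemble of
output states, defined as the entropy of the average
state minus the average of the entropies of the
individual output states, will be equal to (recalling that $\varphi^{A'} = \pi^{A'}$ in this special case)


$H(\varphi^A) + H(\mathcal{N}(\varphi^{A'})) - H(\mathcal{N}(\varphi^{AA'}))$ [16]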


This is precisely the quantity $I(A; B)_\sigma$ for the state
$\sigma = \mathcal{N}(\varphi^{AA'})$ since the channel $\mathcal{N}$ transforms the $A'$
system into $B$. Moreover, if Bob is given the $A$ part of
the maximally entangled state, then this is the Holevo


quantity of an ensemble of states that can be produced 
by Alice acting on half of a shared entangled state and 
then sending her half through the channel. Invok- 
ing the Holevo-Schumacher-Westmoreland (HSW) 
theorem for the classical capacity (Holevo 1998, 
Schumacher and Westmoreland 1997) therefore com- 
pletes the proof; using coding, the Holevo quantity is 
an achievable communication rate. 

The proof that eqn [5] is optimal involves a series
of entropy manipulations similar to the optimality
proofs for the unassisted classical and quantum
capacities. From the point of view of quantum
information, the truly unusual part of the proof is
the demonstration that it is unnecessary to consider
multiple copies of $\mathcal{N}$ (Cerf and Adami 1997).
Specifically, let


$f(\mathcal{N}) = \max I(A; B)_\sigma$ [17]

where the maximization is defined as in Theorem 1.
Techniques analogous to those used for the unassisted
capacities yield the upper bound


$C_E(\mathcal{N}) \le \lim_{n \to \infty} \frac{1}{n}\, f(\mathcal{N}^{\otimes n})$ [18]

Unlike the unassisted case, however, a relatively easy
argument shows that


$f(\mathcal{N}_1 \otimes \mathcal{N}_2) = f(\mathcal{N}_1) + f(\mathcal{N}_2)$ [19]


(The analogous statement is an important conjecture
for the classical capacity and is known to be false for
the quantum capacity (DiVincenzo et al. 1998).) As
a result, $C_E(\mathcal{N}) \le f(\mathcal{N})$, which is the optimality part
of Theorem 1.

To see the origin of eqn [19], it will be helpful to
invoke Stinespring's theorem to write $\mathcal{N}_j = \mathrm{tr}_{E_j} U_j$,
where $U_j: A'_j \to B_j E_j$ is an isometry. Fix a state
$|\varphi\rangle^{AA'_1A'_2}$ and let $\omega = (U_1 \otimes U_2)(\varphi)$. Equation [19]
follows from the fact that


$I(A; B_1B_2)_\omega \le I(AB_2E_2; B_1)_\omega + I(AB_1E_1; B_2)_\omega$ [20]

Simply redefining $A$ to be $AB_2E_2$ shows that the first
term of the right-hand side is upper bounded by
$f(\mathcal{N}_1)$. The second term, likewise, is upper bounded
by $f(\mathcal{N}_2)$. Equation [20] is itself equivalent to the
inequality

$H(B_1B_2|E_1E_2)_\omega + H(B_1B_2)_\omega \le H(B_1|E_1)_\omega + H(B_2|E_2)_\omega + H(B_1)_\omega + H(B_2)_\omega$ [21]


The inequality $H(B_1B_2)_\omega \le H(B_1)_\omega + H(B_2)_\omega$ holds
by the subadditivity of the von Neumann entropy.
Repeated applications of the strong subadditivity
inequality, moreover, lead to the inequality


$H(B_1B_2|E_1E_2)_\omega \le H(B_1|E_1)_\omega + H(B_2|E_2)_\omega$ [22]


Together, they prove eqn [20] and, thence, eqn [19].
The intuitive meaning of this “single-letterization” is
unclear, but regardless, it is interesting to note that
the proof involved invoking a pair of purifying
environment systems, $E_1$ and $E_2$, and studying the
entropy relationships between the true outputs of
the channel and the environment's share.


The Quantum Reverse Shannon Theorem 


A strong argument can be made that the entanglement- 
assisted capacity of a quantum channel is the most 
important capacity of that channel and that all the 
other capacities are, in some sense, of less significance. 
The fact that it is unnecessary to distinguish between 
the classical and quantum entanglement-assisted capa- 
cities because they are related by a factor of 2 is a hint 
in that direction, as is the simple, single-letter formula 
for $C_E(\mathcal{N})$.

A more general argument can be made by 
considering the problem of having one channel 
simulate another. Indeed, the quantum capacity of 
a quantum channel is simply the optimal rate at 
which that channel can simulate the noiseless
channel $\mathrm{id}_2$ on a single qubit. Likewise, the classical
capacity of a quantum channel is its optimal rate for
simulation of a qubit dephasing channel


$\rho \mapsto |0\rangle\langle 0|\rho|0\rangle\langle 0| + |1\rangle\langle 1|\rho|1\rangle\langle 1|$ [23]


In this spirit, the fact that $C_E(\mathcal{N}) = 2Q_E(\mathcal{N})$ can be
re-expressed in the form

$Q_E(\mathcal{N}) = \frac{C_E(\mathcal{N})}{C_E(\mathrm{id}_2)}$ [24]

Equivalently, when entanglement is free, the optimal
rate at which $\mathcal{N}$ can simulate a noiseless qubit channel
is given by the ratio between the entanglement-assisted
classical capacities of $\mathcal{N}$ and $\mathrm{id}_2$. The
quantum reverse Shannon theorem generalizes this
statement to the simulation of arbitrary channels in
the presence of free entanglement.

Suppose that Alice and Bob would like to use
$\mathcal{N}_1: A' \to B$ to simulate another channel $\mathcal{N}_2: A' \to B$.
Fix an input state $\varphi^{A'}$ and let $|\psi\rangle^{A^nA'^n}$ be a purification
of $(\varphi^{A'})^{\otimes n}$. As always, assume that Alice and Bob share
a maximally entangled state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank
$2^{nS}$. Alice's encoding operation will be a TPCP map
$\mathcal{E}: A'^n\tilde{A} \to A'^m$ acting on $n$ copies of the input system
$A'$ and her half of the shared entanglement, $\tilde{A}$. Bob's
decoding will likewise be a TPCP map $\mathcal{D}: B^m\tilde{B} \to B^n$
acting on $m$ copies of the output of the channel, and his
half of the shared entanglement, $\tilde{B}$. This procedure is
said to $\epsilon$-simulate $\mathcal{N}_2^{\otimes n}$ on $(\varphi^{A'})^{\otimes n}$ if


$F\left(\mathcal{N}_2^{\otimes n}(\psi),\ (\mathcal{D} \circ \mathcal{N}_1^{\otimes m} \circ \mathcal{E})(\Phi^{\tilde{A}\tilde{B}} \otimes \psi)\right) \ge 1 - \epsilon$ [25]


where $F$ is the mixed state fidelity $F(\rho, \sigma) =
(\mathrm{tr}\sqrt{\rho^{1/2}\sigma\rho^{1/2}})^2$. The entire procedure, illustrated in
Figure 3, is said to be a $(2^{nS}, m, n, \epsilon)$ entanglement-assisted
simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$. A rate $R$, measured
in copies of $\mathcal{N}_2$ per copy of $\mathcal{N}_1$, is said to be
achievable for $\varphi^{A'}$ if there exists a choice of $S \ge 0$ and
a sequence of $(2^{nS}, m_n, n, \epsilon_n)$ entanglement-assisted
simulations with $n/m_n \to R$ while $\epsilon_n \to 0$.
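
The mixed state fidelity used in criterion [25] can be computed directly from its definition. The following is a minimal sketch assuming NumPy; the helper `psd_sqrt` (a square root of a positive semidefinite matrix via its eigendecomposition) and the function name `fidelity` are hypothetical.

```python
import numpy as np


def psd_sqrt(m):
    """Square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)          # remove tiny negative rounding errors
    return (v * np.sqrt(w)) @ v.conj().T


def fidelity(rho, sigma):
    """F(rho, sigma) = (tr sqrt(rho^{1/2} sigma rho^{1/2}))^2."""
    s = psd_sqrt(rho)
    inner = s @ sigma @ s
    return float(np.real(np.trace(psd_sqrt(inner))) ** 2)


# Two pure qubit states: |0> and |+> = (|0> + |1>)/sqrt(2).
zero = np.array([[1.0, 0.0], [0.0, 0.0]])
plus = np.array([[0.5, 0.5], [0.5, 0.5]])
print(fidelity(zero, zero), fidelity(zero, plus))
```

For pure states the fidelity reduces to the squared overlap, so the second value is 1/2, while identical states have fidelity 1.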

The quantum reverse Shannon theorem states 
that the entanglement-assisted capacity completely 
governs the achievable simulation rates. 


Theorem 2 (Winter 2004, Bennett et al.). Given
two channels $\mathcal{N}_1: A' \to B$ and $\mathcal{N}_2: A' \to B$, $R$ is an
achievable simulation rate for $\mathcal{N}_2$ by $\mathcal{N}_1$ and all
input states $\varphi^{A'}$ if and only if

$R \le \frac{C_E(\mathcal{N}_1)}{C_E(\mathcal{N}_2)}$ [26]

Note that the form of eqn [26] ensures that the
simulation is asymptotically reversible: if a channel
$\mathcal{N}_1$ is used to simulate $\mathcal{N}_2$ and the simulation is then
used to simulate $\mathcal{N}_1$ again, then the overall rate
becomes

$\frac{C_E(\mathcal{N}_1)}{C_E(\mathcal{N}_2)} \cdot \frac{C_E(\mathcal{N}_2)}{C_E(\mathcal{N}_1)} = 1$ [27]

Thus, in the presence of free entanglement and for a
known input density operator of the form $(\varphi^{A'})^{\otimes n}$, a
single parameter, the entanglement-assisted classical
capacity, suffices to completely characterize the
asymptotic properties of a quantum channel.


Figure 3 Circuit representation of an entanglement-assisted
simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$. (a) The simulation circuit, with Alice's
encoding operation $\mathcal{E}$ acting on $n$ copies of $A'$ and Bob's
decoding operation producing $n$ copies of $B$. (b) The circuit that
the protocol is intended to simulate. As stated, the quantum
reverse Shannon theorem allows the simulation circuit to depend
on the density operator of the input state restricted to $A'^n$.

Moreover, since two channels that are asymptotically
equivalent without free entanglement will
surely remain equivalent if free entanglement is 
permitted, eqn [26] gives essentially the only 
possible nontrivial, single-parameter asymptotic 
characterization of quantum channels. This is the 
sense in which the entanglement-assisted capacity 
should be regarded as the most important capacity 
of a quantum channel. 
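
The arithmetic behind eqns [26] and [27] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulation_rate` is hypothetical, and the two capacities fed into it (2 bits per use for a noiseless qubit channel, via dense coding, and 1 bit per use for a noiseless classical bit channel) are supplied by hand rather than computed.

```python
from fractions import Fraction


def simulation_rate(ce_source, ce_target):
    """Optimal rate of eqn [26]: copies of the target channel simulated
    per use of the source channel, given both entanglement-assisted
    classical capacities (hypothetical helper, not from the text)."""
    if ce_target <= 0:
        raise ValueError("the target channel must have positive capacity")
    return Fraction(ce_source) / Fraction(ce_target)


# Illustrative inputs in bits per channel use (assumed, supplied by hand).
CE_NOISELESS_QUBIT = Fraction(2)  # dense coding doubles the one-bit rate
CE_NOISELESS_BIT = Fraction(1)    # noiseless classical bit channel

forward = simulation_rate(CE_NOISELESS_QUBIT, CE_NOISELESS_BIT)
backward = simulation_rate(CE_NOISELESS_BIT, CE_NOISELESS_QUBIT)

# Simulating there and back again gives overall rate 1, as in eqn [27].
print(forward, backward, forward * backward)
```

With free entanglement, one noiseless qubit channel is thus worth two noiseless bit channels, and the round trip loses nothing asymptotically.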

The proof of the quantum reverse Shannon
theorem is quite involved, but some of its features
can be understood without much work. First, note
that by the optimality statement of the entanglement-assisted
classical capacity, the desired simulation can
exist only if eqn [26] holds. Otherwise, composing
the simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$ with a sequence of codes
achieving $C_E(\mathcal{N}_2)$ would result in a sequence of codes
beating the capacity formula for $\mathcal{N}_1$.

Similarly, note that one method to simulate a
channel $\mathcal{N}_1$ using $\mathcal{N}_2$ is to first use $\mathcal{N}_2$ to simulate
the noiseless channel $\mathrm{id}_2$ and then use the simulated
noiseless channel to simulate $\mathcal{N}_1$. Since the achievable
rates for the first step are characterized by the
entanglement-assisted capacity theorem, proving the
achievability part of Theorem 2 reduces to finding
protocols for simulating a general noisy quantum
channel $\mathcal{N}_2$ by a noiseless one. That perhaps sounds
like a strange goal, but nonetheless is the difficult
part of the quantum reverse Shannon theorem.

It is likely that the quantum reverse Shannon 
theorem can be extended to cover other types of 
inputs than the known tensor power states $(\rho^{A'})^{\otimes n}$.
The most desirable form of the theorem would be
one valid for all possible input density operators on
$A'^n$, providing a single simulation procedure
dependent only on the channels and not the input 
state. It is known that without modifying the form 
of the free entanglement, this most ambitious form 
of the theorem fails, but it is conjectured that the 
full-strength theorem does hold provided very large 
amounts of entanglement are supplied in the form of 
the so-called embezzling states (van Dam and 
Hayden 2003). 


Relationships between Protocols 


There is another sense in which the entanglement- 
assisted capacity can be viewed as the fundamental 
capacity of a quantum channel: an efficient protocol 
for achieving the entanglement-assisted capacity can 
be converted into protocols achieving the unassisted 
quantum and classical capacities, or at least very 
close variants thereof. 

An efficient protocol in this case refers to one that
does not waste entanglement. Suppose that $\mathcal{N} : A' \to B$
can be written $\mathcal{N}(\rho) = \mathrm{tr}_E\, U \rho\, U^\dagger$ for some isometry $U^{A' \to BE}$. Let
$|\varphi\rangle^{AA'}$ be a pure state and $|\sigma\rangle^{ABE} = U^{A' \to BE} |\varphi\rangle^{AA'}$ the
corresponding purified channel output state. Careful
analysis of the entanglement-assisted classical communication
protocol achieving the rate $I(A;B)_\sigma$ leads to
an entanglement-assisted quantum communication
protocol consuming entanglement at the rate
$\tfrac{1}{2} I(A;E)_\sigma$ ebits per use of $\mathcal{N}$ and yielding communication
at the rate of $\tfrac{1}{2} I(A;B)_\sigma$ qubits per use of $\mathcal{N}$.
The protocol achieving this goal is known as the
"father" (Devetak et al. 2004).

If the entanglement consumed in the father were 
actually supplied by quantum communication from 
Alice to Bob, then the net rate of quantum 
communication produced by the resulting protocol 
would be $\tfrac{1}{2} I(A;B)_\sigma - \tfrac{1}{2} I(A;E)_\sigma$ qubits from
Alice to Bob, that is, the total produced minus the 
total consumed. 

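This quantity, how much more information $B$ has
about $A$ than $E$ does, can be simplified using an
interesting identity. Since $|\sigma\rangle^{ABE}$ is pure,

$$ I(A;E)_\sigma = H(A)_\sigma + H(E)_\sigma - H(AE)_\sigma \qquad [28] $$

$$ \phantom{I(A;E)_\sigma} = H(A)_\sigma + H(AB)_\sigma - H(B)_\sigma \qquad [29] $$

Expanding $I(A;B)_\sigma$ and canceling terms then reveals
that

$$ \tfrac{1}{2} I(A;B)_\sigma - \tfrac{1}{2} I(A;E)_\sigma = -H(A|B)_\sigma =: I_c(A\rangle B)_\sigma \qquad [30] $$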


where the function $I_c$ is known as the coherent
information. After optimizing over input states and 
multiple channel uses, this is precisely the formula for 
the unassisted quantum capacity of a quantum channel 
(Devetak 2005). Thus, the net rate of qubit commu- 
nication for the protocol derived from the father 
exactly matches the rates necessary to achieve the 
unassisted quantum capacity. The only caveat is that 
the protocol derived from the father uses quantum 
communication catalytically, meaning that some com- 
munication needs to be invested in order to get a gain 
of $I_c(A\rangle B)_\sigma$. For the unassisted quantum capacity, no
investment is necessary. Nonetheless, detailed analysis 
of the situation reveals that the amount of catalytic 
communication required can be reduced to an amount 
sublinear in the number of channel uses, meaning the 
rate of required investment can be made arbitrarily 
small. In this sense, the father protocol essentially 
generates the optimal protocols for the unassisted 
quantum capacity. 
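
As a quick sanity check of eqns [28]-[30], the following worked example evaluates every term for one concrete isometry. The example is an illustrative assumption and does not come from the text: a qubit dephasing interaction $U|x\rangle^{A'} = |x\rangle^{B}|e_x\rangle^{E}$ with real overlap $c = \langle e_0|e_1\rangle \in [0,1]$, applied to half of a maximally entangled input, with $h_2$ denoting the binary entropy.

```latex
\begin{align*}
|\sigma\rangle^{ABE} &= \tfrac{1}{\sqrt{2}} \sum_{x=0}^{1} |x\rangle^{A} |x\rangle^{B} |e_x\rangle^{E}, \\
H(A)_\sigma &= H(B)_\sigma = 1, \qquad
H(AE)_\sigma = 1, \\
H(E)_\sigma &= H(AB)_\sigma = h_2\!\left(\tfrac{1+c}{2}\right) =: h, \\
I(A;B)_\sigma &= H(A)_\sigma + H(B)_\sigma - H(AB)_\sigma = 2 - h, \\
I(A;E)_\sigma &= H(A)_\sigma + H(E)_\sigma - H(AE)_\sigma = h, \\
\tfrac{1}{2} I(A;B)_\sigma - \tfrac{1}{2} I(A;E)_\sigma &= 1 - h
  = H(B)_\sigma - H(AB)_\sigma = -H(A|B)_\sigma .
\end{align*}
```

For $c = 1$ the environment decouples ($h = 0$) and the net rate is one qubit per channel use; for $c = 0$ the channel is completely dephasing ($h = 1$) and the net rate vanishes.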

Protocols achieving the unassisted classical capacity
can be constructed in a similar way. In this case,
one starts from an ensemble $\mathcal{E} = \{p_j, \mathcal{N}(\psi_j)\}$ of
states generated by the channel. Achievability of
the unassisted classical capacity formula follows
from achievability of rates of the form

$$ \chi(\mathcal{E}) = H\Big(\sum_j p_j\, \mathcal{N}(\psi_j)\Big) - \sum_j p_j\, H\big(\mathcal{N}(\psi_j)\big) \qquad [31] $$

for arbitrary ensembles of output states. Consider
the channel

$$ \tilde{\mathcal{N}}(\rho) = \sum_j \langle j|\rho|j\rangle\, \mathcal{N}(\psi_j) \qquad [32] $$

and input state $|\varphi\rangle^{AA'} = \sum_j \sqrt{p_j}\, |j\rangle^{A} |j\rangle^{A'}$. If $\sigma = \tilde{\mathcal{N}}(\varphi)$,
then $I(A;B)_\sigma$ is equal to $\chi(\mathcal{E})$. Thus, there are protocols
consuming entanglement that achieve the classical
communications rate $\chi(\mathcal{E})$ for the modified channel
$\tilde{\mathcal{N}}$. Because the channel $\tilde{\mathcal{N}}$ includes an orthonormal
measurement which destroys all entanglement between
$A$ and $B$, however, it can be argued that any
entanglement used in such a protocol could be replaced
by shared randomness, which could then in turn be
eliminated by a standard derandomization argument.
The net result is a procedure for choosing rate $\chi(\mathcal{E})$
codes for the channel $\mathcal{N}$ consisting of states of the form
$\psi_{j_1} \otimes \cdots \otimes \psi_{j_n}$, which is the essence of the achievability
proof for the unassisted classical capacity.
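
The claim that the mutual information of the modified channel reproduces the Holevo quantity can be checked in a few lines. The derivation below is a sketch that reads the modified channel as $\tilde{\mathcal{N}}(\rho) = \sum_j \langle j|\rho|j\rangle\, \mathcal{N}(\psi_j)$ and writes $H(p)$ for the Shannon entropy of the distribution $\{p_j\}$; the latter symbol is introduced here for convenience.

```latex
\begin{align*}
\sigma^{AB} &= (\mathrm{id} \otimes \tilde{\mathcal{N}})(\varphi^{AA'})
  = \sum_j p_j\, |j\rangle\langle j|^{A} \otimes \mathcal{N}(\psi_j), \\
H(A)_\sigma &= H(p), \qquad
H(B)_\sigma = H\Big(\sum_j p_j\, \mathcal{N}(\psi_j)\Big), \\
H(AB)_\sigma &= H(p) + \sum_j p_j\, H\big(\mathcal{N}(\psi_j)\big), \\
I(A;B)_\sigma &= H(A)_\sigma + H(B)_\sigma - H(AB)_\sigma
  = H\Big(\sum_j p_j\, \mathcal{N}(\psi_j)\Big) - \sum_j p_j\, H\big(\mathcal{N}(\psi_j)\big)
  = \chi(\mathcal{E}) .
\end{align*}
```

The first line uses the fact that the measurement inside $\tilde{\mathcal{N}}$ removes all terms $|j\rangle\langle k|$ with $j \neq k$, and the third line is the entropy of a block-diagonal state.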

This may seem like an unnecessarily cumbersome 
and even circular approach to the unassisted 
classical capacity given that the proof sketched 
above for the entanglement-assisted classical capa- 
city itself invokes the unassisted result in the form of 
the HSW theorem. The approach becomes more 
satisfying when one learns that simple and direct 
proofs of the father protocol exist that completely 
bypass the HSW theorem (Abeyesinghe et al. 2005). 

Thus, the entanglement-assisted communication 
protocols can be easily transformed into their 
unassisted analogs, confirming the central place of 
entanglement-assisted communication in quantum 
information theory. 
Introduction


Any processing of quantum information, be it 
storage or transfer, can be represented as a quantum 
channel: a completely positive and trace-preserving 
map that transforms states (density matrices) on the 
sender's end of the channel into states on the 
receiver's end. Very often, the channel S that sender 
and receiver (conventionally called Alice and Bob, 
respectively) would like to implement is not readily 
available, typically due to detrimental noise effects, 
limited technology, or insufficient funding. They 
may then try to simulate S with some other channel 
T, which they happen to have at their disposal. The 
quantum channel capacity $Q(T,S)$ of $T$ with respect
to $S$ quantifies how well this simulation can be
performed, in the limit of long input strings, so that 
Alice and Bob can take advantage of collective pre- 
and post-processing (cf. Figure 1). Higher capacities 
may result if Alice and Bob are allowed to use 
additional resources in the process, such as classical 
side channels or a bunch of maximally entangled 
pairs shared between them. 

Quantum capacity thus gives the ultimate bench- 
marks for the simulation of one quantum channel by 
another and for the optimal use of auxiliary 
resources. Together with the compression rate of a
quantum source (see Source Coding in Quantum
Information Theory), it lies at the heart of quantum
information theory.

Figure 1 Equipped with collective encoding and decoding
operations (and perhaps some auxiliary resources), $n = 3$
instances of the channel $T$ simulate $m = 2$ instances of the
channel $S$. The transmission rate of the above scheme is 2/3.
Capacity is the largest such rate, in the limit of long messages
and optimal encoding and decoding.

In a very typical scenario, Alice and Bob would 
like to implement the ideal (noiseless) quantum 
channel S=id: they are interested in sending 
quantum states undistorted over some distance, or 
want to store them safely for some period of time, so 
that all the precious quantum correlations are 
preserved. The capacity $Q(T) \equiv Q(T,\mathrm{id})$ is then the
maximal number of qubit transmissions per use of 
the channel, taken in the limit of long messages and 
using collective encoding and decoding schemes 
asymptotically eliminating all transmission errors. 
This is what is generally called the quantum capacity 
of the channel T, and it is our main focus in this 
article. Little is known so far about the quantum 
capacity for the simulation of other (nonideal) 
channels (cf. the section "Related capacities").

In remarkable contrast to the classical setting, 
quantum channel capacities are very much affected 
by additional resources. This leads to unexpected 
and fascinating applications such as teleportation 
and dense coding. But it also results in a bewildering 
variety of inequivalent channel capacities, which still 
hold many challenges for future research. 


Notation 


A quantum channel which transforms input systems
on a Hilbert space $\mathcal{H}_A$ into output systems on a
(possibly different) Hilbert space $\mathcal{H}_B$ is represented
(in Schrödinger picture) by a completely positive and
trace-preserving linear map $T : \mathcal{B}_*(\mathcal{H}_A) \to \mathcal{B}_*(\mathcal{H}_B)$,
where $\mathcal{B}_*(\mathcal{H})$ denotes the space of trace class
operators on the Hilbert space $\mathcal{H}$ (see Channels in
Quantum Information Theory). We write $\mathcal{A}$ instead
of $\mathcal{B}_*(\mathcal{H}_A)$ to streamline the presentation, and $\mathcal{A}^{\otimes n}$ for
the $n$-fold tensor product $\mathcal{B}_*(\mathcal{H}_A)^{\otimes n}$.

It is evident that the definition of channel capacity
requires the comparison of different quantum
channels. A suitable distance measure is the norm
of complete boundedness (or cb-norm, for short),
denoted by $\|\cdot\|_{cb}$. For two channels $T$ and $S$, the
distance $\tfrac{1}{2}\|T - S\|_{cb}$ can be defined as the largest
difference between the overall probabilities in two
statistical quantum experiments differing only by
exchanging one use of $S$ by one use of $T$. These
experiments may involve entangling the systems on
which the channels act with arbitrary further
systems; hence the cb-norm remains a valid distance
measure if the given channel is only part of a larger
system. Equivalently, we may set $\|T\|_{cb} :=
\sup_n \|T \otimes \mathrm{id}_n\|$, where $\|R\| := \sup_{\|\varrho\|_1 \le 1} \|R(\varrho)\|_1$
denotes the norm of linear operators, and
$\|\varrho\|_1 := \mathrm{tr}\sqrt{\varrho^* \varrho}$ is the trace norm on the space of
trace-class operators $\mathcal{B}_*(\mathcal{H})$.

We use base two logarithms throughout, and we 
write $\mathrm{ld}\, x := \log_2 x$ and $\exp_2 x := 2^x$.


Quantum Channel Capacity 


The intuitive concept underlying quantum channel 
capacity is made rigorous in the following 
definition: 


Definition 1 A positive number $R$ is called achievable
rate for the quantum channel $T : \mathcal{A} \to \mathcal{B}$ with
respect to the quantum channel $S : \mathcal{A}' \to \mathcal{B}'$ iff for any
pair of integer sequences $(n_\nu)_{\nu \in \mathbb{N}}$ and $(m_\nu)_{\nu \in \mathbb{N}}$ with
$\lim_{\nu \to \infty} n_\nu = \infty$ and $\limsup_{\nu \to \infty} m_\nu / n_\nu < R$ we have

$$ \lim_{\nu \to \infty} \inf_{D,E} \big\| D\, T^{\otimes n_\nu} E - S^{\otimes m_\nu} \big\|_{cb} = 0 \qquad [1] $$

the infimum taken over all encoding channels $E$ and
decoding channels $D$ with suitable domain and
range. The channel capacity $Q(T,S)$ of $T$ with
respect to $S$ is defined to be the supremum of all
achievable rates. The quantum capacity is the special
case $Q(T) := Q(T, \mathrm{id}_2)$, with $\mathrm{id}_2$ being the ideal
qubit channel.


In this article, we mainly concentrate on 
channels between finite-dimensional systems. This 
is enough to bring out the basic ideas. Many of the 
concepts and results discussed here can be generalized
to Gaussian channels, which play a central
role as building blocks for quantum optical 
communication lines (Holevo and Werner 2001, 
Eisert and Wolf). 

There is considerable freedom in the definition 
of quantum channel capacity, at least for ideal 
reference channels (Kretschmann and Werner 
2004). In particular, the encoding channels E in 
eqn [1] may always be restricted to isometric 
embeddings. 

In addition, it is not necessary to check an infinite
number of pairs of sequences $(n_\nu)_{\nu \in \mathbb{N}}$ and $(m_\nu)_{\nu \in \mathbb{N}}$
when testing a given rate $R$, as Definition 1 would
suggest. Instead, it is enough to find one such pair
which achieves the rate $R$ infinitely often,
$\limsup_{\nu \to \infty} m_\nu / n_\nu = R$.

Without affecting the capacity, the cb-norm $\|\cdot\|_{cb}$
may be replaced by the unstabilized operator norm
$\|\cdot\|$ or by fidelity measures, which are in general
much easier to compute. In particular, one might
choose the minimum fidelity,

$$ F(T) := \min_{\|\psi\| = 1} \langle\psi|\, T(|\psi\rangle\langle\psi|)\, |\psi\rangle \qquad [2] $$

or even the average fidelity,

$$ \bar{F}(T) := \int \langle\psi|\, T(|\psi\rangle\langle\psi|)\, |\psi\rangle \, d\psi \qquad [3] $$


Unfortunately, this equivalence is restricted to 
capacities with noiseless reference channel $S = \mathrm{id}$.
In the vicinity of other (nonideal) channels, equiva- 
lence of the stabilized and unstabilized error criteria 
may be lost. Of course, the comparison of channels 
is ultimately based on the comparison of a state to 
its image, and here the pure states are the worst
case. Hence, the remarkable insensitivity of the 
quantum capacity to the choice of the error criterion 
stems from the observation that the comparison 
between an arbitrary state and a pure state is rather 
insensitive to the criterion used. 
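
For a concrete feel for the minimum and average fidelities defined above, consider the depolarizing channel on a $d$-level system. The channel and its parameter $p$ are illustrative assumptions chosen because every term can be evaluated by hand.

```latex
\begin{align*}
T_p(\varrho) &= (1-p)\,\varrho + p\,(\mathrm{tr}\,\varrho)\,\frac{\mathbb{1}}{d},
  \qquad 0 \le p \le 1, \\
\langle\psi|\, T_p(|\psi\rangle\langle\psi|)\, |\psi\rangle
  &= (1-p) + \frac{p}{d}
  \qquad \text{for every unit vector } \psi, \\
F(T_p) &= \bar{F}(T_p) = 1 - p\,\frac{d-1}{d} .
\end{align*}
```

Because the integrand does not depend on $\psi$, the worst-case and average criteria coincide exactly for this channel; for channels without this symmetry they generally differ.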

Instead of requiring the error quantity in eqn [1] to 
approach zero in the large block limit $\nu \to \infty$, one
might feel tempted to impose that the errors vanish 
completely for some sufficiently large block length, 
since this is the standard setup in the theory of 
quantum error correction (see Quantum Error Correc- 
tion and Fault Tolerance). While it is true that errors 
can always be assumed to vanish exponentially in eqn 
[1], requiring perfect correction may completely change 
the picture: if a channel has some small positive 
probability for depolarization, the same also holds for 
its tensor powers, and no such channel allows the 
perfect transmission of even one qubit. Hence, the 
capacity for perfect correction will vanish for such 
channels, while the standard capacity (in accordance 
with Definition 1) will be close to maximal, $Q(T) \approx 1$.
The existence of perfect error-correcting codes thus 
gives lower bounds on the channel capacity, but is not 
required for a positive transfer rate. 

In the other extreme, one might sometimes feel 
inclined to tolerate (small) finite errors in the 
transmission. For some $\varepsilon > 0$, we define $Q_\varepsilon(T)$
exactly like the quantum capacity in Definition 1,
but require only that the error quantity in eqn [1]
falls below $\varepsilon$ for some sufficiently large $\nu$.
Obviously, $Q_\varepsilon(T) \ge Q(T)$ for any quantum
channel $T$. We also have $\lim_{\varepsilon \to 0} Q_\varepsilon(T) = Q(T)$
(Kretschmann and Werner 2004). In the classical
setting, even a strong converse is known: if $\varepsilon > 0$ is
small enough, one cannot achieve bigger rates by
allowing small errors, that is, $C_\varepsilon(T) = C(T)$. It is still
undecided whether an analogous property holds for
the quantum capacity $Q(T)$.


Related Capacities 


This article is chiefly concerned with the quantum 
capacity of a quantum channel. A variety of other 
capacities have been derived from Definition 1 by
either amending the channel $S$ to be simulated, or
allowing Alice and Bob to make use of additional
resources. Their interrelations are reviewed in Bennett
et al. (2004).

Much interest has been devoted to the hybrid 
problem of transmitting classical information undis- 
torted over noisy quantum channels. The classical 
capacity C(T) of a quantum channel T is discussed in 
the article Quantum Channels: Classical Capacity of 
this Encyclopedia. It is obtained by choosing the ideal 
one-bit channel rather than the one-qubit channel as 
the standard of reference in Definition 1. Encoding 
channels E and decoding channels D are then 
restricted to preparations and measurements, respec- 
tively. Since a quantum channel can also be employed 
to send classical information, we have $C(T) \ge Q(T)$.
There are, obviously, examples in which this
inequality is strict: the entanglement-breaking channel
$T(\varrho) = \sum_j \langle j|\varrho|j\rangle\, |j\rangle\langle j|$ is composed of a measurement
in the orthonormal basis $\{|j\rangle\}_j$, followed by a preparation
of the corresponding basis states. It destroys all
the entanglement between the sender and a reference
system, implying $Q(T) = 0$. Yet all the basis states $|j\rangle$
are transmitted undistorted, which is enough to
guarantee that $C(T) = 1$.
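
The behaviour of this measure-and-prepare channel on a qubit can be sketched as follows. The function name `measure_and_prepare` and the nested-list representation of density matrices are illustrative assumptions; the code only shows that basis states pass through unchanged while a superposition loses its coherences.

```python
def measure_and_prepare(rho):
    """Apply T(rho) = sum_j <j|rho|j> |j><j|: keep the diagonal of rho
    in the measurement basis and erase every off-diagonal entry."""
    d = len(rho)
    return [[rho[i][j] if i == j else 0.0 for j in range(d)]
            for i in range(d)]


# Density matrices of the basis state |0> and the superposition |+>.
basis_zero = [[1.0, 0.0], [0.0, 0.0]]
plus = [[0.5, 0.5], [0.5, 0.5]]

print(measure_and_prepare(basis_zero))  # basis state is left untouched
print(measure_and_prepare(plus))        # coherences are removed
```

The second output is the maximally mixed state: the classical label survives, but no quantum superposition does, which is the mechanism behind the vanishing quantum capacity.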

Definition 1 also applies to purely classical 
channels, and thus to the setting of Shannon's 
information theory. A classical channel T between 
two d-level systems is completely specified by the 
$d \times d$ matrix $(T_{xy})_{x,y=1}^{d}$ of transition probabilities.
For these channels the cb-norm difference is just
(twice) the maximal error probability:

$$ \|\mathrm{id} - T\|_{cb} = 2 \sup_x\, (1 - T_{xx}) $$

which is the standard error criterion for classical
information transfer.
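
A minimal sketch of this error criterion, assuming the transition matrix is stored row-wise so that `transition[x][y]` is the probability of receiving letter `y` when `x` was sent. Both function names are hypothetical.

```python
def max_error_probability(transition):
    """Largest probability, over all input letters x, that the channel
    output differs from x."""
    return max(1.0 - row[x] for x, row in enumerate(transition))


def cb_distance_to_ideal(transition):
    """Twice the maximal error probability, which the text identifies
    with the cb-norm distance between the classical channel and id."""
    return 2.0 * max_error_probability(transition)


# Binary symmetric channel with flip probability 0.1 (illustrative input).
binary_symmetric = [[0.9, 0.1],
                    [0.1, 0.9]]

print(max_error_probability(binary_symmetric))
print(cb_distance_to_ideal(binary_symmetric))
```

For the binary symmetric channel every letter is corrupted with the same probability, so the supremum is attained by both inputs.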

Dense coding and teleportation suggest that 
entanglement is a powerful resource for information 
transfer. It doubles the classical channel capacity of 
a noiseless channel, and it makes it possible to send quantum
information over purely classical channels. Surprisingly,
the entanglement-assisted capacities are often
simpler and better behaved than their unassisted 
counterparts. Unlike the classical and quantum 
capacities proper, they are relatively easy to calcu- 
late using finite optimization procedures, and there 
has recently been significant progress in under- 
standing the simulation rates for nonideal channels 
in this scenario (see Capacities Enhanced by 
Entanglement). 

The quantum channel capacity is unaffected by 
entanglement-breaking side channels. In particular, 
classical forward communication alone cannot
enhance it. However, unlike in the purely classical
case, both the quantum and classical channel 
capacity (but not the entanglement-assisted capacity) 
may increase under classical feedback. 


Elementary Properties 


The capacity of a composite channel $T_1 \circ T_2$ cannot
be bigger than the capacity of the channel with the
smallest bandwidth. This in turn suggests that
simulating a concatenated channel is in general easier
than simulating any of the individual channels. These
relations are known as bottleneck inequalities:

$$ Q(T_1 \circ T_2, S) \le \min\{Q(T_1, S),\, Q(T_2, S)\} \qquad [4] $$

$$ Q(T, S_1 \circ S_2) \ge \max\{Q(T, S_1),\, Q(T, S_2)\} \qquad [5] $$

Instead of running $T_1$ and $T_2$ in succession, we may
also run them in parallel. In this case, the capacity
can be shown to be superadditive,

$$ Q(T_1 \otimes T_2, S) \ge Q(T_1, S) + Q(T_2, S) \qquad [6] $$

For the standard ideal channels, we even have
additivity. The same holds true if both $S$ and one
of the channels $T_1, T_2$ are noiseless, the third
channel being arbitrary. However, results on the 
activation of bound-entangled states seem to suggest 
that the inequality in eqn [6] may be strict for some 
channels (see Entanglement). 

Finally, the two-step coding inequality tells us that
by using an intermediate channel in the coding
process we cannot increase the transmission rate:

$$ Q(T_1, T_2) \ge Q(T_1, T_3)\, Q(T_3, T_2) \qquad [7] $$

Applying eqn [7] twice with $T_2 = \mathrm{id}$ and $T_3 = \mathrm{id}$
immediately yields upper and lower bounds on the
channel capacity with nonideal reference channel,

$$ \frac{Q(T_1)}{Q(T_2)} \ge Q(T_1, T_2) \ge Q(T_1)\, Q(\mathrm{id}, T_2) \qquad [8] $$

The evaluation of the lower bound in eqn [8] then
requires efficient protocols for simulating a noisy
channel $T_2$ with a noiseless resource.

There are special cases in which the quantum
channel capacity can be evaluated relatively easily,
the most relevant one being the noiseless channel $\mathrm{id}_n$,
where by the subscript $n$ we denote the dimension of
the underlying Hilbert space. In this case, we have

$$ Q(\mathrm{id}_n, \mathrm{id}_m) = \frac{\mathrm{ld}\, n}{\mathrm{ld}\, m} \qquad [9] $$

The lower bound $Q(\mathrm{id}_n, \mathrm{id}_m) \ge \mathrm{ld}\, n / \mathrm{ld}\, m$ is immediate
from counting dimensions. To establish the
upper bound, we use the fact that a noiseless
quantum channel cannot simulate itself with a rate
exceeding unity: $Q(\mathrm{id}_n, \mathrm{id}_n) \le 1$. This is just the
upper bound we want to prove for the special case
$n = m$, and it can be extended to the general case
with the help of the two-step coding inequality [7]:
$Q(\mathrm{id}_m, \mathrm{id}_n)\, Q(\mathrm{id}_n, \mathrm{id}_m) \le Q(\mathrm{id}_m, \mathrm{id}_m) \le 1$, implying
$Q(\mathrm{id}_n, \mathrm{id}_m) \le 1 / Q(\mathrm{id}_m, \mathrm{id}_n) \le \mathrm{ld}\, n / \mathrm{ld}\, m$, where in the
last step we have applied the lower bound with the
roles of $n$ and $m$ interchanged.
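
The dimension-counting argument behind the lower bound can be made concrete with exact integer arithmetic. The helper name `max_simulated_copies` is hypothetical; it simply finds how many $m$-level systems fit into the Hilbert space of $k$ copies of an $n$-level system.

```python
from math import log2


def max_simulated_copies(n, m, k):
    """Largest j with m**j <= n**k, that is, the number of ideal m-level
    channels that fit into k ideal n-level channels by counting
    dimensions alone."""
    total = n ** k
    j = 0
    power = m  # invariant: power == m ** (j + 1)
    while power <= total:
        j += 1
        power *= m
    return j


# Rates j/k approach ld n / ld m as the block length k grows.
for k in (1, 10, 100):
    j = max_simulated_copies(3, 2, k)
    print(k, j, j / k, log2(3) / log2(2))
```

For $n = 3$ and $m = 2$ the achieved rates are 1, 1.5 and 1.58, approaching $\mathrm{ld}\, 3 \approx 1.585$ from below, in line with the lower bound on $Q(\mathrm{id}_n, \mathrm{id}_m)$.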

Combining eqn [9] with the two-step coding
inequality [7], we see that for any channel $T$

$$ Q(T, \mathrm{id}_n) = \frac{1}{\mathrm{ld}\, n}\, Q(T, \mathrm{id}_2) \qquad [10] $$

which shows that quantum channel capacities relative
to noiseless channels of different dimensionality only
differ by a constant factor. Fixing the dimensionality
of the reference channel then only corresponds to a
choice of units. Conventionally, the ideal qubit
channel $\mathrm{id}_2$ is chosen as a standard of reference, as
in Definition 1 above, thereby fixing the unit "bit."

The upper bound on the capacity of ideal channels 
can also be obtained from a general upper bound on 
quantum capacities (Holevo and Werner 2001), 
which has the virtue of being easily calculated in 
many situations. It involves the transposition map,
which we denote by $\Theta$, defined as matrix transposition
with respect to some fixed orthonormal basis.
The transposition is positive but not completely
positive, and thus does not describe a physical
channel (see Channels in Quantum Information
Theory). We have $\|\Theta\|_{cb} = d$ for a $d$-level system.
For any channel $T$ and small $\varepsilon > 0$,

$$ Q(T) \le Q_\varepsilon(T) \le \mathrm{ld}\, \|T\Theta\|_{cb} =: Q_\Theta(T) \qquad [11] $$

where $Q_\varepsilon$ is the finite error capacity introduced in
the section "Quantum channel capacity."
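
Why the transposition fails to be completely positive, and why its cb-norm cannot be smaller than $d$, can be seen from a single test state. The following sketch applies $\mathrm{id}_d \otimes \Theta$ to the maximally entangled state $|\Omega\rangle = d^{-1/2} \sum_i |i,i\rangle$; the flip operator $\mathbb{F}$ is notation introduced here for convenience.

```latex
\begin{align*}
(\mathrm{id}_d \otimes \Theta)\big(|\Omega\rangle\langle\Omega|\big)
  &= \frac{1}{d} \sum_{i,j=1}^{d} |i\rangle\langle j| \otimes |j\rangle\langle i|
  = \frac{1}{d}\, \mathbb{F},
  \qquad \mathbb{F}\,|\varphi \otimes \chi\rangle = |\chi \otimes \varphi\rangle, \\
\mathrm{spec}(\mathbb{F}) &= \Big\{ +1 \ \big(\text{multiplicity } \tfrac{d(d+1)}{2}\big),\;
  -1 \ \big(\text{multiplicity } \tfrac{d(d-1)}{2}\big) \Big\}, \\
\Big\| \tfrac{1}{d}\, \mathbb{F} \Big\|_1
  &= \frac{1}{d} \Big( \frac{d(d+1)}{2} + \frac{d(d-1)}{2} \Big) = d,
  \qquad \big\| |\Omega\rangle\langle\Omega| \big\|_1 = 1 .
\end{align*}
```

The negative eigenvalues show that $\mathrm{id}_d \otimes \Theta$ maps a state to a non-positive operator, and the trace norm grows by a factor $d$, consistent with $\|\Theta\|_{cb} = d$. Taking $T = \mathrm{id}_d$ in eqn [11] then gives the bound $\mathrm{ld}\, d$ for the ideal channel.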

The upper bound $Q_\Theta(T)$ has some remarkable
properties, which make it a capacity-like quantity in
its own right. For example, it is exactly additive,

$$ Q_\Theta(S \otimes T) = Q_\Theta(S) + Q_\Theta(T) \qquad [12] $$

for any pair $S, T$ of channels, and it satisfies
the bottleneck inequality:

$$ Q_\Theta(ST) \le \min\{Q_\Theta(S),\, Q_\Theta(T)\} $$

Moreover, it coincides with the quantum capacity on
ideal channels, $Q_\Theta(\mathrm{id}_n) = Q(\mathrm{id}_n) = \mathrm{ld}\, n$, and it vanishes
whenever $T\Theta$ is completely positive. In particular, if
$\mathrm{id} \otimes T$ maps any entangled state to a state with positive
partial transpose, we have $Q_\Theta(T) = 0$.


State-Channel Duality 


Quantum capacity is closely related to the distillable
entanglement, which is the optimal rate $m/n$ at
which $n$ copies of a given bipartite quantum state $\varrho$
shared between Alice and Bob can be asymptotically
converted into $m$ maximally entangled qubit pairs
(see Entanglement). Similar to the quantum capacity,
the definition involves the large block limit
$n, m \to \infty$ and an optimization over all conceivable
distillation protocols. These may consist of several
rounds of local quantum operations and (forward or
two-way) classical communication. The one-way
and two-way distillable entanglement of $\varrho$ will be
denoted by $D_1(\varrho)$ and $D_2(\varrho)$, respectively.

Suppose that Alice and Bob are connected by a
quantum channel $T$ and run such a one-way distillation
protocol on (many copies of) the state
$\varrho_T := (T \otimes \mathrm{id})\, |\Omega\rangle\langle\Omega|$, where $|\Omega\rangle := (1/\sqrt{d_A}) \sum_i |i,i\rangle$
is maximally entangled on $\mathcal{H}_A \otimes \mathcal{H}_A$. If the distillation
yields maximally entangled qubits at positive rate $R$,
Alice may apply the standard teleportation scheme to
send arbitrary quantum states to Bob undistorted at
that same rate $R$. Like the distillation protocol itself,
teleportation requires classical forward communication,
which however does not affect the channel
capacity (cf. the section "Related capacities"). Thus,
$Q(T) \ge D_1(\varrho_T)$. If two-way distillation is allowed, we
have $Q_2(T) \ge D_2(\varrho_T)$ for the capacity $Q_2(T)$ assisted
by two-way classical side communication.

Conversely, if Alice and Bob use a bipartite
quantum state $\varrho$ shared between them as a substitute
for the maximally entangled state $|\Omega\rangle$ in the
standard teleportation protocol, they will implement
some noisy quantum channel $T_\varrho$. If this channel
allows quantum information to be transferred at nonvanishing
rate $R$, Alice may share maximally entangled
states with Bob at that same rate $R$. Consequently,
$D_1(\varrho) \ge Q(T_\varrho)$ and $D_2(\varrho) \ge Q_2(T_\varrho)$.

These relations (Bennett et al. 1996) make it possible to
bound channel capacities in terms of distillable
entanglement and vice versa. If the two maps
$T \mapsto \varrho_T$ and $\varrho \mapsto T_\varrho$ are mutually inverse, we even
have $D_1(\varrho) = Q(T_\varrho)$ and $D_2(\varrho) = Q_2(T_\varrho)$. In this
case, the duality $\varrho \leftrightarrow T_\varrho$ is the physical implementation
of Jamiolkowski's isomorphism between bipartite
states and channels (see Channels in Quantum
Information Theory). This has been shown
(Horodecki et al. 1999) to hold for isotropic states,
which are invariant under the group of all $U \otimes \bar{U}$
transformations, where $\bar{U}$ is the complex conjugate
of the unitary U. The corresponding channels are 
partly depolarizing. 

In general, $T_{\varrho_T} \ne T$. However, the so-called conclusive
teleportation allows us to implement $T$ at
least probabilistically, resulting in the relation

$$ \frac{1}{d_A^2}\, Q(T) \le D_1(\varrho_T) \le Q(T) \qquad [13] $$

The duality [13] can be applied to show that both
the unassisted and the two-way quantum capacities
are continuous in any open set of channels
having nonvanishing capacities (Horodecki and
Nowakowski 2005).


Coding Theorems 


Computing channel capacities straight from Defini- 
tion 1 is a tricky business. It involves optimization in 
systems of asymptotically many tensor factors, and 
can only be performed in special cases, like the
noiseless channels in the section "Elementary properties."
Coding theorems aspire to reduce this
problem to an optimization over a low-dimensional 
space. They usually come in two parts: the converse 
provides an upper bound on the channel capacity 
(typically in terms of some entropic expression), 
while the direct part consists of a coding scheme 
that attains this bound. By Shannon's celebrated 
coding theorem, the classical capacity of a classical 
noisy channel can be obtained from a maximization 
of the mutual information over all joint input- 
output distributions. 

For the quantum channel capacity, the relevant 
entropic quantity is the coherent information, 


$$ I_c(T, \varrho) := H\big(T(\varrho)\big) - H\big(T \otimes \mathrm{id}\,(|\psi_\varrho\rangle\langle\psi_\varrho|)\big) \qquad [14] $$

where $H$ denotes the von Neumann entropy,
$H(\varrho) = -\mathrm{tr}\, \varrho\, \mathrm{ld}\, \varrho$, and $\psi_\varrho \in \mathcal{H}_A \otimes \mathcal{H}_A$ is a purification
of the density operator $\varrho \in \mathcal{A}$. The coherent
information does not increase under quantum
operations, $I_c(S \circ T, \varrho) \le I_c(T, \varrho)$ for any quantum
channel $S$ and state $\varrho \in \mathcal{A}$. This is the data
processing inequality (Barnum et al. 1998), which
shows that the regularized coherent information
provides an upper bound on the quantum channel
capacity: if Alice and Bob have a coding scheme for
the channel $T$ with capacity $Q(T)$, $n$ channel uses
allow them to share a maximally entangled state of
size $\sim \exp_2 nQ(T)$. The coherent information of this
state equals $\sim nQ(T)$, and was no larger prior to
Bob's decoding.

Recently, Devetak (2005) developed a coding 
scheme to show that this bound is in fact attainable. 
Different proofs were outlined by Lloyd and Shor. 


Theorem 1 For every quantum channel $T$,

$$ Q(T) = \lim_{n \to \infty} \frac{1}{n} \max_{\varrho} I_c(T^{\otimes n}, \varrho) \qquad [15] $$


Unlike the classical or quantum mutual information,
coherent information is strictly superadditive for
some channels (DiVincenzo et al. 1998). Hence,
taking the limit $n \to \infty$ in eqn [15] is indeed required,
and in general the evaluation of the capacity formula
[15] still demands the solution of asymptotically large
variational problems. This should be contrasted with
the entanglement-assisted capacities $C_E(T) = 2Q_E(T)$
(where a simple nonregularized coding theorem is
known to hold, see Capacities Enhanced by Entanglement)
and the capacity for classical information
$C(T)$ (where additivity is conjectured but not proved,
see Quantum Channels: Classical Capacity). Even a
maximization of the single-shot coherent information
$I_c(T, \varrho)$ appears to be a difficult optimization
problem, since this quantity is neither convex nor
concave and may have multiple local maxima (Shor
2003). Thus, even for simple-looking systems like the
qubit depolarizing channel, so far we only have upper
and lower bounds on the quantum channel capacity,
but do not yet know how to compute its exact value.

We now sketch Devetak's proof of Theorem 1, 
assuming only some familiarity with Holevo-Schumacher-Westmoreland
(HSW) random codes
for the classical channel capacity (see Quantum 
Channels: Classical Capacity). It is easily seen from 
Stinespring's dilation theorem (see Channels in 
Quantum Information Theory) that a noiseless 
quantum channel provides perfect security against 
eavesdropping. This is one of the characteristic traits 
of quantum mechanics and lies at the heart of 
quantum cryptography. In his proof, Devetak 
showed a way to turn this around and upgrade 
coding schemes for private classical information to 
quantum channel codes. 

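The relation between quantum information transfer
over a channel $T : \mathcal{A} \to \mathcal{B}$ and privacy against
eavesdropping is best understood in terms of the
companion channel $T_E : \mathcal{A} \to \mathcal{E}$. $T_E$ arises from a
given Stinespring isometry $V : \mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_E$ of
$T \equiv T_B$ by interchanging the roles of the output
system $\mathcal{B}$ and the environment $\mathcal{E}$:

$$ T_B(\varrho) = \mathrm{tr}_E\, V \varrho V^* \;\Longrightarrow\; T_E(\varrho) = \mathrm{tr}_B\, V \varrho V^* \qquad [16] $$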


The channel $T_E$ describes the information flow into
the environment $\mathcal{E}$, a system we assume to be under
complete control of a potential eavesdropper, Eve
say. The setup for private classical information
transfer (including the definition of rates and capacity)
is then exactly the same as for the classical
channel capacity (see Quantum Channels: Classical
Capacity), but the protocols now have to satisfy the
additional requirement that $T_E$ releases (almost) no
information to the environment. This can be achieved
by randomizing over $\nu_E \sim \exp_2 n\chi(T_E, \{p_i, \varrho_i\})$ code
words of a standard HSW code of total size
$\sim \exp_2 n\chi(T_B, \{p_i, \varrho_i\})$, where $\{p_i, \varrho_i\}$ is the quantum
ensemble from which a set of random code words
$\{\varrho_{kl}\}_{k,l=1}^{\nu_B, \nu_E}$ is generated. The appearance of
the Holevo bound

$$ \chi(T, \{p_i, \varrho_i\}) = H\Big(\sum_i p_i\, T(\varrho_i)\Big) - \sum_i p_i\, H\big(T(\varrho_i)\big) \qquad [17] $$

in the dimension of both these code spaces can be
understood from the size of the relevant typical
subspaces (Devetak and Winter 2004).
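
The Holevo quantity of eqn [17] is easy to evaluate numerically for qubit ensembles. The sketch below is an illustration under stated assumptions: density matrices are plain 2x2 nested lists, the helper names (`entropy_2x2`, `mix`, `depolarize`, `holevo_chi`) are hypothetical, and the qubit depolarizing channel is chosen only as a convenient test case.

```python
from math import log2, sqrt


def entropy_2x2(rho):
    """Von Neumann entropy in bits of a 2x2 Hermitian, trace-one matrix,
    using the closed-form eigenvalues of a 2x2 Hermitian matrix."""
    a, b, d = rho[0][0].real, rho[0][1], rho[1][1].real
    mean = (a + d) / 2.0
    radius = sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    eigenvalues = (mean + radius, mean - radius)
    return sum(-lam * log2(lam) for lam in eigenvalues if lam > 1e-15)


def mix(weighted):
    """Weighted sum of 2x2 matrices given as (weight, matrix) pairs."""
    return [[sum(w * m[i][j] for w, m in weighted) for j in range(2)]
            for i in range(2)]


def depolarize(rho, p):
    """Qubit depolarizing channel: (1 - p) * rho + p * identity / 2."""
    half_identity = [[0.5, 0.0], [0.0, 0.5]]
    return mix([(1.0 - p, rho), (p, half_identity)])


def holevo_chi(channel, ensemble):
    """Eqn [17]: H(sum_i p_i T(rho_i)) - sum_i p_i H(T(rho_i))."""
    outputs = [(p, channel(rho)) for p, rho in ensemble]
    average_entropy = sum(p * entropy_2x2(out) for p, out in outputs)
    return entropy_2x2(mix(outputs)) - average_entropy


ZERO = [[1.0, 0.0], [0.0, 0.0]]
ONE = [[0.0, 0.0], [0.0, 1.0]]
ensemble = [(0.5, ZERO), (0.5, ONE)]

for p in (0.0, 0.2, 1.0):
    print(p, holevo_chi(lambda rho: depolarize(rho, p), ensemble))
```

For this ensemble the value drops from one bit for the noiseless channel to zero for the completely depolarizing one.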

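The randomization guarantees that the remaining
$\nu_B \sim \exp_2 n(\chi(T_B) - \chi(T_E))$ code words are almost
indistinguishable to Eve:

$$ \Big\| T_E^{\otimes n} \Big( \frac{1}{\nu_E} \sum_{l=1}^{\nu_E} (\varrho_{il} - \varrho_{kl}) \Big) \Big\|_1 \le \varepsilon \qquad \forall\, i,k = 1, \ldots, \nu_B \qquad [18] $$
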

The net transfer rate for private classical information
is then $R = \chi(T_B) - \chi(T_E)$, which is just the total
transfer rate for the channel Alice $\to$ Bob reduced by
the transfer rate Alice $\to$ Eve.

Remarkably, if o= `; pi |v;)(vj| is a decomposi- 
tion of o € A into pure states, the private transfer 
rate exactly equals the coherent information, 


Ic (Tg, 0) = H(Tg(e)) — H(Tz(o)) 
= x(Ts) — x(Tz) [19] 


The so-called entropy exchange


$H(T_E(\varrho)) = H\bigl(T_B \otimes \mathrm{id}\,(|\psi_\varrho\rangle\langle\psi_\varrho|)\bigr)$


quantifies the extent to which a formerly pure
ancilla state becomes mixed via interaction with
the signal states. Equation [19] then nicely reflects
the intuition that for high-rate quantum information
transfer the signal states should not entangle too
much with the environment. In fact, for an almost
noiseless channel the entropy exchange nearly
vanishes, and the optimized coherent information
almost attains the maximal value 1, while for nearly
depolarizing channels we have $I_c(T_B, \varrho) \approx -H(\varrho) \le 0$.
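
The two limiting cases just described can be checked on a small example. The sketch below evaluates $I_c(T_B, \varrho) = H(T_B(\varrho)) - H(T_E(\varrho))$ for a qubit dephasing channel with flip probability p and a maximally mixed input. This channel is an assumption chosen for illustration and is not discussed in the text. It is used because both spectra are known in closed form: the output is maximally mixed, and the environment state has eigenvalues (1 - p, p). The function names are hypothetical.

```python
import math


def entropy_bits(spectrum):
    """Entropy in bits of a state with the given eigenvalues (0 log 0 := 0)."""
    return -sum(x * math.log2(x) for x in spectrum if x > 0.0)


def coherent_information_dephasing(p):
    """I_c = H(T_B(rho)) - H(T_E(rho)) for qubit dephasing, rho = identity/2.

    Assumed model: T_B(rho) = (1 - p) rho + p Z rho Z.
    For rho = identity/2 the output spectrum is (1/2, 1/2) and the
    environment (entropy exchange) spectrum is (1 - p, p).
    """
    output_spectrum = [0.5, 0.5]
    environment_spectrum = [1.0 - p, p]
    return entropy_bits(output_spectrum) - entropy_bits(environment_spectrum)


if __name__ == "__main__":
    for p in (0.0, 0.05, 0.25, 0.5):
        print(f"p = {p:4.2f}   I_c = {coherent_information_dephasing(p):.4f} bits")
```

For p = 0 (noiseless channel) the entropy exchange vanishes and the sketch returns one bit; as p approaches 1/2 the environment learns as much as the receiver and the value drops to zero.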

So far, we have sketched a protocol for private
classical information transfer. Devetak's coherentification
allows one to pass from the transmission of
classical messages to the transmission of coherent
superpositions. This technique has also been applied
to obtain entanglement distillation protocols from
secret key distillation, and offers a unified view on
the secret classical resources and their quantum
counterparts (Devetak and Winter 2004, Devetak
et al. 2004).

In order to transfer quantum information, Alice
will only need to send one half of a maximally
entangled state of dimensionality $\nu_B \sim \exp_2 n\, I_c(T_B, \varrho)$.
As described in the previous section, teleportation
then allows her to transfer arbitrary quantum states
from a subspace of that size.
Given a set of pure state code words
$\{|\varphi_{kl}\rangle\}_{k,l=1}^{\nu_B,\nu_E}$ of a private classical information
protocol, for entanglement transfer Alice prepares
the input state


$|\Phi\rangle_{A'A} = \frac{1}{\sqrt{\nu_B}} \sum_{k=1}^{\nu_B} |k\rangle_{A'} \otimes \frac{1}{\sqrt{\nu_E}} \sum_{l=1}^{\nu_E} |\varphi_{kl}\rangle_{A}$ [20]


where $\mathcal{A}'$ denotes a reference system that Alice keeps
in her lab. On his share of the resulting output state
$|\Psi'\rangle_{A'BE}$ Bob will then employ the corresponding
measurement operators $\{M_{kl}\}_{k,l}$ to implement the
coherent measurement


$V_M |\varphi\rangle_B := \sum_{k,l} \bigl(\sqrt{M_{kl}}\, |\varphi\rangle_B\bigr) \otimes |kl\rangle_{B_1 B_2}$,


which places the measurement outcomes into some
reference system $\mathcal{B}_1 \otimes \mathcal{B}_2$. Any measurement which
identifies the output with high probability only
slightly disturbs the output state, and thus Bob's
coherent measurement leaves the total system in an
approximation of the state


$|\Psi''\rangle = \frac{1}{\sqrt{\nu_B \nu_E}} \sum_{k,l=1}^{\nu_B,\nu_E} |k\rangle_{A'}\, |k\rangle_{B_1}\, |l\rangle_{B_2}\, |\varphi'_{kl}\rangle_{BE}$ [21]



in which Eve and Bob are still entangled. A
completely depolarizing channel $T_E$ would directly
yield a factorized output state $\mathcal{B} \otimes \mathcal{E}$ here. Although
the randomization in eqn [18] does not necessarily
result in complete depolarization, there is a controlled
unitary operation which Bob may apply to effectively
decouple Eve's system, resulting in the output state
$\approx \frac{1}{\sqrt{\nu_B}} \sum_{k=1}^{\nu_B} |kk\rangle_{A'B_1}$ in tensor product with a
state of the remaining systems. The first factor is the maximally
entangled state of size $\nu_B \sim \exp_2 n\, I_c(T_B, \varrho)$ required
for teleportation. The direct part of the capacity
theorem then follows by applying the above coding
scheme to large blocks and maximizing over (pure)
input ensembles, concluding the proof.

Devetak's proof of the coding theorem seems to
indicate that the private classical capacity $C_p(T)$
equals the quantum capacity $Q(T)$ for every
quantum channel T. However, for the coherentification
protocol, we have restricted the private coding
schemes to pure state input ensembles, and thus we
can only conclude that $Q(T) \le C_p(T)$. The existence
of bound-entangled states with positive one-way 
distillable secret key rate (Horodecki et al. 2005) 
implies that this inequality can be strict. A general 
procedure does exist to retrieve (almost) all the 
information from the output of a noisy quantum 
channel that releases (almost) no information to the 
environment. But this requires a stronger form of 
privacy than eqn [18]. 


430 Capacity for Quantum Information 


Quantum Channels with Memory 


This article has so far been restricted to memoryless
quantum channels, in which successive channel
inputs are acted on independently. Messages of
n symbols are then processed by the tensor
product channel $T^{\otimes n}$, as in Definition 1 and
illustrated in Figure 1. In many real-world applications,
the assumption of having uncorrelated noise
cannot be justified, and memory effects need to be
taken into account. For a quantum channel T with
register input $\mathcal{A}$ and register output $\mathcal{B}$, such effects
are conveniently modeled (Bowen and Mancini
2004) by introducing an additional memory
system $\mathcal{M}$, so that now $T: \mathcal{M} \otimes \mathcal{A} \to \mathcal{B} \otimes \mathcal{M}$ is a
completely positive and trace-preserving map with
two input systems and two output systems. Long
messages with n signal states will then be
processed by the concatenated channel
$T_n: \mathcal{M} \otimes \mathcal{A}^{\otimes n} \to \mathcal{B}^{\otimes n} \otimes \mathcal{M}$. In such a concatenation,
the memory system is passed on from one channel
application to the next, and thus introduces
(classical or quantum) correlations between consecutive
register inputs.

Remarkably, this relatively simple model can be
shown (Kretschmann and Werner 2005) to encompass
every reasonable physical process: every stationary
channel $S: \mathcal{A}^{\mathbb{Z}} \to \mathcal{B}^{\mathbb{Z}}$ which turns an infinite
string of input states (on the quasilocal algebra $\mathcal{A}^{\mathbb{Z}}$)
into an infinite string of output states on $\mathcal{B}^{\mathbb{Z}}$ and
satisfies the causality constraint is in fact a concatenated
memory channel. Causality here means
that the outputs of the stationary channel S at a given
time $t_0$ do not depend on inputs at times $t > t_0$.
Figure 2 illustrates the structure theorem for causal
stationary quantum channels. In general, it produces
not only the memory channel T with memory
algebra $\mathcal{M}$, but also a map R describing the
influence of input states in the remote past.
Intuitively, such a map is often not needed, because
memory effects decrease in time: the memory
channel T is called forgetful if outputs at a large
time t depend only weakly on the memory initialization
at time zero. In fact, memory effects can be
shown to die out even exponentially. The set of
these channels is open and dense in the set of
quantum memory channels. Hence, generic memory
channels are forgetful.

Figure 2 By the structure theorem, a causal automaton S can
be decomposed into a chain of concatenated memory channels
T plus some input initializer R. Evaluation with the partial trace tr
means that the corresponding output is ignored.

The capacity of memory channels is defined in
complete analogy to the memoryless case, replacing
the n-fold tensor product $T^{\otimes n}$ in Definition 1 by
the n-fold concatenation $T_n$. The coding theorems
for (private) classical and quantum information 
can then be extended from the memoryless case 
to the very important class of forgetful channels 
(Kretschmann and Werner 2005). 

Nonforgetful channels call for universal coding 
schemes, which apply irrespective of the initializa- 
tion of the input memory. Such schemes are 
presently known only for very special cases. 
Historical and Conceptual Background


A capillary surface is the interface separating two 
fluids that lie adjacent to each other and do not mix. 
Examples of such surfaces are the upper surface of 
liquid partially filling a vertical cylinder (capillary 
tube), the surface of a liquid drop resting in 
equilibrium on a tabletop (sessile drop) and the 
surface of a liquid drop hanging from a ceiling 
(pendent drop); further instances are the surface of a 
falling raindrop, the bounding surface of the liquid 
in the fuel tank of a spaceship, and the interface 
formed by a fluid mass rotating within another fluid. 
This last example extends to the problem of rotating 
stars. 

Interfaces separating fluids and solids share some 
of the physical attributes of capillary surfaces, and 
the study of wetted portions of rigid “support 
surfaces” becomes essential for describing global 
behavior of capillary configurations. However, some 
significant distinctions appear that change the 
formal structure of the problems, and must be 
accounted for in the theory. 

Phenomena governed by capillarity pervade all of 
daily life, and most are so familiar as to escape 
special notice. By contrast, throughout the eigh- 
teenth century and presumably earlier, great atten- 
tion centered on the rise of liquid in a narrow glass 
circular-cylindrical tube dipped vertically into a 
liquid reservoir (Figure 1); this striking event had a 
dramatic impact that confounded intuition. Clarifi- 
cation of the behavior became one of the major 
problems challenging the scientific world of the 
time, and was not achieved during that period. The 
term “capillary,” adapted from the Latin “capillus”
for hair, was applied to the phenomenon since it was
observed only for tubes with very fine openings; the
more general usage adopted in the definition above
derives from the recognition of a class of phenomena
with a common physical basis.

Figure 1 Capillary tube in infinite reservoir, in downward
gravity field.

The first recorded observations concerning
capillarity seem due to Aristotle c. 350 BC. He
wrote that “a broad flat body, even of heavy
material, will float on water, however a narrow
thin one such as a needle will always sink.” Any
reader with access to a needle and a glass of water 
will have little difficulty refuting the assertion. 
Remarkably, the error in reasoning seems not to 
have been pointed out for almost 2000 years, 
when Galileo addressed the problem in his 
Discorsi, about 1600. The only substantive studies 
till that time are apparently those of Leonardo da 
Vinci a hundred years earlier. Leonardo intro- 
duced reasoning close in spirit to that of current 
literature; however, the Calculus was not available 
to him, and he was not in a position to develop his 
ideas in quantitative ways. 


Young's Contribution 


The later discovery of the Calculus provided a 
driving impetus guiding many new studies during 
the eighteenth century. But despite the enormity of 
that weapon, it did not on its own suffice, and initial 
quantitative success had to await two initiatives 
taken by Thomas Young in 1805. Young based his
studies on the concept of surface tension that had
been introduced by von Segner half a century earlier.
Segner hypothesized that every curve on a fluid/fluid
interface S experiences on both its sides an orthogonal
force $\sigma$ per unit length, which (for given
temperature) depends only on the materials and is
directed into the tangent planes on the respective
sides. The presence of such forces can be indicated
by simple experiments. They become clearly evident
in the case of thin (soap) films spanning a frame, in
which case there is an easily observed orthogonal
pull on the frame, see the section “Dual interpretation
of $\sigma$: distinction between fluids and solids.”
Young made two basic conceptual contributions
(Y1, Y2):


Y1. Relation of pressure jump across a free interface 
to mean curvature and surface tension. 


Consider a piece of surface S in the shape of a
spherical bowl of radius R, separating two immiscible
fluid media, as in Figure 2. In equilibrium, any
pressure difference $\delta p$ across S must be balanced by
a tension $\sigma$ on its rim $\Gamma$. If S projects to a disk of
(small) radius r on the plane tangent to S at the
symmetry point, we are led to


$\pi r^2\, \delta p \approx 2\pi r\, \sigma \sin\psi$ [1]


where $\psi$ is the inclination of S at the rim, relative to the
plane. We thus find at the base point


$\delta p = 2\sigma\, \frac{d \sin\psi}{dr} = 2\sigma\, \frac{1}{R}$ [2]


Young then went on to consider a general S, without
symmetry hypothesis. Letting $1/R_1, 1/R_2$ denote the
planar curvatures at a point in S of two normal
sections in orthogonal directions, he asserted that


$\delta p = 2\sigma \cdot \frac{1}{2}\left(\frac{1}{R_1} + \frac{1}{R_2}\right) = 2\sigma H$ [3]


where H is the mean curvature of S at the point.
Although Young provided no formal justification for
this step, we can establish it with the aid of a general
formula from differential geometry that was not
known in his lifetime:


$\int_S 2H\,\mathbf{N}\, dS = \oint_\Gamma \mathbf{n}\, ds$ [4]


where N is a unit normal on S, and n is the unit
conormal (as indicated in Figure 2) on $\Gamma$. Multiplying
both sides of [4] by $\sigma$, the right-hand side
becomes the net surface tension force on S. Since
that must equal the net balancing pressure force, we
obtain


$\int_S (\delta p - 2\sigma H)\,\mathbf{N}\, dS = 0$ [5]


Letting the diameter of S tend to zero, the assertion
follows.

Figure 2 Pressure change across fluid element, balanced by
surface tension.

We emphasize here the implicit assumption above,
that $\sigma$ is a constant depending only on the particular
materials, and not on the shape of S. This author
knows of no source in which that is clearly
established, although experiments and experience
provide some a posteriori justification. See the
further comments under Y2, and later in sections
“Gauss' contribution: the energy method” and
“Dual interpretation of $\sigma$: distinction between fluids
and solids.”


Y2. The capillary contact angle. 


Young asserted that there are surface tensions for
solid/fluid interfaces analogous to those just introduced,
and again depending only on the materials.
This assertion is erroneous, as was suggested in
writings of Bikerman and of others, and more
recently established in a definitive example by Finn.
Using his premise, Young attempted to characterize
the contact angle $\gamma$ made by the fluid surface with a
rigid boundary, by requiring that the net tangential
component of the three surface tension vectors
vanish at the triple interface; this leads to the often
employed but incorrect “Young diagram,” see
Figure 3, and the relation


$\cos\gamma = \frac{\sigma_1 - \sigma_2}{\sigma_0}$ [6]


for $\cos\gamma$ in terms of the magnitudes of the three
“surface tensions.” Young concluded that the
contact angle depends only on the materials, and
in no other way on the conditions of the problem.
This basic assertion is by a fortuitous accident
correct, as follows from the contribution by
Gauss described below; it underlies all modern
theory.

Figure 3 Young diagram; balance of tangential forces.
Residual normal force remains.

Using Y1 and Y2, Young produced the first
verifiable prediction for the rise height $u_0$ in
the circular capillary tube of Figure 1. He
assumed the interface to be spherical, so that H
is constant and $a = \cos\gamma / H$, with a the tube
radius. He assumed vanishing outside pressure.
According to classic laws of
hydrostatics, $\delta p = \rho g u_0 = 2\sigma H$ by Y1, where $\rho$ is
fluid density; there follows the celebrated relation,
presented entirely in words in his 1805
article:


$u_0 \approx \frac{2\cos\gamma}{\kappa a}, \qquad \kappa = \frac{\rho g}{\sigma}$ [7]


Young scorned the mathematical method, and 
made a point of deriving and publishing his 
results on capillarity without use of any mathe- 
matical symbols. This personal idiosyncrasy 
causes his publications to be something of a 
challenge to read. 


The Laplace Contribution 


In 1806, Laplace published the first analytical expression
for the mean curvature of a surface u(x, y), and
showed that the expression can be written as a
divergence. He obtained the equation


$\operatorname{div} Tu = 2H, \qquad Tu \equiv \frac{\nabla u}{\sqrt{1 + |\nabla u|^2}}$ [8]


Thus, if H is known from geometrical or physical
considerations, as it is for the capillary tube in
the example just considered, one finds a second-order
(nonlinear) equation for the surface height
of any solution as a graph. The equation is
elliptic for any function u(x, y) inserted into the
coefficients, however not uniformly so; the particular
nonuniformity leads to some striking and
unusual behavior of its solutions, as we shall see.
With the aid of [8], Laplace improved the Young
estimate [7] to


$u_0 \approx \frac{2\cos\gamma}{\kappa a} - \left(\frac{1}{\cos\gamma} - \frac{2}{3}\,\frac{1 - \sin^3\gamma}{\cos^3\gamma}\right) a$ [9]


Both Young and Laplace proposed their formulas
for “narrow tubes”, but neither gave any
quantitative indication of what “narrow” should
signify. Note that whenever $0 \le \gamma < \pi/2$, [9]
becomes negative when the nondimensional Bond
number $B = \kappa a^2$ exceeds 8; since u is known to be
positive in the indicated range for $\gamma$, [9] provides
no information in that case, whereas [7] is still of
some value. Nevertheless, [9] is asymptotically
exact and consists of the first two terms of the
formal expansion in powers of a; that was first
proved by D Siegel in 1980, almost 200 years
following the discovery of the formulas. In 1968,
P Concus extended the formal expansion for the
height to the entire traverse 0 < r < a. F Brulois
(1981) and independently E Miersemann (1994)
proved the expansion to be asymptotic to every
order. Explicit bounds for the rise height above
and below, making quantitative the notion of
“narrow,” were obtained by Finn.

Laplace supplied the first detailed mathematical 
investigations into the behavior of capillary surfaces, 
applying his ideas to many specific examples. His 
underlying motivation apparently derived at least 
partly from astronomical problems, and he pub- 
lished his contributions in two “Suppléments” to the 
tenth volume of his Mécanique Céleste. 
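
The comparison between the Young estimate [7] and the Laplace estimate [9] can be made concrete with a few lines of Python. The sketch below simply evaluates the two formulas as written above; the function names and the material constants in the demonstration (roughly water in clean glass) are illustrative assumptions, not values taken from the text.

```python
import math


def young_rise_height(a, gamma, kappa):
    """Young's estimate [7]: u0 ~ 2 cos(gamma) / (kappa * a)."""
    return 2.0 * math.cos(gamma) / (kappa * a)


def laplace_rise_height(a, gamma, kappa):
    """Laplace's two-term estimate [9]; assumes 0 <= gamma < pi/2."""
    c, s = math.cos(gamma), math.sin(gamma)
    correction = 1.0 / c - (2.0 / 3.0) * (1.0 - s ** 3) / c ** 3
    return young_rise_height(a, gamma, kappa) - correction * a


def bond_number(a, kappa):
    """Nondimensional Bond number B = kappa * a^2."""
    return kappa * a * a


if __name__ == "__main__":
    # Assumed illustrative constants (SI units): sigma, rho, g.
    sigma, rho, g = 0.072, 1000.0, 9.81
    kappa = rho * g / sigma
    gamma = 0.0
    for a in (0.0005, 0.002, 0.01):
        print(f"a = {a:.4f} m  B = {bond_number(a, kappa):7.3f}  "
              f"Young = {young_rise_height(a, gamma, kappa):+.5f} m  "
              f"Laplace = {laplace_rise_height(a, gamma, kappa):+.5f} m")
```

For small Bond number the two estimates nearly coincide, while for B above 8 the Laplace value turns negative and carries no information, as noted above.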


Gauss' Contribution: The Energy Method 


Young and Laplace both based their reasonings 
on force-balance arguments, which at best were 
unclear and at worst conceptually wrong. In 
1830, Gauss took up the problem anew from a 
variational point of view, using the Johann 
Bernoulli principle of virtual work. To do so, he 
attempted to characterize both surface energies 
and bulk fluid energies in terms of postulated 
particle attractions and repulsions. In an aston- 
ishing 30 pages, he essentially introduced founda- 
tions of modern potential theory, of measure 
theory, and of thermodynamics. He ended up 
with elaborate expressions that could not readily 
be applied, and which at least to some extent he 
did not use. He asserted, for example, that the 
bulk internal energy would be proportional to 
volume, which for an incompressible fluid is 
constant under admissible deformations, and on 
that basis he ignored the bulk energy term 
completely. His procedures then led him, in an 
independent and more convincing way, to the 
identical equation and boundary condition that 
had been produced by his predecessors. It must, 
of course, be remarked that his justification for 
ignoring the bulk energy term would not be 
correct for a compressible liquid (see the section 
"Compressibility"), and it is open to some 
question for the central motivating problem of a
capillary tube dipped into an infinite liquid bath, 
in which event there is no volume constraint. 

The material that follows is guided by the ideas of 
Gauss; however, I have found it advantageous to 
replace his elaborate hypotheses on particle attrac- 
tions and repulsions by a simpler phenomenological 
reasoning as to the nature of the energy terms to be 
expected. 

To fix ideas, we consider a semi-infinite cylinder 
of general section $\Omega$ and of homogeneous material,
closed at the bottom, situated vertically in a down- 
ward gravity field g per unit mass, and partly filled 
with an incompressible liquid of density p covering 
the bottom (a more exact discussion, taking account 
of compressibility, is indicated below in the section 
“Compressibility”). We assume an equilibrium fluid 
configuration with the liquid bounded above by an 
ideally thin interface S: u(x, y) (see Figure 4). We
distinguish the energy terms that occur: 


1. Surface energy. This is the energy required to 
create the surface interface S. We can characterize it 
by noting that fluid particles within or exterior to the 
liquid are attracted equally to neighboring particles in 
all directions; however, at the surface S there is a 
differential attraction, to particles of the exterior 
medium (such as air) above, or to the liquid below 
(see Figure 5). Thus, particles in the interface are 
pulled orthogonally to S. In general, for a liquid-gas 
interface, significant work will be done only on the 
liquid and those particles will be pulled toward the 
liquid; otherwise, the liquid would evaporate across 
the interface and disappear. The work done in that 
(infinitesimal) motion is proportional to the area of S, 
so that for the surface energy $E_S$ we obtain


$E_S = \sigma \int_\Omega \sqrt{1 + |\nabla u|^2}\, dx$ [10]


Figure 4 Liquid in cylindrical capillary tube, of general section $\Omega$.
Reproduced with permission from the American Institute of
Aeronautics and Astronautics.

Figure 5 Attractions on a fluid element: (1) interior to the fluid;
(2) on the surface interface.


The constant $\sigma$ has the dimensions of force per unit
length, and turns out to be the surface tension of the
interface. We note from [10] its dual interpretation
as areal energy density on S, arising from formation
of that surface. This alternative interpretation lends
conceptual support to the supposition that $\sigma$ is
constant on S. See the section “Dual interpretation
of $\sigma$: distinction between fluids and solids.”

Implicit in the above discussion are deep 
premises about the nature of the forces acting 
within the fluid. Essentially these forces must be 
perceptible only at infinitesimal distances, and 
grow rapidly with decreasing distance. Forces 
both of attraction and of repulsion must be 
present. The recognition of the need for such 
forces can be traced back to Newton. Quantita- 
tive postulates as to their precise nature were 
introduced by van der Waals in the late nine- 
teenth century, and the topic remains still in 
active study. Since these forces appear at mole- 
cular distance levels, their introduction leads 
inevitably to questions of statistical mechanics. 
Additionally, our discussion of work done in 
forming the surface implicitly assumes a compres- 
sible transition layer there, in conflict with our 
treatment of S as an ideally thin interface 
bounding an incompressible fluid. In these senses, 
it is striking that [10], which is in accord with
classical constructions, could be obtained via
global qualitative postulates concerning a con- 
tinuum in static equilibrium, in which the specific 
nature of the forces is not introduced. 

Rayleigh measured the thickness of the surface 
interface between water and air to be of mole- 
cular size, thus providing experimental justifica- 
tion for the procedure adopted. 

2. Wetting energy. A similar discussion applies at
the interface separating the liquid and solid at the
cylinder walls; however, this time the net attraction
can be in either direction, as particles from neither
medium can migrate significantly into the other. For
the wetting energy $E_W$, we write, with $\Sigma$ the
boundary of $\Omega$,


$E_W = -\beta\sigma \oint_\Sigma u\, ds$ [11]


We designate $\beta$ as the relative adhesion coefficient of
the liquid-gas-solid configuration. We assume that
the cylinder walls are of homogeneous material, so
that $\beta$ will be constant. In general, $\beta$ is a difference of
factors that apply on the walls at the two interfaces,
with the liquid and with the external medium.

3. Gravitational energy. The work done in
lifting an amount of liquid $\rho\,\delta h\,\delta\Omega$ against the
gravity field from the base level to a height h in a
vertical tube of small section $\delta\Omega$ is $\rho g h\,\delta h\,\delta\Omega$. Thus,
the work done in filling that tube up to the
surface height u is $(\rho g u^2/2)\,\delta\Omega$, and the total
gravitational energy is


$E_G = \frac{\rho g}{2} \int_\Omega u^2\, dx$ [12]


4. Volume constraint. In the configuration considered
the volume is to be unvaried during
admissible deformations; we take account of the
constraint by introducing a Lagrange parameter $\lambda$,
and an additional “energy” term


$E_V = \sigma\lambda \int_\Omega u\, dx$ [13]
0 


According to the principle of virtual work, the
sum E of the above energies must remain unvaried
in any deformation that respects all mechanical
constraints other than the volume constraint. We
choose a deformation $u \to u + \varepsilon\eta$, with $\eta$ smooth in
the closure of $\Omega$, which determines a functional $E(\varepsilon)$.
From $E'(0) = 0$ follows


$\int_\Omega \left( \frac{\nabla u \cdot \nabla\eta}{\sqrt{1 + |\nabla u|^2}} + \kappa u \eta + \lambda \eta \right) dx - \beta \oint_\Sigma \eta\, ds = 0$ [14]


from which


$\int_\Omega \eta\, \bigl( -\operatorname{div} Tu + (\kappa u + \lambda) \bigr)\, dx + \oint_\Sigma \eta\, (\nu \cdot Tu - \beta)\, ds = 0$ [15]


with $Tu = \nabla u / \sqrt{1 + |\nabla u|^2}$, and with $\nu$ the unit
exterior normal on $\Sigma$. Choosing first $\eta$ to have
compact support in $\Omega$, the boundary term vanishes,
and the “fundamental lemma” of the calculus of
variations yields


$\operatorname{div} Tu = \kappa u + \lambda, \qquad \kappa = \rho g/\sigma$ [16]


throughout $\Omega$. Thus, the area integral in [15]
vanishes for any $\eta$. We are therefore free to choose
$\eta$ as we wish on the boundary, and the fundamental
lemma now yields $\nu \cdot Tu = \beta$ on $\Sigma$. We now note
that for any liquid surface u(x, y) there holds


$\nu \cdot Tu = \cos\gamma$ [17]


on $\Sigma$, where $\gamma$ is the angle between the cylinder wall
and the surface S, measured within the liquid. Since
$\beta$ is assumed to be constant, that is so also for $\gamma$. It is
a physical constant: the contact angle, that must be
measured in an independent experiment, and cannot
be prescribed in advance or calculated within the
scope of the theory.
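
The passage from [14] to [15] rests on a single integration by parts. A sketch of that step, assuming u and η are smooth enough up to the boundary for the divergence theorem to apply:

```latex
\int_\Omega Tu \cdot \nabla\eta \, dx
  = \oint_\Sigma \eta \, (\nu \cdot Tu) \, ds
  - \int_\Omega \eta \, \operatorname{div} Tu \, dx ,
\qquad
Tu = \frac{\nabla u}{\sqrt{1 + |\nabla u|^2}} .
```

Substituting this identity for the first term of [14] and collecting the integrals over the section and over its boundary gives [15].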

The constant $\beta$, originally introduced as a general
proportionality constant, is now characterized as
$\beta = \cos\gamma$. We thus see that a physical surface of the
form envisaged is possible only if $-1 \le \beta \le 1$.
Physically, one expects that if $\beta < -1$ the liquid
will separate from the walls, while, if $\beta > 1$, the
liquid will spread over the walls as a thin film.

Equation [16] and boundary condition [17]
provide a nonlinear second-order equation that is
elliptic for any function u(x, y), and also a nonlinear
transversality condition on the boundary, for
determining the surface interface S. The expression
div Tu is exactly twice the mean curvature of the
surface S. If $\kappa \ne 0$ then $\lambda$ can be eliminated by
addition of a constant to u. The problem [16]-[17]
for the fluid in a vertical cylindrical capillary tube
of general section becomes thus a geometrical one:
to find a surface whose mean curvature is a
prescribed function of position in space, and
which meets the cylindrical boundary walls in a
prescribed angle $\gamma$.

In the absence of gravity, [16] takes the form


$\operatorname{div} Tu = 2H$ [18]


for a surface of constant mean curvature H. The
constant H is determined by integrating [18] over $\Omega$,
and using [17]:


$2H = \frac{|\Sigma|}{|\Omega|} \cos\gamma$ [19]


where $|\Sigma|$ and $|\Omega|$ denote the respective perimeter
and area, and thus H is independent of volume.
From the known uniqueness up to an additive
constant of the solutions of [18], [17] it follows
that the shape of the solution surface is independent
of volume. That result holds also for [16], [17]
in view of the possibility to eliminate $\lambda$ from the
equation by addition of a constant, and the
uniqueness of the solutions of the resulting
equation.
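
For a circular tube of radius a the zero-gravity problem can be checked by hand, and the following Python sketch does so numerically. It assumes the spherical-cap surface u(r) = -sqrt(R^2 - r^2) with R = a / cos γ (the cap used in Young's argument, valid for 0 < γ < π/2), evaluates div Tu by a central difference, and compares it with 2H obtained from the perimeter-to-area relation [19]. All function names are hypothetical.

```python
import math


def mean_curvature_from_19(perimeter, area, gamma):
    """H from relation [19]: 2H = (perimeter / area) * cos(gamma)."""
    return perimeter * math.cos(gamma) / (2.0 * area)


def flux(r, sphere_radius):
    """r times the radial component of Tu for the cap u(r) = -sqrt(R^2 - r^2)."""
    du = r / math.sqrt(sphere_radius ** 2 - r ** 2)  # u'(r)
    return r * du / math.sqrt(1.0 + du * du)


def div_Tu(r, sphere_radius, h=1e-5):
    """Radial divergence (1/r) d/dr [r * Tu_r], by central difference."""
    return (flux(r + h, sphere_radius) - flux(r - h, sphere_radius)) / (2.0 * h * r)


def check_circular_tube(a, gamma, r):
    """Return (div Tu at radius r, 2H from [19], nu . Tu at the wall r = a)."""
    H = mean_curvature_from_19(2.0 * math.pi * a, math.pi * a * a, gamma)
    sphere_radius = a / math.cos(gamma)
    wall_slope = a / math.sqrt(sphere_radius ** 2 - a ** 2)
    boundary_value = wall_slope / math.sqrt(1.0 + wall_slope ** 2)
    return div_Tu(r, sphere_radius), 2.0 * H, boundary_value


if __name__ == "__main__":
    divergence, two_H, boundary = check_circular_tube(1.0, math.pi / 3, 0.4)
    print("div Tu =", divergence, " 2H =", two_H)
    print("nu . Tu at wall =", boundary, " cos(gamma) =", math.cos(math.pi / 3))
```

The fill volume never enters the computation, in line with the statement that H, and hence the shape of the surface, is independent of volume.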

Equations [16]-[17] or [18]-[17] are appropriate
for determining capillary surfaces that are graphs
u(x, y) over a base domain $\Omega$. More generally, any
surface S in 3-space satisfies the equation


$\Delta \mathbf{x} = 2H\mathbf{N}$ [20]


where H is its scalar mean curvature and N is a unit
normal vector on S. Here $\Delta$ is the “intrinsic
Laplacian” in the metric of S. This is the appropriate
relation to be applied in situations for which the
physical surface folds over itself and cannot be
expressed globally as a graph. The formal simplicity
of [20] is deceptive; the challenges arising from the
nonlinearity in the equation can be formidable, and
very little general theory is as yet available.


Dual Interpretation of $\sigma$: Distinction between
Fluids and Solids


We have already remarked the duality in connection
with eqn [10] above. It can be made explicit with a
simple experiment proposed by Dupré. One makes a
rigid frame with a sliding bar of length l, as in
Figure 6, and dips the frame into soap solution. On
lifting the frame from the solution the opening will
be filled with a soap film, and one finds a force
$F = 2\sigma l$ on the bar, directed orthogonal to the bar
(the factor 2 appears since the film has two sides).
The work done in sliding the bar a distance $\delta x$ is
$\delta E = 2\sigma l\,\delta x$, which can also be written $\delta E = 2\sigma\,\delta A$
with $\delta A$ an element of area. In this sense, the two
interpretations of $\sigma$ are formally equivalent, for
fluid/fluid interfaces.

The equivalence cannot be extended to solid/fluid
interfaces. Consider a rigid spherical ball of generic
material and radius R, freely floating in an infinite
liquid bath in a gravity-free environment, see
Figure 7a. It can be shown that the unique
symmetric solution to the problem is a horizontal
surface, as in the figure. A variational procedure as
above shows that if $e_0, e_1, e_2$ are the interfacial
energy densities associated with the three interfaces,
then


$\cos\gamma = \frac{e_1 - e_2}{e_0}$ [21]


in formal analogy with the Young relation [6]. But
$e_1, e_2$ cannot be interpreted as interfacial forces
whose net tangential component cancels that of $e_0$.
To do so would lead to a net downward force on
the ball (see Figure 7b), contradicting the supposed
equilibrium state.

Figure 6 Dupré apparatus for exhibiting surface tension.

Figure 7 (a) Floating spherical ball; presumed “Young” forces.
(b) Normal and vertical components of Young forces; contradiction
to presumed equilibrium.


Mathematical and Physical Predictions: 
Experiments 


In the following sections, we study the kinds of 
behavior imposed on a surface S by the requirement 
that it appear as solution of one of the indicated 
equations and boundary conditions. Some of these 
properties are quite surprising in the context of 
classically expected behavior of solutions of equa- 
tions of mathematical physics. The mathematical 
predictions were, however, corroborated in certain 
cases experimentally, as we discuss below. 


Uniqueness and Nonuniqueness 


We begin by considering uniqueness questions. We 
start with a semi-infinite capillary tube, closed at the 
bottom, to be partially filled with a prescribed 
volume of (incompressible) liquid making contact 
angle $\gamma$ on the container walls (Figure 8a). If $\kappa > 0$,
any solution is uniquely determined. That is a quite 
general theorem, valid for a wide class of domains Q 
including all piecewise smooth domains (at the 
corners of which data of the form [17] cannot be 
prescribed); formally, data can be omitted on any 
boundary set of linear Hausdorff measure zero. In 
this result, no growth conditions need be imposed 
near the boundary (note that such a statement 
would be false for solutions of the Laplace equation 
under Dirichlet boundary conditions). 

Next we consider a sessile liquid drop on a 
horizontal plate (Figure 8b). Again the solution is 
uniquely determined by the volume and by $\gamma$,
although the known proof differs greatly from that 
of the other case. 

We now consider a smooth deformation of the
base plane, depending on a parameter t, which
carries it into the cylinder; that can be done in such
a way that the supporting surface is at all times
“bowl-shaped,” as in Figure 8c. Since the bowl
formation tends to restrict the possible deformations
of the fluid consistent with smooth contact with the
supporting rigid surface, one might expect that
the corresponding capillary surface S(t), arising
from the identical fluid mass, will for each t be
uniquely determined.

Figure 8 Support configurations: (a) capillary tube, general
section; (b) horizontal plate; (c) convex surface appearing during
deformation of horizontal plate to capillary tube; and (d)
nonuniqueness of configuration appearing during convex deformation.
Reproduced from Mathematical Intelligencer 24(3) 2002
21-33 with permission from Springer-Verlag Heidelberg.

That is however not true, even for symmetric configurations. We can
see that from the configuration of Figure 8d, consisting of a
vertical circular cylinder whose base is a 45° cone. We assume a
contact angle γ = 45° and adjust the radius so that a horizontal
surface lying just below the cylinder/cone juncture provides the
prescribed volume. This is a formal solution surface. Now fill the
configuration with a larger volume, so that the contact line will lie
above the juncture. The upper surface will no longer be flat, in view
of the 45° contact angle, and takes an appearance as indicated in the
figure. Finally, we decrease the fluid volume, keeping all other
parameters unchanged. As noted above, the upper surface moves rigidly
downward, and it is clear that if the original surface is close
enough to the juncture line, then the prescribed volume will be
attained before the contact line reaches the juncture. Thus,
uniqueness fails.

In this construction as just described, the bounding surface is not
smooth; however, one sees easily that the procedure continues to work
if the edge and vertex are smoothed locally. In fact, one can carry
the procedure to a striking conclusion; by appropriate smoothing, one
can construct a bounding surface admitting an entire continuum of
distinct solution interfaces, all with the same contact angle and
enclosing the same fluid volume (Gulliver and Hildebrandt; Finn).
This can be done for any gravity field. Figure 9 illustrates seven
members of the family of interfaces, in the particular case κ = 0.

The question immediately arises as to which if 
any of the continuum of surfaces will be seen in 
an experiment. In fact, it can be proved that none 
of the indicated surfaces is mechanically stable 
(Finn, Concus and Finn, Wente). Since the indicated 
family includes all symmetric surfaces that are 
stationary for the energy functional, we find that 
any stable stationary configuration must be asym- 
metric. Thus, we have obtained an example of 
symmetry breaking, in which all conditions of the 
problem are symmetric, but for which all physically 
acceptable solutions are asymmetric. 

These results were subjected to computational test 
by M Callahan using the Surface Evolver software, 
to experimental test by M Weislogel in a drop 
tower, and to experimental test by S Lucid in the 
Mir Space Station. The results of the latter experi- 
ment are compared in Figure 10 with the computer 
calculations. In both cases, both a local minimizer 
(potato chip) and a presumed global minimizer 
(spoon) were observed. 

Figure 9 Seven spherical capillary interfaces in an “exotic”
container of homogeneous material in zero gravity. All interfaces
bound the same volume and have the same sum of free surface and
wetting energies. If all pressures above the interfaces are the same,
then the pressures below them successively increase as the curvature
vectors of the vertical sections change from upwardly to downwardly
directed. Reproduced from Mathematical Intelligencer 24(3) 2002 21-33
with permission from Springer-Verlag Heidelberg.

Figure 10 Symmetry breaking in exotic container, g = 0. Below:
calculated presumed global minimizer (spoon) and local minimizer
(potato chip). Above: experiment on Mir: symmetric insertion of fluid
(center); spoon (left); potato chip (right). This is a grayscale
version of a color figure reproduced from Journal of Fluid Mechanics,
224: 383-94, (1991) with permission of Cambridge University Press.

The seven surface interfaces indicated in Figure 9 all provide the
same sum of surface and wetting energy, and bound the same volume of
fluid. They all satisfy eqn [18] with constant H, in accordance with
hypotheses of incompressibility and vanishing gravity. Thus,
formally, all configurations have identical mechanical energy. The
surfaces are all spherical caps; however, the radii R of the caps
vary considerably. According to Y1 above, the pressure change across
each interface is Δp = 2σ/R. Since one may assume the outer region to
be a vacuum with zero pressure for all caps, we find that the
pressures within the fluids vary greatly among the configurations.
One would thus expect that work is done within the fluid in passing
from one configuration to another, a circumstance we have excluded by
hypothesis when determining the family. From this point of view, the
(customary) hypothesis of incompressibility that was used in
determining the family is put into significant question; we examine
this point in some detail in the section “Compressibility.”
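The dependence of the interior pressure on the cap radius can be made concrete with a few lines of Python. This is only a minimal sketch of the relation Δp = 2σ/R quoted above; the surface tension value and the three radii are hypothetical illustration values, not data from the experiments described in the text.

```python
def pressure_jump(sigma: float, radius: float) -> float:
    """Pressure change across a spherical cap interface: 2*sigma/R."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    return 2.0 * sigma / radius


# Hypothetical values (assumptions for illustration only):
# sigma in N/m, cap radii in m.
SIGMA = 0.072
radii = [0.01, 0.02, 0.05]

# With zero outside pressure, these are the pressures inside the fluid.
jumps = [pressure_jump(SIGMA, r) for r in radii]

for r, dp in zip(radii, jumps):
    print(f"R = {r:.3f} m  ->  pressure jump = {dp:.2f} Pa")
```

Caps of smaller radius carry a larger interior pressure, which is the variation among equal-energy configurations that the paragraph above calls into question.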


Discontinuous Dependence I


Capillary surfaces can exhibit striking discontinuous dependence on
the defining data. As initial example, we consider the behavior of a
solution of [18]-[17] at a protruding corner point P of the domain Ω
of definition. For simplicity, we assume the corner bounded locally
by straight segments, meeting in an opening angle 2α < π, thus
forming locally a wedge domain. In anticipation of material to
follow, we assume contact angles γ1 and γ2 on the respective sides,
0 ≤ γ1, γ2 ≤ π. One can show that a necessary condition for a
solution surface over a domain Ω_δ as in Figure 11 to have a
continuous normal vector up to P is that the data point (γ1, γ2) lie
in the closure of the rectangle R of Figure 12. (This figure includes
also additional material anticipating the section “Drops in
wedges”).

Figure 11 Wedge domain. Reproduced from Finn R “Capillary Surface
Interfaces” in Notices of AMS 46 No.7 (1999) with permission of the
American Mathematical Society.

Figure 12 Domain R of data yielding continuous normal to capillary
surface in wedge of opening 2α < π. The symbols D and I are clarified
in the section “Behavior at a corner point.” Reproduced from
“Capillary Wedges Revisited” in SIAM J. Math. Anal. 27 No.1 (1996)
56-69 with permission from SIAM.

For data points interior to R, this criterion also suffices for the
existence of at least one such solution surface, for any prescribed
H; such surfaces can in fact be produced explicitly as spherical caps
(planes if H = 0). It remains to discuss what can occur with data
arising from the remaining four subregions of the square.

Figure 13 Construction of solution as lower hemisphere;
γ1 + γ2 = π − 2α, H > 0. Reproduced from “Capillary Wedges Revisited”
in SIAM J. Math. Anal. 27 No.1 (1996) 56-69 with permission from
SIAM.

If (γ1, γ2) ∈ D1±, then there is no solution to [18]-[17] in any
neighborhood of the corner point P. On the other hand, an explicit
solution for any H > 0 can be found as a lower spherical cap on the
segment γ1 + γ2 = π − 2α that separates one of the D1 regions from R
(see Figure 13, which indicates the equatorial circle).
Correspondingly, if H < 0 then an explicit solution can be found on
the separation line between the other D1 region and R. Thus, there is
a discontinuous change in behavior in crossing from R to either of
the D1 regions.
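The partition of the data square into R and the surrounding regions can be sketched as a small classifier. The D1 boundaries γ1 + γ2 = π ∓ 2α are taken from the text; the D2 boundaries |γ1 − γ2| = π − 2α are an assumption, inferred from R being a rectangle inscribed in the square. The function name and the labels `D1_low` and `D1_high` are hypothetical and deliberately avoid assigning the ± superscripts used in Figure 12.

```python
import math


def classify_wedge_data(gamma1: float, gamma2: float, alpha: float) -> str:
    """Classify contact-angle data for a wedge of opening 2*alpha < pi.

    Returns 'R', 'D1_low' (gamma1 + gamma2 < pi - 2*alpha),
    'D1_high' (gamma1 + gamma2 > pi + 2*alpha) or 'D2'.
    Angles are in radians. Points on the boundary count as R (closure).
    """
    if not 0.0 < 2.0 * alpha < math.pi:
        raise ValueError("need 0 < 2*alpha < pi (protruding corner)")
    for g in (gamma1, gamma2):
        if not 0.0 <= g <= math.pi:
            raise ValueError("contact angles must lie in [0, pi]")

    total = gamma1 + gamma2
    diff = abs(gamma1 - gamma2)

    if total < math.pi - 2.0 * alpha:
        return "D1_low"
    if total > math.pi + 2.0 * alpha:
        return "D1_high"
    # Assumed D2 boundary, read off from the rectangle geometry.
    if diff > math.pi - 2.0 * alpha:
        return "D2"
    return "R"


alpha = math.radians(25.0)
for g1, g2 in [(90, 90), (20, 20), (160, 160), (170, 10)]:
    label = classify_wedge_data(math.radians(g1), math.radians(g2), alpha)
    print(f"gamma1={g1:3d}, gamma2={g2:3d} -> {label}")
```

Since |γ1 − γ2| never exceeds γ1 + γ2, the D1 and D2 tests cannot both succeed, so the order of the checks does not affect the result.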

This behavior was put to experimental test by W Masica, who
considered the case 0 < γ1 = γ2 = γ ≤ π/2 near the crossing point
γ = γ_cr into the adjacent D1 region, for which α + γ_cr = π/2. He
partially filled a regular hexagonal cylinder of acrylic plastic,
successively with two different liquids, making respective contact
angles greater or less than γ_cr with the plastic. For each liquid,
Masica then allowed the cylinder to fall in a 132 m drop tower.
Figure 14 compares the two configurations after about 5 s of free
fall. In the case γ > γ_cr he obtained the spherical-cap solution,
which in this case covers the entire base domain Ω and appears as an
explicit solution of [18]-[17]. When γ < γ_cr, the liquid rose to the
top of the cylinder near the edges, filling out the edges over the
corner points. The surface interface S does not cover Ω, but instead
folds back over itself, doubly covering a portion of Ω. Thus, a
physical surface appears as it must, but it is not a solution of [18]
over Ω.


Discontinuous Dependence II


About 1970, M Miranda raised informally the question, whether a
capillary tube Z0, whose section Ω0 lies strictly interior to a
section Ω1 of a tube Z1, will raise liquid from an infinite reservoir
in a downward directed gravity field to a higher level over Ω0 than
will Z1 over that subdomain of its section. That is true if both
cylinders are circular, and in the intervening years its correctness
was established in a number of other cases of particular interest.

Figure 14 Liquid in hexagonal cylinder, during free fall in drop
tower: (a) α + γ > π/2; (b) α + γ < π/2.

Finn and Kosmodem’yanskii, Jr. showed, however, by example that the
assertion fails in a large range of cases, and in fact can fail with
arbitrarily large height differences, uniformly over Ω0. Beyond that,
the construction exhibits a strikingly discontinuous change of
behavior, under perturbations of a disk as inner domain. Perhaps more
remarkably, the assertion can hold with the inner domain a disk, but
with discontinuous reversal of behavior as the disk is perturbed to
neighboring disks. That was shown in a form of the example given
later by Finn, and illustrated in Figure 15. Here the outer domain Ω1
is polygonal, with sides that extend to be tangent to a unit disk Ω0,
as indicated. The angle γ is to be chosen so that
0 ≤ π/2 − γ ≤ α_min, where α_min is the smallest of the interior
vertex half-angles of Ω1. In view of the assumed infinite fluid
reservoir, there is no volume constraint, and the governing equation
[16] takes the form

$$\operatorname{div} Tu = \kappa u, \qquad \kappa = \rho g/\sigma > 0$$   [22]


Figure 15 Discontinuous reversal of limiting height behavior. All
sides of the polygonal domain Ω1 are tangent to the unit disk Ω0. For
the corresponding solution heights u^0 in Ω0, u^ε in the disk Ω^ε of
radius 1 + ε, and u^1 in Ω1, there holds u^1 − u^0 < 0, for any
downward gravity. But lim_{κ→0}(u^1 − u^ε) = +∞, for any ε > 0.

Taking at first the inner domain to be Ω0, it can be shown that for
the corresponding solutions u^0 and u^1 of [22], there holds
u^0 > u^1 over Ω0 for any κ > 0, and thus the Miranda question has a
positive answer for that configuration. But if we replace Ω0 by a
concentric disk Ω^ε ⊂ Ω1 of radius 1 + ε we find

2€ cosy 
fu! (x: k) 一 E [ngs = 
fi u (x; kK) p^ em| 
» LÀ aos Ld 
1 MON. (+e) sin y 23 
COS ^ COS y 


where ω = arccos(cos γ / sin α), and u^ε is the solution of [22],
[17] in Ω^ε. Since κ does not appear on the right side of [23], there
follows in particular that for any ε > 0, there holds

$$\lim_{\kappa \to 0} \Big\{ \inf_{\Omega^{\varepsilon}} u^{1}(x;\kappa) - \sup_{\Omega^{\varepsilon}} u^{\varepsilon}(x;\kappa) \Big\} = \infty$$   [24]


In particular, a negative answer to Miranda’s question appears for
all gravity sufficiently small. But as observed above, a positive
answer occurs in Ω0, for any positive gravity. Thus, the limiting
behavior as κ → 0 changes discontinuously, as ε → 0. We find that the
two limiting procedures cannot be interchanged: for any x ∈ Ω0, we
obtain

$$\lim_{\varepsilon \to 0}\,\lim_{\kappa \to 0}\,\big[u^{1}(x;\kappa) - u^{\varepsilon}(x;\kappa)\big] = +\infty,$$
$$\lim_{\kappa \to 0}\,\lim_{\varepsilon \to 0}\,\big[u^{1}(x;\kappa) - u^{\varepsilon}(x;\kappa)\big] = \text{const.} < 0$$   [25]
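A rough feel for why the two limits differ can be had from average heights alone. Integrating [22] over a section and applying the divergence theorem gives κ ∫u = |Σ| cos γ, provided the boundary condition [17] is taken in the usual form ν·Tu = cos γ (an assumption here, since [17] is not restated in this section). The sketch below compares mean heights only; it says nothing about the pointwise statements [24] and [25], and the regular hexagon is a hypothetical stand-in for the polygon of Figure 15.

```python
import math


def mean_height(perimeter: float, area: float, gamma: float, kappa: float) -> float:
    """Average of u over a section, from kappa * integral(u) = |Sigma| cos(gamma)."""
    return perimeter * math.cos(gamma) / (kappa * area)


def disk(radius: float):
    """Perimeter and area of a disk."""
    return 2.0 * math.pi * radius, math.pi * radius ** 2


def circumscribed_regular_polygon(n: int):
    """Perimeter and area of a regular n-gon whose sides touch the unit circle."""
    t = math.tan(math.pi / n)
    return 2.0 * n * t, n * t


def mean_gap(eps: float, gamma: float, kappa: float) -> float:
    """Mean height over the polygon minus mean height over the disk of radius 1 + eps."""
    poly = mean_height(*circumscribed_regular_polygon(6), gamma, kappa)
    inner = mean_height(*disk(1.0 + eps), gamma, kappa)
    return poly - inner


gamma = math.radians(45.0)  # hypothetical contact angle
for kappa in (1.0, 0.1, 0.01):
    print(f"kappa={kappa:5.2f}  mean gap for eps=0.1: {mean_gap(0.1, gamma, kappa):8.3f}")
```

The polygon and the unit disk share the ratio |Σ|/|Ω| = 2, so their mean heights coincide for every κ, while for any fixed ε > 0 the gap to the disk of radius 1 + ε equals 2 ε cos γ / ((1 + ε) κ) and grows without bound as κ → 0.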


Existence Questions I


For the general equation [20] there is an established literature on
existence of surfaces containing a prescribed space curve. There is
very little literature relating to the capillarity boundary condition
that the solution surface S meet a prescribed “support” surface W in
a prescribed angle γ. The existence of at least one such surface
interior to a prescribed sufficiently smooth closed space domain was
proved by Almgren, and then Taylor proved smoothness at the contact
curve. These are abstract theorems that are basic for the theory but
in general do not provide specific information in particular cases of
interest.

Special interest attaches to the nonparametric 
cases [16] or [18] with boundary condition [17], 
especially in view of the discontinuous behavior 
properties described above. These cases were studied 
in depth by a number of authors, with results that 
put the above examples into some perspective. 

M Emmer proved the existence of a unique solution of [16]-[17] for
any compact Ω having Lipschitz boundary with Lipschitz constant L
such that √(1 + L²) cos γ < 1 − ε for some ε > 0. Finn and Gerhardt
(F and G) extended this condition, and showed in particular that
solutions exist in general in piecewise smooth Ω. This result
contrasts with the zero-gravity case [18] discussed in the section
“Existence questions II,” for which solutions fail to exist when
√(1 + L²) cos γ > 1 at a protruding corner (see the section
“Discontinuous dependence I”). However, in the cases
√(1 + L²) cos γ > 1 studied by F and G the solution u(x) is
necessarily unbounded in the corner. This condition is equivalent to
α < |γ − π/2| at the corner. Concus and Finn showed that if
α > |γ − π/2| in a neighborhood Ω_δ of a corner with rectilinear
sides, as indicated in Figure 11, then the solution u(x) satisfies

$$|u(x;\kappa)| < \frac{2}{\kappa\delta} + \delta$$   [26]


independent of α, γ in the range considered. Here it is assumed that
[16] is normalized so that λ = 0; when κ ≠ 0 this can always be
achieved by adding a constant to u. On the other hand, if
α < |γ − π/2|, then

$$u(x;\kappa) \approx \frac{\cos\theta - \sqrt{k^{2} - \sin^{2}\theta}}{k\kappa r}$$   [27]

where k = sin α / cos γ and θ is polar angle relative to a bisector
at the vertex; hence u becomes unbounded as O(1/r). Thus, the
behavior changes discontinuously as the configuration for which
α = |γ − π/2| is crossed.
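The dichotomy between the bound [26] and the growth [27] can be sketched numerically. The functions below are a minimal illustration with hypothetical names; r is assumed to denote distance from the vertex, and the borderline case α = |γ − π/2| is reported separately rather than assigned to either regime.

```python
import math


def corner_regime(alpha: float, gamma: float) -> str:
    """Compare the half-angle alpha with |gamma - pi/2| (radians)."""
    threshold = abs(gamma - math.pi / 2.0)
    if alpha > threshold:
        return "bounded"
    if alpha < threshold:
        return "unbounded"
    return "borderline"


def height_bound(kappa: float, delta: float) -> float:
    """Right side of [26]: 2/(kappa*delta) + delta."""
    return 2.0 / (kappa * delta) + delta


def leading_term(r: float, theta: float, alpha: float, gamma: float, kappa: float) -> float:
    """Leading term [27] of u near the vertex, valid when alpha < |gamma - pi/2|."""
    if corner_regime(alpha, gamma) != "unbounded":
        raise ValueError("[27] applies only when alpha < |gamma - pi/2|")
    if abs(theta) > alpha or r <= 0.0:
        raise ValueError("need |theta| <= alpha and r > 0")
    k = math.sin(alpha) / math.cos(gamma)
    return (math.cos(theta) - math.sqrt(k * k - math.sin(theta) ** 2)) / (k * kappa * r)


alpha, gamma, kappa = math.radians(5.0), math.radians(80.0), 1.0
print(corner_regime(math.radians(30.0), gamma))   # wide wedge
print(corner_regime(alpha, gamma))                # narrow wedge
for r in (0.1, 0.01, 0.001):
    print(f"r={r:6.3f}  leading term = {leading_term(r, 0.0, alpha, gamma, kappa):10.3f}")
```

Halving r doubles the leading term, which is the O(1/r) growth stated above; for the wider wedge the height instead stays below the κ-dependent bound returned by `height_bound`.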

This prediction was corroborated by T Coburn in a “kitchen sink”
experiment in the Medical School at Stanford University. Coburn
formed a wedge using two sheets of acrylic plastic, resting on a
glass plate, and inserted a drop of distilled water at the base of
the wedge. Initially, the wedge was opened sufficiently that
α + γ > π/2, and he obtained the configuration of Figure 16a, with
the maximum height slightly lower than that indicated by [26]. By
closing down the angle slightly, the liquid rose to over ten times
that height, as shown in Figure 16b. This experiment was later
repeated by Weislogel under laboratory conditions; it incidentally
establishes the contact angle of water and acrylic plastic in the
Earth's atmosphere as 80° ± 2°.

Figure 16 Distilled water in wedges formed by acrylic plastic plates;
g > 0. (a) α + γ > π/2; (b) α + γ < π/2. Reproduced from P Concus and
R Finn, “On Capillary Free Surfaces in a Gravitational Field” in Acta
Math 132 (1974) 207-223 with permission of Institut Mittag-Leffler.

The indicated procedure provides in general a very accurate way to
measure contact angles, when the angle is not far from π/2. For γ
near zero or π in the Earth's gravity field, the discontinuity is
confined to a microscopic neighborhood of the vertex, and can be
difficult to observe. This technical difficulty was addressed by
Fischer and Finn, who introduced “canonical proboscis” domains, the
theory of which was further developed by Finn and Leise and by Finn
and Marek. For such domains the change in behavior is not strictly
discontinuous, but it is nearly so, and it extends over large
portions of the cylinder section, so that it is easily observable.
Concus, Finn, and Weislogel conducted space experiments,
demonstrating the feasibility of the method as a means for measuring
contact angles in general ranges.

In [26]-[27] no growth conditions at the corner are imposed; the
estimates hold for every solution defined in Ω_δ and assuming the
prescribed data on the side walls, with no data prescribed at the
vertex. The formula [27] is the initial term of a formal asymptotic
expansion of the solution, in powers of r. Miersemann obtained the
complete expansion, asymptotic to every order, when α < |γ − π/2|. He
obtained somewhat less complete information in the bounded case [26].

Chen, Finn, and Miersemann provided a form of [27] that is applicable
for any data (γ1, γ2) on the respective sides of the wedge, that
arise from the D1± regions of Figure 12. Lancaster and Siegel and
independently Chen, Finn, and Miersemann showed that if
−2α ≤ γ1 + γ2 − π ≤ 2α, then every solution is bounded at the vertex.
This result holds also for the zero gravity eqn [18].

In the case of [18], Concus and Finn showed that in the D1± regions
no solution exists, regardless of H. Again, this result holds without
growth conditions.

From these considerations and from remarks in the section
“Discontinuous dependence I” follows that for data in D2±, all
solutions either of [18] or of [16] are bounded but have
discontinuous derivatives at the vertex P. Extrapolating from the
behavior of particular computed solutions, Concus and Finn
conjectured that all solutions of [18] or of [16] that arise from
data in D2± are discontinuous at P. A number of attempts to prove or
to disprove this conjecture have till now been unsuccessful.

An existence theorem for [16]-[17] alternative to 
that of Emmer was obtained independently by 
Ural’tseva, using a very different approach. This 
procedure yielded smoothness estimates up to the 
boundary, but required a hypothesis of boundary 
smoothness, so that the result does not mesh with the 
discontinuous dependence behavior as does that of 
Emmer. Later versions of the existence result, again 
under boundary smoothness requirements, were given 
by Gerhardt, Spruck, and Simon and Spruck. In the 
procedure introduced by Emmer, the boundary trace is 
shown to exist only in a very weak sense (which, 
however, suffices for a uniqueness proof). The later 
work can be adapted to show that the Emmer 
solutions are smooth on the smooth parts of ∂Ω. 

None of the above procedures provides existence for 
the zero gravity case [18]. As we shall see in the 
following section, that is not an accident of the 
methods, but reflects subtle properties of the equations. 


Existence Questions II


We consider here the zero-gravity case [18], over a domain Ω bounded
by a piecewise smooth curve Σ, under the boundary condition [17].
Integrating [18] over Ω and using [17], we find 2H|Ω| = |Σ| cos γ.
Let Ω* ⊂ Ω, Σ* = Σ ∩ ∂Ω*, Γ = Ω ∩ ∂Ω*. The same procedure over Ω*,
using that |Tu| < 1 for any u(x, y), leads to the bound

$$\Phi[\Gamma;\gamma] > 0$$   [28]

where Φ is defined by

$$\Phi[\Gamma;\gamma] = |\Gamma| - |\Sigma^{*}|\cos\gamma + 2H|\Omega^{*}|$$   [29]

The inequality [28] must hold for any choice of Ω* ⊂ Ω. This provides
a necessary condition for existence of a solution to [18]-[17] in Ω.
E Giusti showed that when Ω* is interpreted in a generalized sense as
a Caccioppoli set, the condition [28] becomes also sufficient for
existence.
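As a small illustration of how [28] can fail, one may evaluate Φ on a triangle cut from a corner. The sketch below is an assumption-laden example, not a construction taken from the text: the domain is a hypothetical square of side 2 (half-angle α = π/4 at each corner), Ω* is the triangle cut off by a chord perpendicular to the bisector at distance d from the vertex, and all function names are invented for the example.

```python
import math


def mean_curvature(perimeter: float, area: float, gamma: float) -> float:
    """H determined by 2H|Omega| = |Sigma| cos(gamma)."""
    return perimeter * math.cos(gamma) / (2.0 * area)


def phi(arc_len: float, wall_len: float, sub_area: float, gamma: float, H: float) -> float:
    """Functional [29]: |Gamma| - |Sigma*| cos(gamma) + 2H|Omega*|."""
    return arc_len - wall_len * math.cos(gamma) + 2.0 * H * sub_area


def phi_corner_cut(d: float, alpha: float, gamma: float, H: float) -> float:
    """Phi for the triangle cut from a corner of half-angle alpha.

    The cut is a straight chord perpendicular to the bisector,
    at distance d from the vertex.
    """
    chord = 2.0 * d * math.tan(alpha)    # |Gamma|
    walls = 2.0 * d / math.cos(alpha)    # |Sigma*|, the two wall pieces
    area = d * d * math.tan(alpha)       # |Omega*|
    return phi(chord, walls, area, gamma, H)


# Hypothetical domain: square of side 2, so perimeter 8 and area 4.
alpha = math.pi / 4.0
for gamma_deg in (30.0, 60.0):
    gamma = math.radians(gamma_deg)
    H = mean_curvature(8.0, 4.0, gamma)
    value = phi_corner_cut(0.01, alpha, gamma, H)
    print(f"gamma={gamma_deg:4.0f} deg: Phi = {value:+.5f}")
```

For small d the sign of Φ is that of sin α − cos γ, so the test fails exactly when α + γ < π/2, in line with the nonexistence at a protruding corner described in the section “Discontinuous dependence I.”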

It is easy to give specific examples of convex analytic domains Ω, in
which subdomains Ω* can be found such that [28] fails. Thus, the
general existence results for [16] do not carry over to [18],
regardless of local domain smoothness. Nevertheless, in many cases of
interest (e.g., a circular disk or an ellipse that is not too
eccentric), solutions of [18]-[17] do exist for any γ and are well
behaved. Finn investigated the condition [28] in general by showing
the existence of a system of arcs {Γ} ⊂ Ω that minimize Φ. All such
arcs are circular of radius 1/2H, and meet Σ either at smooth points
in an angle γ, or else at a reentrant corner point in an angle
γ* > γ, measured on the side of Γ opposite to that into which the
curvature vector points (Figure 17). All minimizing configurations
are bounded by arcs of that form, although not all such
configurations minimize. In a typical situation one will encounter
only a finite number of such arcs, in which case only a finite number
of cases need be examined. If Φ > 0 in each such case, then a
solution of [18]-[17] exists for the given Ω and γ. It may occur that
no such arcs exist; we then observe that since
Φ[∅; γ] = Φ[Σ; γ] = 0, Φ cannot become nonpositive for any Ω* ⊂ Ω
unless a minimizing Γ can be found in Ω, contradicting the assumed
nonexistence of minimizers. Thus, the criterion is then vacuously
satisfied, and we conclude that a solution of [18]-[17] exists.

Figure 17 Extremal configuration for the functional Φ.

One has, of course, to ask what happens physically in cases for which
Φ[Γ; γ] < 0 for some Γ as above. The possible modes of behavior were
studied in particular cases by Tam and later by deLazzer, Langbein,
Dreyer, and Rath; Finn and Neel characterized the general case.
Formally, the fluid rises to infinity throughout domains Ω* of the
form indicated, but with H replaced by a value H⁻ < H; on the
opposite side of the circular arcs Γ, the fluid is asymptotic to the
vertical cylinders over Γ. In a physical situation, the fluid will
rise to the top of the container in a nearly cylindrical region
adjacent to a portion of the container walls, approximating the
indicated behavior and partially wetting the top of the container.
One sees that behavior in Figure 14b, in which the fluid fills out
regions adjacent to the corners. An analogous configuration would
still be observed if the corners were smoothed locally. If
insufficient fluid is available, a portion of the base Ω could become
unwetted.


Behavior at a Corner Point 


Lancaster and Siegel (L and S) studied the behavior of the limits
(which they designate by Ru) of bounded solutions of [16] or of [18]
along radial segments tending to a corner point P of a domain Ω.
These limits can exhibit remarkable idiosyncratic behavior. For
simplicity of exposition, we restrict ourselves here to rectilinear
boundary segments at P, and assume constant boundary angles
γ1, γ2 ≠ 0, π on the two sides. L and S prove first that the limits
Ru exist and vary continuously with direction of approach; then they
show the existence of “fan” regions of directions adjacent to those
of the sides, in which the limits are constant independent of
direction, see Figure 18. They obtain that if the opening angle 2α at
P satisfies 2α < π, then for data in the rectangle R of Figure 12 the
fans overlap (see Figure 18a), so that the solution is necessarily
continuous at P. For data in one of the D2 regions, the solution
decreases from the γ1 side Σ1 to the γ2 side Σ2 (“D” behavior),
subject to the Concus-Finn conjecture (see the section “Existence
questions I”), with the reverse behavior (“I”) in the other D2
region. Concus and Finn showed that if 2α < π then in D1± there is no
bounded solution of [16]-[17] or [18]-[17] as a graph. For [16]-[17],
unbounded solutions do however exist for such data (see the section
“Existence questions I”).

Figure 18 (a) Fan domains APA′ and BPB′ of constant limiting values;
2α < π so that the fans overlap when data are in R. (b) 2α > π,
case 1. Fans APA′ and BPB′ of constant radial limits appear. Limiting
value changes strictly monotonically as approach direction changes
from A′P to B′P. (c) 2α > π; case 2. In addition to the two fans
adjacent to the sides of the wedge, a half plane of constant radial
limits appears.


If 2α > π, then the fans do not overlap, and in fact continuity at P
cannot in general be expected. Outside the indicated fan regions
adjacent to the wedge sides, the limit values either change strictly
monotonically with angle of approach, as in Figure 18b, or else they
do so except for approaches within a third, central fan, which covers
a full half-space, and interior to which the limiting values again
remain constant, see Figure 18c. L and S give an example under which
that behavior actually occurs. Remarkably, in the example the
prescribed data are the same on both boundary segments. The solution
is nevertheless discontinuous at P, with an interval in which the
radial limit increases, another interval in which it decreases, two
fans of constant limit adjacent to the sides, and a fan of breadth π
in between.

General conditions for continuity at a reentrant corner (2α > π) have
not yet been established. L and S give a sufficient condition,
depending on a hypothesis of symmetry. Since no such hypothesis is
needed when 2α < π, one might at first expect it to be superfluous.
However, Shi and Finn showed that by introducing an asymmetric domain
perturbation that in an asymptotic sense can be arbitrarily small,
the solution can be made discontinuous at P. That can be done without
affecting any other hypotheses of the L and S theorem.

In as yet unpublished work, D Shi characterized all possible
behaviors at a reentrant corner, subject to the validity of the
Concus-Finn conjecture at a protruding corner. If κ ≥ 0 then all
solutions of [16] or of [18] in a neighborhood of P in Ω are bounded
at P. The further behavior depends on the particular data, and is
indicated in Figure 19. Note the analogy with Figure 12, although the
interpretations in the figures differ in detail. Here the symbol I
denotes strictly increasing from the side Σ1 to Σ2, except on the fan
regions of constant limits; ID denotes constancy on a fan adjacent to
Σ1, then strictly increasing, then constancy on a fan of opening π,
then strictly decreasing, then constancy on a fan adjacent to Σ2. D
and DI are defined analogously. All cases can be realized in
particular configurations.

Drops in Wedges 


Closely related to the material just discussed is the question of the
possible configurations of a connected drop of liquid placed into a
wedge formed by intersecting plates of possibly differing materials,
in the absence of gravity. Thus, one has distinct contact angles
γ1, γ2 on the two plates. Finn and McCuan showed that if
(γ1, γ2) ∈ R then the only possibility is that the drop surface S is
part of a sphere. For data in D1±, no such drop can exist, barring
exotically singular behavior at the vertex points where the edge of
the wedge meets S.

Figure 19 π < 2α < 2π. Possible modes of behavior. Reproduced with
permission from the Pacific Journal of Mathematics.

For data in D2± the situation is less clear. Concus, Finn, and McCuan
(CFM) showed that local behavior exhibiting such data is indeed
possible; however, they conjectured that such behavior cannot occur
for simple drops. In conjunction with the above results, they were
led to the conjecture that the free surface S of any liquid drop in a
planar wedge, that meets the wedge in exactly two vertices and the
wedge faces in constant contact angles γ1, γ2, is necessarily
spherical. Here it is supposed only that 0 ≤ γ1, γ2 ≤ π.

The behavior of a drop of prescribed volume, as the data move from
the midpoint of R to the D regions along parallels to the sides of R,
is displayed in Figure 20. As one moves into the D2± regions, the
drop detaches from one side of the wedge and becomes a spherical cap
resting on a single planar surface, in accord with the above
conjecture. As one of the D1 regions is approached, the liquid
becomes a drop of very large radius that fills out a long thin region
in the wedge, and disappears to infinity as the boundary of R is
crossed. However, as the other D1 region is entered, the
configuration transforms smoothly into a spherical liquid bridge,
connecting the two faces of the wedge without contacting the wedge
line.


Stability Questions 


A number of authors, for example, Langbein, Vogel, Finn and Vogel,
Steen, and Zhou, have studied the stability of liquid drops trapped
between parallel plates, forming an annular liquid bridge joining the
plates under the capillarity boundary condition of prescribed contact
angles γ1, γ2 on the respective plates. These studies consider the
effects of disturbances within the fluid, assuming the plates are
rigid and perfectly parallel. CFM show that from the point of view of
physical prediction, the results of these studies may be open to some
question. Specifically, they show that unless the drop is initially
of spherical form, then infinitesimal tilting of one of the plates
always results in a discontinuous transition of the drop form.
Depending on the particular data, the transition can be to a
spherical drop; however, it can also occur that the tilting causes
the entire fluid to disappear to infinity in the wedge.

Figure 20 (A) Drop configurations in wedge with opening angle
2α = 50°, for three data positions on the line γ1 = γ2 = γ:
(a) γ = 70° (in R, near one D1 region); (b) γ = 90° (in R, near the
other D1 region); (c) γ = 110° (in that D1 region). The first two
cases yield edge blobs, the third a spherical tube that does not
contact the edge line. (B) Drop configurations in a wedge of opening
angle 2α = 50°, for three data choices in R, on the line
γ1 = π − γ2 = γ: (a) γ = 70° (near a D2 region); (b) γ = 90° (center
of R); (c) γ = 35° (near a D2 region). As a D2 region is entered,
original boundary conditions can no longer be satisfied by spherical
drop, but configuration changes smoothly into drop on single plane,
with prescribed data for that plane. Reproduced with permission from
Concus P, Finn R and McCuan J (2001) Liquid bridges, edge blobs, and
Scherk-type capillary surfaces. Indiana University Mathematics
Journal 50: 411-441.

CFM proved that if a connected liquid mass with spherical outer
surface S cuts off areas |W1|, |W2| from plates Π1, Π2 which it meets
in angles γ1, γ2, as in Figure 20, then

$$-\sum_{j=1}^{2} |W_{j}|\cos\gamma_{j} + |S| = \frac{3|V|}{R}$$   [30]

where |S| denotes area of the spherical free surface interface, |V|
the enclosed volume, and R the radius. An immediate consequence is
that the mechanical energy E of the configuration is

$$E = \frac{3\sigma|V|}{R}$$   [31]

where σ is surface tension. Using this result, they show that if a
spherical liquid mass meets two wedge faces in angles γ1, γ2 in the
absence of gravity, then the configuration has smaller mechanical
energy than does any connected liquid mass of the same volume that
meets only one of the faces in the contact angle for that face. In
turn, the drop on a single face has smaller energy than does a
spherical ball of the same volume that meets no face. Note that in
all zero-gravity cases for which stability relative to plate tilting
can be expected, the liquid mass must be spherical.
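The identity [30] and the energy comparison that follows from [31] can be checked numerically in the simplest special case, a spherical cap resting on a single plate (so only one wetted area appears). This is a sketch under that simplifying assumption, using the standard cap formulas; the function names, the unit volume, the unit surface tension and the 60° contact angle are hypothetical illustration choices.

```python
import math


def sessile_cap(R: float, gamma: float):
    """Free area |S|, wetted area |W| and volume |V| of a spherical cap
    of radius R meeting a plane in contact angle gamma (radians)."""
    c = math.cos(gamma)
    free_area = 2.0 * math.pi * R * R * (1.0 - c)
    wetted_area = math.pi * (R * math.sin(gamma)) ** 2
    volume = math.pi * R ** 3 * (1.0 - c) ** 2 * (2.0 + c) / 3.0
    return free_area, wetted_area, volume


def cap_radius_for_volume(volume: float, gamma: float) -> float:
    """Radius of the cap with the given volume and contact angle."""
    c = math.cos(gamma)
    return (3.0 * volume / (math.pi * (1.0 - c) ** 2 * (2.0 + c))) ** (1.0 / 3.0)


def ball_radius_for_volume(volume: float) -> float:
    """Radius of a free spherical ball with the given volume."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


def energy(sigma: float, volume: float, R: float) -> float:
    """Mechanical energy [31]: 3*sigma*|V|/R."""
    return 3.0 * sigma * volume / R


gamma = math.radians(60.0)
S, W, V = sessile_cap(1.0, gamma)
print("left side of [30] :", S - W * math.cos(gamma))
print("right side of [30]:", 3.0 * V / 1.0)

vol, sigma = 1.0, 1.0
print("energy on one plate:", energy(sigma, vol, cap_radius_for_volume(vol, gamma)))
print("energy of free ball:", energy(sigma, vol, ball_radius_for_volume(vol)))
```

At fixed volume the cap on a plate has the larger radius, so by [31] its energy is lower than that of the free ball, matching the ordering stated above.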


Compressibility 


Until very recently, all literature on capillarity was 
based on a hypothesis that the body of the fluid 
is incompressible. Indeed, from the point of view 
of macroscopic mechanical measurements, most 
liquids are nearly incompressible. But all liquids are 
also to some extent compressible, and this property 
was even conceptually essential in our characteriza- 
tion in the section “Gauss’ contribution: the energy 
method” of the surface energy, even for the nomin- 
ally incompressible case. It is as yet unclear to what 
extent the compressibility properties of the bulk 
liquid will influence the physical predictions of the 
theory. In this connection, see the remarks at the end 
of the section “Uniqueness and nonuniqueness.” 


The Equations I


Finn derived two possible equations extending [16] and [17], arising
from different modelings. Both characterize equilibrium points as
stationary points for the mechanical energy, and both are based on a
hypothesized pressure-density relation ρ = ρ0 + χ(p − p0). The first
equation takes account of the change in density with height, arising
from the gravity field. For a container consisting of a semi-infinite
vertical cylinder, closed at the bottom, one obtains


div Tu — -— *xg(1—cosw)--A [32] 


where $\omega$ is the angle between the upward directed 
surface normal and the vertical axis, and $\lambda$ is to be 
determined by a volume constraint. Athanassenas 
and Finn proved that for a general smooth domain 
$\Omega$, prescribed $\gamma$, and prescribed fluid mass $M$ subject 
to the restriction 


$$M < \frac{\rho_0 |\Omega|}{\chi g} \qquad [33]$$


there exists exactly one solution of [32] achieving 
the boundary data $\gamma$. 

The condition [33] is necessary for existence with 
the prescribed mass. 

The methods used for this theorem do not permit 
regularity conditions to be relaxed to allow domains 
with corner points. An approximation procedure 
yields an existence theorem for such cases; however, 
the uniqueness proof then fails. It can be replaced by 
a weaker result, estimating the difference between 
two eventual solutions: Let $u$, $v$ be solutions of [32] 
in a piecewise smooth domain $\Omega$, and suppose 
$\nu \cdot Tu \le \nu \cdot Tv$ on $\Sigma = \partial\Omega$ except at the corner points, 
where no data are prescribed. Then 


$$u \le v + \chi\sigma/\rho_0 \qquad [34]$$


throughout $\Omega$. 

Note that in this result, no growth condition is 
imposed at the corner points. It can happen that 
both $u$ and $v$ are unbounded at a corner point; 
nevertheless, [34] holds uniformly over $\Omega$. 

The solutions of [32] emulate many of the 
characteristics of solutions of [16]. Notably, there is 
again a dichotomy of behavior, depending on the opening 
angle $2\alpha$ at a corner point, with all solutions 
either bounded, or unbounded with growth like $1/r$. 


The Equations II


If in addition to taking account of the change of density 
with height, one accounts for the energy change due to 
expansion or contraction of volume elements with 
changing density, one is led to the equation 


$$\operatorname{div} Tu = \frac{\rho_0 - \chi p_0}{\sigma\chi}\left(e^{\chi g u} - 1\right) + \chi g(1 - \cos\omega) + \lambda \qquad [35]$$


Here the changes from the incompressible case are 
much more significant than for [32]. In order to 
ensure stable behavior of solutions, it seems appropriate 
to impose the condition $\rho_0 > \chi p_0$. The general 
existence theorem above can no longer be expected; 
it is possible to give explicit examples of analytic 
domains, and constant data $\gamma$, for which no solution 
of the problem exists. Thus, even in a large downward 
gravity field, the solutions can emulate the 
behavior of solutions of [18]. That can happen, 
however, only for data $\gamma$ exceeding $\pi/2$. The 
condition [33] is again necessary for existence. 

For eqn [35], $\lambda$ cannot be eliminated by addition 
of a constant to the solution, and its determination 
creates a new level of difficulty toward solution of 
the physical existence question. Athanassenas and 
Finn proved unique existence of solutions of [35], 
[17] for a capillary tube of general smooth section $\Omega$ 
dipped into an infinite liquid bath (which corresponds 
to $\lambda = 0$), when $0 \le \gamma \le \pi/2$. If $\gamma > \pi/2$ then 
solutions do not always exist; it can happen that the 
surface moves down to the bottom of the tube, 
regardless of the depth of immersion. Under a 
hypothesis of radial symmetry, Finn and Luli were 
able to prove the existence of solutions with 
prescribed mass in a semi-infinite cylinder closed at 
the bottom, in the range $0 \le \gamma < \pi$, and uniqueness 
if $0 < \gamma \le \pi/2$. Note that in this case, values 
$\gamma > \pi/2$ are not excluded. For large enough mass, the 
surface will always cover the base of the tube. 


Closing Remarks 


This brief survey is intended only as a general 
indication of the current state of the theory; much 
material of interest could not be included. Nor have 
we addressed hysteresis effects on contact angle. 
Detailed references to the material discussed and also 
to further information can be found in the articles 
listed below. More recent publications can be located 
by following links in MathSciNet or Zentralblatt. 
Burgers Type Equations 


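We consider here two types of equations: the scalar 
partial differential equations (PDEs) of the form 


$$\frac{\partial f}{\partial t} + \varphi(f)\frac{\partial f}{\partial x} = \varepsilon\frac{\partial^2 f}{\partial x^2}, \qquad [1]$$


$f = f(x,t)$, $x \in \mathbb{R}$, $t \in \mathbb{R}_+$, and the scalar difference-differential 
equations of the form 

$$\frac{\partial F}{\partial t} + \varphi(F)\,\frac{F(x,t) - F(x - \varepsilon,t)}{\varepsilon} = 0, \qquad [2]$$

$F = F(x,t)$, $x \in \mathbb{R}$, $t \in \mathbb{R}_+$. 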

Equation [1] for the case of linear $\varphi(f)$ 
was called the Burgers equation by Hopf (1950), 
who justified this by the remark: "equation 

$$\frac{\partial f}{\partial t} + f\frac{\partial f}{\partial x} = \mu\frac{\partial^2 f}{\partial x^2}$$

was first introduced by J. M. Burgers (1940) as a simplest 
model to the differential equations of fluid flow". In 
fact, eqn [1] for linear $\varphi(f)$ was introduced earlier, in 
1915, by Bateman. Equation [1] for general $\varphi(f)$ 
appeared later in very different models, for example, 
in the model for displacement of oil by water, in a 
model of road traffic, etc. 

For $\varphi(f) = a + b\cdot f$, Hopf and Cole studied 
[1] on the basis of the substitution 


$$a + bf = -2\varepsilon\,\frac{1}{g}\frac{\partial g}{\partial x},$$


reducing [1] to the heat equation 


$$\frac{\partial g}{\partial t} = \varepsilon\frac{\partial^2 g}{\partial x^2}.$$


This transformation (often called the Hopf-Cole 
transform) appeared for the first time in 1906 
in the book of Forsyth, "Theory of differential 
equations." 


Equation [2] first appeared for $\varphi(F) = a + b\cdot F$, 
$\varepsilon = 1$, $x = n \in \mathbb{Z}$, in Levi, Ragnisco, Bruschi (1983) as 
a semidiscrete equation reducible to the linear 
equation 


$$\frac{dG_n(t)}{dt} = a\,(G_{n-1}(t) - G_n(t))$$

by the substitution 


$$F(n,t) = -\frac{a}{b}\,\frac{G_n(t) - G_{n-1}(t)}{G_n(t)}.$$
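
The reduction of the linear case of [2] to a linear equation can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\varepsilon = 1$, reads the substitution as $F_n = (a/b)(G_{n-1} - G_n)/G_n$, and uses the particular solutions $G_n(t) = \sum_r r^n e^{a(1/r - 1)t}$ of $dG_n/dt = a(G_{n-1} - G_n)$. The function names, the rates `0.5` and `2.0`, and the parameter values are hypothetical choices, not taken from the article.

```python
import math

def G(n, t, a, rates=(0.5, 2.0)):
    """Sum of exponential solutions r**n * exp(a*(1/r - 1)*t) of
    dG_n/dt = a*(G_{n-1} - G_n); the rates are illustrative assumptions."""
    return sum(r ** n * math.exp(a * (1.0 / r - 1.0) * t) for r in rates)

def F(n, t, a, b):
    """Candidate solution of dF_n/dt + (a + b*F_n)*(F_n - F_{n-1}) = 0,
    built from G by the substitution F_n = (a/b)*(G_{n-1} - G_n)/G_n."""
    return (a / b) * (G(n - 1, t, a) - G(n, t, a)) / G(n, t, a)

def residual(n, t, a, b, h=1e-5):
    """Left-hand side of the nonlinear equation; the time derivative is
    approximated by a central difference with step h."""
    dF_dt = (F(n, t + h, a, b) - F(n, t - h, a, b)) / (2.0 * h)
    return dF_dt + (a + b * F(n, t, a, b)) * (F(n, t, a, b) - F(n - 1, t, a, b))

# Largest residual over a few lattice sites; it should be at the level
# of the finite-difference error, not of order one.
worst = max(abs(residual(n, 0.3, 1.0, -0.5)) for n in range(-3, 4))
print(worst)
```

With two rates the resulting $F$ is a monotone transition between the values $(a/b)(1/r - 1)$ of the two rates, which is the kind of traveling wave discussed later in the article.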

Equation [2] for general $\varphi(F)$ was introduced by 
Henkin, Polterovich (1991) for the description of a 
Schumpeterian evolution of industry. For any $\varepsilon > 0$, 
one can consider [2] as a family of difference-differential 
equations, depending on the parameter 
$\theta = \{x/\varepsilon\} \in [0, 1)$, where $\{x/\varepsilon\}$ denotes the fractional 
part of $x/\varepsilon$. For physical applications of [1] 
(see Gelfand (1959), Landau and Lifschitz (1968), 
Lax (1973)), the inviscid case ($\varepsilon = +0$) is the most 
interesting. But, for some special physical models 
and for some social and biological applications (see 
Henkin, Polterovich (1991), Serre (1999)), the 
interesting case concerns eqn [2] with $\varepsilon = 1$ and 
$x \in \mathbb{Z}$. 

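The results considered in this article concern 
mainly the Cauchy problem for eqns [1] and [2] 
with initial data $f(x,0)$, $F(x,0)$ satisfying the 
conditions 


$$f(x,0) \to \alpha^{\pm}, \quad x \to \pm\infty,$$
$$\int_{-\infty}^{0} |f(x,0) - \alpha^-|\,dx + \int_{0}^{\infty} |\alpha^+ - f(x,0)|\,dx < \infty \qquad [3]$$


and correspondingly 


$$F(k\varepsilon + \theta\varepsilon, 0) \to \alpha^{\pm}, \quad k \to \pm\infty,$$
$$\sum_{k=-\infty}^{0} |F(k\varepsilon + \theta\varepsilon, 0) - \alpha^-| + \sum_{k=0}^{\infty} |\alpha^+ - F(k\varepsilon + \theta\varepsilon, 0)| < \infty \qquad [4]$$

where $\alpha^- < \alpha^+$, $\theta \in [0,1)$ and the mapping 
$\theta \mapsto \{F(k\varepsilon + \theta\varepsilon, 0) - \alpha^{\pm},\ \pm k \in \mathbb{Z}_+\} \in l^1$ is smooth. 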


The standard classical questions concerning 
Cauchy problems [1], [3] and [2], [4], namely 
those relating to existence, uniqueness, regularity, and 
conservation laws, are well established (see Oleinik 
(1959), and Serre (1999)). This section formulates 
only those which are essential for the study 
of the asymptotic behavior of solutions $f(x,t)$ and 
$F(x,t)$ when $t \to \infty$ or $\varepsilon \to 0$, and of the relation 
between vanishing viscosity and difference scheme 
approximations for inviscid Burgers type 
equations. 

One can see that the asymptotic behavior of solutions 
of [2], [4] when $\varepsilon \to +0$ is not the same as the 
asymptotic behavior of [1], [3] when $\varepsilon \to +0$, in 
spite of the fact that in the limiting case $\varepsilon = +0$ both [1] 
and [2] look identical. This can be explained by the 
fact that eqn [2] can be interpreted as a semidiscrete 
approximation of the nonconservative (nonphysical) 
equation 


$$\frac{\partial F}{\partial t} + \varphi(F)\frac{\partial F}{\partial x} = \frac{\varepsilon}{2}\,\varphi(F)\frac{\partial^2 F}{\partial x^2}.$$

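However, the problem [2], [4] can be naturally 
transformed into a conservative (physical) initial 
problem. Indeed, the substitution 


$$f = \int_0^F \frac{dy}{\varphi(y)}$$

(under the condition of integrability of $1/\varphi(y)$) transforms 
[2] into the equation 


$$\frac{\partial f(x,t)}{\partial t} + \frac{\psi(f(x,t)) - \psi(f(x - \varepsilon,t))}{\varepsilon} = 0 \qquad [5]$$


where $\psi'(f) = \varphi(F)$. Equation [5] is the so-called 
monotone one-sided semidiscrete approximation of the 
conservative viscous equation 


$$\frac{\partial f}{\partial t} + \psi'(f)\frac{\partial f}{\partial x} = \frac{\varepsilon}{2}\frac{\partial}{\partial x}\left(\psi'(f)\frac{\partial f}{\partial x}\right) \qquad [6]$$


where 


$$f(x,t) = \int_0^{F(x,t)} \frac{dy}{\varphi(y)}.$$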


The results of finite-difference approximations 
for nonlinear conservation laws (see A. Harten, 
J. Hyman, P. Lax (1976)) explain both the similarity 
of behavior of [6] and [5] as well as some difference 
in the behavior of [1] and [2]. 

For further exposition the following assumption is 
useful: 


Assumption 1 Let $\varphi$ in [1], [2] be a positive and 
continuously differentiable function on the interval 
$[\alpha^-, \alpha^+]$. Let $\varphi'$ have only isolated zeros. 


From references one can deduce the following general 
properties of Cauchy problems [1], [3] and [2], [4]. 


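Theorem 0 Under Assumption 1, we have: 


(i) There exists a unique (weak) solution $f(x,t)$, $x \in \mathbb{R}$, 
$t \in \mathbb{R}_+$, of the problem [1], [3]; this solution is 
necessarily smooth for $t > 0$; besides, it satisfies 
the following conservation laws for $t > 0$: 


$$f(x,t) \to \alpha^-, \quad x \to -\infty; \qquad f(x,t) \to \alpha^+, \quad x \to +\infty;$$

$$\frac{d}{dt}\left[\int_{-\infty}^{0} (f(x,t) - \alpha^-)\,dx + \int_{0}^{\infty} (f(x,t) - \alpha^+)\,dx\right] = -\int_{\alpha^-}^{\alpha^+} \varphi(y)\,dy$$


Moreover, if the initial value $f(x,0)$ is nondecreasing 
as a function of $x$, then the solution $f(x,t)$ 
is nondecreasing as a function of $x$ for all $t > 0$. 
(ii) There exists a unique solution $F(x,t)$, $x \in \mathbb{R}$, 
$t \in \mathbb{R}_+$, of the problem [2], [4]; this solution is 
smooth for $t > 0$; besides, it satisfies the following 
conservation laws for $t > 0$ and $\theta \in [0, 1)$: 


$$F(k\varepsilon + \theta\varepsilon, t) \to \alpha^-, \quad k \to -\infty; \qquad F(k\varepsilon + \theta\varepsilon, t) \to \alpha^+, \quad k \to +\infty;$$

$$\frac{d}{dt}\,\varepsilon\left[\sum_{k=-\infty}^{0} \int_{\alpha^-}^{F(k\varepsilon + \theta\varepsilon,t)} \frac{dy}{\varphi(y)} + \sum_{k=1}^{\infty} \int_{\alpha^+}^{F(k\varepsilon + \theta\varepsilon,t)} \frac{dy}{\varphi(y)}\right] = -(\alpha^+ - \alpha^-)$$


Moreover, if for some $\theta \in [0, 1)$ the function $F(k\varepsilon + \theta\varepsilon, 0)$ is 
nondecreasing as a function of $k \in \mathbb{Z}$, then the solution 
$F(k\varepsilon + \theta\varepsilon, t)$ is also nondecreasing as a function of 
$k \in \mathbb{Z}$ for all $t > 0$ and the same $\theta$. 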
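
The monotonicity statement in part (ii) can be illustrated with a small simulation. This is a sketch under assumptions that are not in the article: $\varepsilon = 1$, $\theta = 0$, a finite window of lattice sites with the left neighbour of the first site frozen at $\alpha^-$, explicit Euler time stepping, and the sample choice $\varphi(u) = 2 - u$ on $[0, 1]$. For a time step with $\Delta t \cdot \max\varphi \le 1$ each update is a convex combination of $F_n$ and $F_{n-1}$, so the discrete solution stays nondecreasing and inside $[\alpha^-, \alpha^+]$.

```python
def euler_step(F, phi, dt, left_value):
    """One explicit Euler step of dF_n/dt = -phi(F_n) * (F_n - F_{n-1})
    (eqn [2] with eps = 1, read on integer sites; an illustrative scheme)."""
    new = []
    for n, value in enumerate(F):
        prev = F[n - 1] if n > 0 else left_value  # frozen left neighbour
        new.append(value - dt * phi(value) * (value - prev))
    return new

def evolve(F0, phi, dt, steps, left_value):
    """Apply euler_step repeatedly, starting from the list F0."""
    F = list(F0)
    for _ in range(steps):
        F = euler_step(F, phi, dt, left_value)
    return F

phi = lambda u: 2.0 - u                     # positive on [0, 1] (sample choice)
F0 = [0.0] * 10 + [0.5] + [1.0] * 30        # nondecreasing data from 0 to 1
F = evolve(F0, phi, dt=0.25, steps=40, left_value=0.0)

is_monotone = all(x <= y + 1e-12 for x, y in zip(F, F[1:]))
in_range = all(-1e-12 <= x <= 1.0 + 1e-12 for x in F)
print(is_monotone, in_range)
```

Because the equation only couples site $n$ to site $n - 1$, no boundary condition is needed at the right end of the window.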


Gelfand’s Problem and Iljin-Oleinik 
Theorem 


The main results considered in this article are related 
to the following problem, formulated explicitly by 
Gelfand (1959): to find the asymptotic ($t \to \infty$) of the 
solution $f(x,t)$ of eqn [1] with the initial condition 


$$f(x,0) = \begin{cases} \alpha^{\pm}, & \text{if } \pm x > \pm x^{\pm} \\ f^0(x), & \text{if } x \in [x^-, x^+] \end{cases} \qquad [7]$$


where $\alpha^- < \alpha^+$. 

Gelfand found a solution to this problem for the 
inviscid case $\varepsilon = +0$ with initial conditions 
$f(x,0) = \alpha^-$ if $x < 0$, and $f(x,0) = \alpha^+$ if $x > 0$ (see 
below), and remarked that it would be interesting to 
prove that the main term of the asymptotic ($t \to \infty$) 
of $f(x,t)$ satisfying [1], [7] coincides with the 
solution of [1], [7] for $\varepsilon = +0$. 
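Gelfand's problem admits a natural extension for 
eqn [2] with the initial conditions 


$$F(x,0) = \begin{cases} \alpha^{\pm}, & \text{if } \pm x > \pm x^{\pm} \\ F^0(x), & \text{if } x \in [x^-, x^+] \end{cases} \qquad [8]$$

Let us introduce, for $u \in [\alpha^-, \alpha^+]$, the function 
$\psi(u) = -\int_{\alpha^-}^{u} \varphi(y)\,dy$. Let the function $\hat\psi(u)$, 
$u \in [\alpha^-, \alpha^+]$, be the upper bound of the convex hull of the set 


$$\{(u,v) : v \le \psi(u),\ u \in [\alpha^-, \alpha^+]\}$$


By Assumption 1, the set $s = \{u \in [\alpha^-, \alpha^+] : \psi(u) < \hat\psi(u)\}$ 
is a finite union of intervals, 
$s = (\alpha_0, \beta_0) \cup (\alpha_1, \beta_1) \cup \cdots \cup (\alpha_L, \beta_L)$, where 
$\alpha^- = \alpha_0 \le \beta_0 < \alpha_1 < \beta_1 < \cdots < \alpha_L \le \beta_L = \alpha^+$. 

Let us define the function $\hat f(x,t)$ by 


$$\hat f(x,t) = \begin{cases} \alpha^-, & \text{if } x < -\hat\psi'(\alpha^-)\cdot t \\ (-\hat\psi')^{(-1)}(x/t), & \text{if } -\hat\psi'(\alpha^-)\cdot t \le x \le -\hat\psi'(\alpha^+)\cdot t \\ \alpha^+, & \text{if } x > -\hat\psi'(\alpha^+)\cdot t \end{cases}$$


where in the case $-\hat\psi'(u) = \xi_l$, $u \in (\alpha_l, \beta_l)$, $l = 0, 1, \ldots, L$, 
we put, by definition, $(-\hat\psi')^{(-1)}(\xi_l) = [\alpha_l, \beta_l]$. 


Theorem 1 (Gelfand) The solution $f(x,t)$ of the 
problem [1], [7] for the case $\varepsilon = +0$ and initial 
conditions $f(x,0) = \alpha^{\pm}$, if $\pm x > 0$, has the explicit 
form: $f(x,t) = \hat f(x,t)$. 


The analogous statement is valid also for the 
problem [2], [8] if, in the construction above, one 
takes 

$$\Psi(u) = \int_{\alpha^-}^{u} \frac{dy}{\varphi(y)}$$

instead of $\psi(u)$, $u \in [\alpha^-, \alpha^+]$. 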

The Gelfand problem for [1], [3] and [1], [7] with 
monotonic $\varphi(f)$ was solved by Iljin and Oleinik 
(1960). In the case $\alpha^- = \alpha^+$, the solution of this 
problem follows from an earlier work of Lax (1957). 
For the case of linear $\varphi(f)$, the solution of this problem 
follows from an earlier work of Hopf (1950). 

For semidiscrete initial problems [2], [4] and [2], 
[8], the analog of the asymptotic results of Hopf and 
Iljin-Oleinik has been obtained and applied by 
Henkin and Polterovich (1991). 

The case of increasing $\varphi(f)$ has been studied in 
detail. In this case, for both initial problems [1], [3] 
and [2], [4], there is uniform convergence of solutions 
$f(x,t)$ and $F(x,t)$ to the so-called rarefaction profile 


$$g(x/t) = \begin{cases} \alpha^{\pm}, & \pm x > \pm\varphi(\alpha^{\pm})\,t \\ \varphi^{(-1)}(x/t), & x \in [\varphi(\alpha^-)t,\ \varphi(\alpha^+)t] \end{cases}$$


as $t \to \infty$ (see Iljin and Oleinik (1960) and Henkin 
and Polterovich (1991)). A more precise result in 
this case, about convergence to the so-called 
N-wave, has been obtained by Dafermos (1977) 
and Liu (1978). 
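
The rarefaction profile is simple to evaluate once $\varphi$ and its inverse are known. The following sketch assumes an increasing $\varphi$ and takes the inverse as an explicit argument; the function name, the argument names and the sample $\varphi(u) = 1 + 2u$ on $[0, 1]$ are illustrative assumptions.

```python
def rarefaction_profile(x, t, alpha_minus, alpha_plus, phi, phi_inverse):
    """Value of the rarefaction profile g(x/t) for increasing phi:
    alpha_minus to the left of the fan, alpha_plus to the right of it,
    and phi_inverse(x/t) inside the fan [phi(alpha_minus)*t, phi(alpha_plus)*t]."""
    if t <= 0:
        raise ValueError("the profile is defined for t > 0")
    xi = x / t
    if xi <= phi(alpha_minus):
        return alpha_minus
    if xi >= phi(alpha_plus):
        return alpha_plus
    return phi_inverse(xi)

phi = lambda u: 1.0 + 2.0 * u            # increasing sample choice
phi_inverse = lambda s: (s - 1.0) / 2.0  # its inverse

for x in (0.0, 2.0, 5.0):
    print(x, rarefaction_profile(x, 1.0, 0.0, 1.0, phi, phi_inverse))
```

The profile depends on $x$ and $t$ only through $x/t$, so the fan widens linearly in time.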

For the case of a general $\varphi(f)$, in particular, for 
the case of nonincreasing $\varphi(f)$, we need the notion 
of shock profile. Following Serre (1999), three 
definitions can be introduced. 


Definition The initial problem [1], [3] (correspondingly, 
[2], [4]) admits an $(\alpha^-, \alpha^+)$-shock profile ($\alpha^- < \alpha^+$) 
if there exists a traveling-wave solution of this equation, 
that is, of the form $f = \tilde f(x - ct)$ (correspondingly, 
$F = \tilde F(x - Ct)$), such that $\tilde f(x) \to \alpha^{\pm}$ when $x \to \pm\infty$ 
(correspondingly, $\tilde F(x) \to \alpha^{\pm}$ when $x \to \pm\infty$). 


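From the results of Gelfand (1959) and Oleinik 
(1959), it follows that initial problem [1], [3] admits 
an $(\alpha^-, \alpha^+)$-shock profile iff 


$$c \equiv \frac{1}{\alpha^+ - \alpha^-}\int_{\alpha^-}^{\alpha^+} \varphi(y)\,dy \le \frac{1}{u - \alpha^-}\int_{\alpha^-}^{u} \varphi(y)\,dy, \quad \forall u \in [\alpha^-, \alpha^+] \qquad [9]$$


From the results of Henkin and Polterovich 
(1991) and Belenky (1990), it follows that initial 
problem [2], [4] admits an $(\alpha^-, \alpha^+)$-shock profile iff 


$$\frac{1}{C} \equiv \frac{1}{\alpha^+ - \alpha^-}\int_{\alpha^-}^{\alpha^+} \frac{dy}{\varphi(y)} \ge \frac{1}{u - \alpha^-}\int_{\alpha^-}^{u} \frac{dy}{\varphi(y)}, \quad \forall u \in (\alpha^-, \alpha^+) \qquad [10]$$


In the case $\varepsilon = +0$, the equality in [9] and [10] is 
called the Rankine-Hugoniot condition, and the inequality 
in [9] and [10] is called the entropy condition (or 
the Gelfand-Oleinik condition). 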
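
The two shock speeds can be compared numerically. The sketch below assumes the reading of [9] and [10] in which $c$ is the mean of $\varphi$ over $[\alpha^-, \alpha^+]$, $1/C$ is the mean of $1/\varphi$ over the same interval, and the entropy inequality in [9] requires every partial mean of $\varphi$ over $[\alpha^-, u]$ to be at least $c$. The quadrature routine, the function names and the sample functions $\varphi$ are illustrative assumptions.

```python
import math

def simpson(func, lo, hi, panels=200):
    """Composite Simpson rule with an even number of panels."""
    h = (hi - lo) / panels
    total = func(lo) + func(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0

def shock_speeds(phi, alpha_minus, alpha_plus):
    """Return (c, C): c is the mean of phi, 1/C the mean of 1/phi."""
    width = alpha_plus - alpha_minus
    c = simpson(phi, alpha_minus, alpha_plus) / width
    C = width / simpson(lambda y: 1.0 / phi(y), alpha_minus, alpha_plus)
    return c, C

def entropy_condition_9(phi, alpha_minus, alpha_plus, samples=50, tol=1e-9):
    """Check on a grid that the mean of phi over [alpha_minus, u] is >= c."""
    c, _ = shock_speeds(phi, alpha_minus, alpha_plus)
    for k in range(1, samples + 1):
        u = alpha_minus + (alpha_plus - alpha_minus) * k / samples
        if simpson(phi, alpha_minus, u) / (u - alpha_minus) < c - tol:
            return False
    return True

c, C = shock_speeds(lambda y: 2.0 - y, 0.0, 1.0)   # decreasing sample phi
print(c, C, entropy_condition_9(lambda y: 2.0 - y, 0.0, 1.0))
```

Since the harmonic mean never exceeds the arithmetic mean, $C \le c$ for any positive $\varphi$, so the discrete and continuous shocks generally travel at different speeds.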


Definition For initial problem [1], [3] (correspondingly, 
[2], [4]) admitting an $(\alpha^-, \alpha^+)$-shock profile and 
for $\varepsilon = +0$, we will call shock waves the weak 
solutions of [1], [3] (correspondingly, [2], [5], [4]) of 
the form 


$$\tilde f(x - ct) = \alpha^{\pm}, \quad \text{if } \pm x > \pm ct$$

$$\tilde F(x - Ct) = \alpha^{\pm}, \quad \text{if } \pm x > \pm Ct$$


where $c$, $C$ satisfy the Rankine-Hugoniot and entropy 
conditions [9], [10]. 


Definition The $(\alpha^-, \alpha^+)$-shock profile for [1] 
(correspondingly, for [2]) is called strict if in addition to 
[9], [10] we have the Lax (1954) condition: 


$$\varphi(\alpha^+) < c < \varphi(\alpha^-) \qquad [11]$$

and correspondingly 


$$\varphi(\alpha^+) < C < \varphi(\alpha^-) \qquad [12]$$


The $(\alpha^-, \alpha^+)$-shock profile for [1] or [2] is called 
semicharacteristic if one of the inequalities in [11] or 
[12] is strict and the other is an equality. This profile 
is called characteristic if both inequalities in [11] or 
[12] are equalities. 


One can check (Iljin and Oleinik 1960, Henkin and 
Polterovich 1991) that if, in addition to Assumption 1, 
the function $\varphi$ on $[\alpha^-, \alpha^+]$ is nonconstant and 
nonincreasing, then eqn [1] (correspondingly, [2]) 
admits a strict $(\alpha^-, \alpha^+)$-shock profile. 

The main result of Iljin-Oleinik (1960) for eqn [1] 
and the analogous statement of Henkin and Polterovich 
(1991) for eqn [2] can be presented as follows. 


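Theorem 2 


(i) Let the initial problem [1], [3] admit a strict 
$(\alpha^-, \alpha^+)$-shock profile $\tilde f$. Let $f(x,t)$, $x \in \mathbb{R}$, 
$t \in \mathbb{R}_+$, be a solution of [1], [3]. Then there exists 
$d_0 \in \mathbb{R}$ such that 


$$\sup_{x \in \mathbb{R}} |f(x,t) - \tilde f(x - ct - d_0)| \to 0, \quad t \to \infty \qquad [13]$$


The value of $d_0$ is determined uniquely by the relation 


$$\int_{-\infty}^{\infty} \left(f(x,0) - \tilde f(x - d_0)\right) dx = 0$$


(ii) Let the initial problem [2], [4] admit a strict 
$(\alpha^-, \alpha^+)$-shock profile $\tilde F$. Let $F(x,t)$, $x \in \mathbb{R}$, 
$t \in \mathbb{R}_+$, be a solution of [2], [4]. Then there exists a 
continuous function $D_0(\theta)$, $\theta \in [0, 1)$, such that 


$$\sup_{x \in \mathbb{R}} |F(x,t) - \tilde F(x - Ct - D_0(\{x/\varepsilon\}))| \to 0, \quad t \to \infty \qquad [14]$$


The function $D_0(\theta)$, $\theta \in [0, 1)$, is determined 
uniquely from the relation 


$$\sum_{k=-\infty}^{\infty} \left\{\Phi(F(k\varepsilon + \theta\varepsilon, 0)) - \Phi(\tilde F(k\varepsilon + \theta\varepsilon - D_0(\theta)))\right\} = 0$$


where 


$$\Phi(F) = \int_0^F \frac{dy}{\varphi(y)}$$


(iii) If in conditions (i) and (ii) we take $\varepsilon = +0$, then 
there exist $d_0$, $D_0$ such that for all $\delta > 0$ we have 


$$\sup_{x > ct + d_0 + \delta} |\alpha^+ - f(x,t)| + \sup_{x < ct + d_0 - \delta} |\alpha^- - f(x,t)| \to 0, \quad t \to \infty$$
$$\sup_{x > Ct + D_0 + \delta} |\alpha^+ - F(x,t)| + \sup_{x < Ct + D_0 - \delta} |\alpha^- - F(x,t)| \to 0, \quad t \to \infty \qquad [15]$$


The values of $d_0$ and $D_0$ are determined by 


$$\int_{-\infty}^{d_0} (f(x,0) - \alpha^-)\,dx + \int_{d_0}^{\infty} (f(x,0) - \alpha^+)\,dx = 0$$
$$\int_{-\infty}^{D_0} (F(x,0) - \alpha^-)\,dx + \int_{D_0}^{\infty} (F(x,0) - \alpha^+)\,dx = 0$$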
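Remarks 


(i) The statements of Theorem 2 give a positive 
answer to Gelfand’s question for the case of 
initial problems [1], [3] and [2], [4] admitting 
strict shock profiles. 

(ii) For linear $\varphi(f) = a + bf$, $a > 0$, $a + b\alpha^+ > 0$, 
$b < 0$, the traveling waves $\tilde f$, $\tilde F$ for [1], [3] and 
[2], [4] can be found explicitly: 


$$\tilde f = \alpha^- + \frac{\alpha^+ - \alpha^-}{1 + \exp\{-\beta(x - ct)\}}, \quad c = a + \frac{b}{2}(\alpha^+ + \alpha^-), \quad \beta = -\frac{b(\alpha^+ - \alpha^-)}{2\varepsilon}$$

$$\tilde F = \alpha^- + \frac{\alpha^+ - \alpha^-}{1 + \exp\{-B(x - Ct)\}}, \quad C = -\frac{b(\alpha^+ - \alpha^-)}{\varepsilon B}, \quad B = \frac{1}{\varepsilon}\ln\frac{a + b\alpha^-}{a + b\alpha^+}$$

(iii) For initial problems [1], [7] and [2], [8], $\alpha^+ > \alpha^-$, 
the asymptotic convergence statements 
[13]-[15] admit precise asymptotic estimates 
(see Iljin and Oleinik (1960) for [1], [7]): 


$$\sup_{x \in \mathbb{R}} |f(x,t) - \tilde f(x - ct - d_0)| = O(e^{-\gamma t}), \quad \gamma > 0,\ \varepsilon > 0 \qquad [16]$$

$$\sup_{x \in \mathbb{R}} |F(x,t) - \tilde F(x - Ct - D_0(\{x/\varepsilon\}))| = O(e^{-\gamma t}), \quad \gamma > 0,\ \varepsilon > 0 \qquad [17]$$


$$f(x,t) = \alpha^{\pm} \ \text{for}\ \pm x > \pm(ct + d_0),\ t > t_0,\ \varepsilon = +0$$
$$F(x,t) = \alpha^{\pm} \ \text{for}\ \pm x > \pm(Ct + D_0),\ t > t_0,\ \varepsilon = +0 \qquad [18]$$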
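
For the linear case $\varphi(f) = a + bf$ the logistic traveling wave of eqn [1] can be verified by finite differences. The sketch below assumes the standard closed form with speed $c = a + b(\alpha^+ + \alpha^-)/2$ and steepness $\beta = -b(\alpha^+ - \alpha^-)/(2\varepsilon)$, obtained by integrating the traveling-wave equation once; the function names, step size and parameter values are illustrative assumptions.

```python
import math

def wave(x, t, a, b, eps, alpha_minus, alpha_plus):
    """Logistic traveling wave connecting alpha_minus (left) to alpha_plus (right)
    for f_t + (a + b*f) f_x = eps f_xx with b < 0."""
    c = a + b * (alpha_minus + alpha_plus) / 2.0
    beta = -b * (alpha_plus - alpha_minus) / (2.0 * eps)
    return alpha_minus + (alpha_plus - alpha_minus) / (1.0 + math.exp(-beta * (x - c * t)))

def residual(x, t, a, b, eps, alpha_minus, alpha_plus, h=1e-3):
    """f_t + (a + b*f) f_x - eps f_xx, with central differences of step h."""
    f = lambda xx, tt: wave(xx, tt, a, b, eps, alpha_minus, alpha_plus)
    f_t = (f(x, t + h) - f(x, t - h)) / (2.0 * h)
    f_x = (f(x + h, t) - f(x - h, t)) / (2.0 * h)
    f_xx = (f(x + h, t) - 2.0 * f(x, t) + f(x - h, t)) / h ** 2
    return f_t + (a + b * f(x, t)) * f_x - eps * f_xx

# Residual on a few sample points for a = 1, b = -1, eps = 0.5, states 0 and 1.
worst = max(abs(residual(x, 0.7, 1.0, -1.0, 0.5, 0.0, 1.0)) for x in range(-3, 4))
print(worst)
```

The speed $c$ used here is the mean of $\varphi$ over $[\alpha^-, \alpha^+]$, in agreement with the Rankine-Hugoniot equality for eqn [1].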


Theorem 2(i) is proved on the basis of the following 
idea. Let $f$ satisfy the initial problem [1], [3] and let 
$\tilde f(x - ct - d_0)$ be the $(\alpha^-, \alpha^+)$-shock profile for [1] 
satisfying condition [13]. Put 


$$\delta(x,t) = \int_{-\infty}^{x} \left(f(y,t) - \tilde f(y - ct - d_0)\right) dy$$


The function $\delta(x,t)$ satisfies the nonlinear parabolic 
equation 


$$\frac{\partial\delta}{\partial t} + \varphi(\kappa f + (1 - \kappa)\tilde f)\,\frac{\partial\delta}{\partial x} = \varepsilon\frac{\partial^2\delta}{\partial x^2}$$


where $\kappa(x,t)$ is some smooth function of $(x,t)$ with 
values in $[0, 1]$. 

Besides, by the conservation law of Theorem 0(i), we 
have $\delta(x,t) \to 0$, $x \to \pm\infty$, $\forall t \ge 0$. 

Estimates based on the maximum principle and 
appropriate comparison statements give that 
$\delta(x,t) \to 0$, $x \in \mathbb{R}$, $t \to \infty$. This implies that 


$$f(x,t) - \tilde f(x - ct - d_0) \to 0, \quad x \in \mathbb{R},\ t \to \infty$$


Theorem 2(ii) is proved in a similar way. Let $F(n,t)$ 
satisfy the initial problem [2], [4] with $x = n \in \mathbb{Z}$, 
$\varepsilon = 1$, $\theta = \{x\} = 0$, and let $\tilde F(n - Ct - D_0)$ be the 
$(\alpha^-, \alpha^+)$-shock profile for [2] satisfying condition 
[14]. Put 


$$\Delta(n,t) = \sum_{k=-\infty}^{n} \left\{\Phi(F(k,t)) - \Phi(\tilde F(k - Ct - D_0))\right\}$$


Then the function $\Delta(n,t)$ satisfies the semidiscrete 
parabolic equation 

$$\frac{d\Delta(n,t)}{dt} = \varphi\left(\Phi^{(-1)}(\kappa\Phi(F) + (1 - \kappa)\Phi(\tilde F))\right)\left(\Delta(n - 1,t) - \Delta(n,t)\right)$$

where $\kappa(n,t)$ is some function with values in $[0, 1]$. 
Besides, by the conservation law of Theorem 0(ii), we 
have 


$$\Delta(n,t) \to 0, \quad n \to \pm\infty,\ \forall t \ge 0$$


Estimates based on a generalized maximum principle 
and comparison statements give that 
$\Delta(n,t) \to 0$, $n \in \mathbb{Z}$, $t \to \infty$. This implies that 


$$F(n,t) - \tilde F(n - Ct - D_0) \to 0, \quad n \in \mathbb{Z},\ t \to \infty$$


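Remark For the cases of nonstrict shock profiles 
(characteristic or semicharacteristic) the statements 
of Theorem 2 are not valid. The reason is that, 
under initial conditions [3], [4], for any $d_0$ and $D_0$ 
we have 


$$\int_{-\infty}^{\infty} \left(f(x,0) - \tilde f(x - d_0)\right) dx = \infty$$


and, correspondingly, 


$$\sum_{k=-\infty}^{\infty} \left(\Phi(\tilde F(k\varepsilon + \theta\varepsilon - D_0)) - \Phi(F(k\varepsilon + \theta\varepsilon, 0))\right) = \infty$$

So the crucial argument, related to the conservation 
law, does not hold. 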


One can extend the important Theorems 2(i), 2(ii) 
for the case of nonstrict shock profiles in two different 
ways: by changing conditions of these theorems or by 
changing conclusions of these theorems. 

The first method (started by Mei, Matsumura, and 
Nishihara in 1994) was completed by the following 
$L^1$-asymptotic stability result (Serre 2004). 


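Theorem 3 (Freistühler-Serre). Let eqns [1], [2] 
admit $(\alpha^-, \alpha^+)$-shock profiles and let $\tilde f$, $\tilde F$ be the 
corresponding traveling-wave solutions of [1], [2]. Let 
$f(x,t)$, $F(n,t)$, $x \in \mathbb{R}$, $n \in \mathbb{Z}$, $t \in \mathbb{R}_+$, be solutions of 
eqns [1], [2] with initial conditions such that 


$$\int_{-\infty}^{\infty} |f(x,0) - \tilde f(x)|\,dx < \infty$$


and, correspondingly, 


$$\sum_{n=-\infty}^{\infty} |F(n,0) - \tilde F(n)| < \infty$$

Then 

$$\int_{-\infty}^{\infty} |f(x,t) - \tilde f(x - ct - d_0)|\,dx \to 0, \quad t \to \infty$$

$$\sum_{n=-\infty}^{\infty} |F(n,t) - \tilde F(n - Ct - D_0)| \to 0, \quad t \to \infty$$


where the constants $d_0$ and $D_0$ are calculated from the 
same relations as in Theorem 2. 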


Remark For the inviscid case $\varepsilon = +0$, the statement 
of Theorem 3 is still valid for equations 
admitting strict shock profiles, but generally is not 
valid for equations admitting only nonstrict shock 
profiles (see Serre (2004)). 


The second method permits, keeping initial con- 
ditions [3], [4], to localize the positions of viscous 
shock waves for generalized Burgers equations 
(see the next section). 


Asymptotic Behavior of Solutions of 
Generalized Burgers Equations 


The main current interest and the main difficulty in 
the study of Gelfand's problem for generalized 
Burgers equations consist in the following question, 
formulated explicitly for initial problem [1], [3] by 
Liu et al. (1998): "In the Cauchy problem there is 
the question of determining the location of viscous 
shock waves". A similar question and a related 
conjecture were formulated by Henkin and Polterovich 
(1999) for the initial problem [2], [4]. 

For solving this problem, it is important to solve it 
first for the Burgers type equations admitting 
nonstrict shock profiles. 


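Theorem 4 (Henkin-Shananin-Tumanov). 


(i) Let the initial problem [1], [3] admit the nonstrict 
$(\alpha^-, \alpha^+)$-shock profile [9] and let $\tilde f(x - ct)$ be a 
corresponding traveling-wave solution. Let 

$$\varphi'(\alpha^-) \ne 0, \ \text{if } \varphi(\alpha^-) = c; \qquad \varphi'(\alpha^+) \ne 0, \ \text{if } \varphi(\alpha^+) = c$$

Let $f(x,t)$ be a solution of [1], [3]. Then there 
exist constants $\gamma_0$ and $d_0$ such that 


$$\sup_{x \in \mathbb{R}} |f(x,t) - \tilde f(x - ct - \varepsilon\gamma_0 \ln t - d_0)| \to 0, \quad t \to \infty$$


where 


$$(\alpha^+ - \alpha^-)\cdot\gamma_0 = \begin{cases} -1/\varphi'(\alpha^+), & \text{if } \varphi(\alpha^-) > c = \varphi(\alpha^+) \\ 1/\varphi'(\alpha^-), & \text{if } \varphi(\alpha^-) = c > \varphi(\alpha^+) \\ 1/\varphi'(\alpha^-) - 1/\varphi'(\alpha^+), & \text{if } \varphi(\alpha^-) = c = \varphi(\alpha^+) \end{cases}$$


(ii) Let the initial problem [2], [4] with $\varepsilon = 1$ admit the 
nonstrict $(\alpha^-, \alpha^+)$-shock profile [10] and let $\tilde F(n - Ct)$ 
be a corresponding traveling-wave solution. Let 

$$\varphi'(\alpha^-) \ne 0, \ \text{if } \varphi(\alpha^-) = C; \qquad \varphi'(\alpha^+) \ne 0, \ \text{if } \varphi(\alpha^+) = C$$

Let $F(n,t)$ be a solution of [2], [4]. Let 


$$\Delta F(n,0) \stackrel{\text{def}}{=} F(n,0) - F(n - 1,0) \ge 0$$


Then there exist constants $\Gamma_0$ and $D_0$ such that 


$$\sup_{n \in \mathbb{Z}} |F(n,t) - \tilde F(n - Ct - \Gamma_0 \ln t - D_0)| \to 0, \quad t \to \infty$$

where 

$$(\alpha^+ - \alpha^-)\cdot\Gamma_0 = \begin{cases} -C/(2\varphi'(\alpha^+)), & \text{if } \varphi(\alpha^-) > C = \varphi(\alpha^+) \\ C/(2\varphi'(\alpha^-)), & \text{if } \varphi(\alpha^-) = C > \varphi(\alpha^+) \\ (C/2)\left[-1/\varphi'(\alpha^+) + 1/\varphi'(\alpha^-)\right], & \text{if } \varphi(\alpha^-) = C = \varphi(\alpha^+) \end{cases}$$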


Remarks 


(i) One could think that nonstrict shock profiles 
as in Theorem 4 can appear only in exceptional 
cases. But Proposition 2 and Theorem 5 below 
show, on the contrary, that characteristic shock 
profiles and, as a consequence, the behavior of 
initial problems [1], [3] and [2], [4] as in Theorem 
4 are the rule rather than an exception. 

(ii) The statement of Theorem 4(i) (and also of 
Theorem 5(i) below) disproves Gelfand's hope 
that the main term of the asymptotic ($t \to \infty$) of 
$f(x,t)$, satisfying [1], [7], coincides with the 
solution of [1], [7] for $\varepsilon = +0$ with the same 
initial condition. Indeed, in the conditions of Theorem 
4, we have $\varphi(\alpha^-) = c$ or $\varphi(\alpha^+) = c$, but 
$\varphi'(\alpha^-) \ne \varphi'(\alpha^+)$; then for any $\varepsilon > 0$ the traveling wave 
$\tilde f(x - ct - \varepsilon\gamma_0 \ln t - d_0)$ for [1], [3], concentrated 
near the point $x_\varepsilon(t) = ct + \varepsilon\gamma_0 \ln t + d_0$, moves 
away ($t \to \infty$) from the shock wave for [1], [7] for 
$\varepsilon = +0$, concentrated near the point $x_0(t) = ct + o(\ln t)$, 
where $o(\ln t)/\ln t \to 0$, $t \to \infty$. 

(iii) Theorem 4 (and also Theorem 5 below) also 
illustrates another interesting phenomenon: for 
the case $\varphi'(\alpha^-) \ne \varphi'(\alpha^+)$, one has asymptotic 
convergence of the solution of [1], [3] (correspondingly, 
of [2], [4]) to the traveling 
wave $\tilde f(x - ct - \varepsilon\gamma_0 \ln t - d_0)$ (correspondingly, 
$\tilde F(x - Ct - \Gamma_0 \ln t - D_0)$), which does not 
satisfy eqn [1] or, correspondingly, eqn [2]. Such 
a phenomenon was first discovered by Liu and 
Yu (1997) in a special boundary-value problem 
for the classical Burgers equation: if 
$u(x,t)$ satisfies the conditions 


$$u_t + u\,u_x = u_{xx}, \quad u(0,t) = 1, \quad u(\infty,t) = -1, \quad u(x,0) = -\operatorname{th}\frac{x}{2},$$

then 

$$\left|u(x,t) + \operatorname{th}\tfrac{1}{2}\left(x - \ln(1 + t)\right)\right| \to 0, \quad t \to \infty,\ x > 0$$

Theorem 4 is proved on the basis of the following 
idea. Let $f(x,t)$ satisfy [1], [3] and $F(x,t)$ satisfy [2], 
[4]. Let $\tilde f(x - ct)$ be the traveling wave for [1], [3] 
and $\tilde F(x - Ct)$ be the traveling wave for [2], [4]. 
Suppose that $\varphi(\alpha^-) > c = C = \varphi(\alpha^+)$. Let $d_A(t)$ and 
$D_A(t)$, $A > 0$, be functions such that 


$$\int_{ct - A\sqrt{t}}^{ct + A\sqrt{t}} \left\{f(x,t) - \tilde f(x - ct - d_A(t))\right\} dx = 0 \qquad [19]$$


and, correspondingly, 


$$\sum_{k=[Ct - A\sqrt{t}]}^{[Ct + A\sqrt{t}]} \left\{\Phi(F(k,t)) - \Phi(\tilde F(k - Ct - D_A(t)))\right\}$$
$$+ \left(Ct + A\sqrt{t} - [Ct + A\sqrt{t}]\right)\left\{\Phi(F([Ct + A\sqrt{t}] + 1, t)) - \Phi(\tilde F([Ct + A\sqrt{t}] + 1 - Ct - D_A(t)))\right\} = 0 \qquad [20]$$

The relations [19], [20] can be called "localized 
conservation laws." The proof contains two difficult 
parts. 

The first part consists in proving that for $A > 2\sqrt{c}$ 
(correspondingly, $A > 2\sqrt{C}$) the following asymptotics 
are valid: 


$$d_A(t) = -\frac{\varepsilon \ln t}{(\alpha^+ - \alpha^-)\,\varphi'(\alpha^+)} + d^0 + o(1), \quad t \to \infty$$
$$D_A(t) = -\frac{C \ln t}{2(\alpha^+ - \alpha^-)\,\varphi'(\alpha^+)} + D^0 + o(1), \quad t \to \infty \qquad [21]$$


where $d^0$, $D^0$ are independent of $A$. 
The second part gives the following convergence 
statements: 


$$\sup_{x \in [ct - A\sqrt{t},\ ct + A\sqrt{t}]} \left|\int_{ct - A\sqrt{t}}^{x} \left(f(y,t) - \tilde f(y - ct - d_A(t))\right) dy\right| \to 0, \quad t \to \infty$$


$$\sup_{n \in [Ct - A\sqrt{t},\ Ct + A\sqrt{t}]} \left|\sum_{k=[Ct - A\sqrt{t}]}^{n} \left\{\Phi(F(k,t)) - \Phi(\tilde F(k - Ct - D_A(t)))\right\}\right| \to 0, \quad t \to \infty$$


The precise a priori estimates of local solutions of 
[1], [2] play an important role in the proof. An 
example of such an estimate, also useful for further 
results, is given below. 


Proposition 1 Let, in eqn |2], C=y(0) > 0, «= 
1, 0 € e/(0) < yo, x & (x 一 Ct)//Ct. Let tbe func- 
tion F(x, t), defined in the domain Qo) = ((x, t): a, < 
X < a}, a2 > 0, satisfy eqn [2], 

AF(x, t) F(x, t) — F(x —1,t) >0 

|F(x,t)| € ——, (x,t) € Qo, t > to 
Then 

AF(x,t) < — (x,t) € Qo, t > to 


wbere 


d C 


d = min(x — 41,4? — X) 


B — Bo la E (+e )a + In(1 +a) 


and By is an absolute constant. 


It is interesting to compare a priori estimate of 
Proposition 1 with some similar (but less precise) 
estimates in the theory of classical quasilinear 
parabolic equations (Ladyzhenskaya et al. 1968). 

We will formulate now the general conjecture 
concerning asymptotic behavior of solutions of 


initial problems [1], [3] and [2], [4] and some 
partial results which confirm this conjecture. To 
simplify formulation we admit the following. 


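Assumption 2 Let $\hat\psi(u)$ and $\hat\Psi(u)$ be upper bounds of 
the convex hulls for the graphs of 


$$\psi(u) = -\int_{\alpha^-}^{u} \varphi(y)\,dy$$

and 

$$\Psi(u) = \int_{\alpha^-}^{u} \frac{dy}{\varphi(y)}$$


respectively, with $u \in [\alpha^-, \alpha^+]$. We suppose that 


$$s = \{u \in [\alpha^-, \alpha^+] : \psi(u) < \hat\psi(u)\} = (\alpha^-, \beta_0) \cup (\alpha_1, \beta_1) \cup \cdots \cup (\alpha_L, \alpha^+)$$

where 

$$\alpha^- = \alpha_0 < \beta_0 < \alpha_1 < \beta_1 < \cdots < \alpha_L < \beta_L = \alpha^+$$

or, correspondingly, 

$$S = \{u \in [\alpha^-, \alpha^+] : \Psi(u) < \hat\Psi(u)\} = (\alpha^-, b_0) \cup (a_1, b_1) \cup \cdots \cup (a_M, \alpha^+)$$

where 

$$\alpha^- = a_0 < b_0 < a_1 < b_1 < \cdots < a_M < b_M = \alpha^+$$


In addition, we suppose that $\varphi'(\alpha_l) \ne 0$, $\varphi'(\beta_l) \ne 0$, 
$l = 0, 1, \ldots, L$ or, correspondingly, $\varphi'(a_m) \ne 0$, 
$\varphi'(b_m) \ne 0$, $m = 0, 1, \ldots, M$. 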


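Proposition 2 (Weinberger 1990, Henkin and 
Polterovich 1999). Under Assumptions 1, 2, one has: 
(i) If $u \in [\alpha^-, \alpha^+] \setminus s$ and, correspondingly, 
$u \in [\alpha^-, \alpha^+] \setminus S$, then the following functions are well 
defined: 


$$g_l\!\left(\frac{x}{t}\right) = \begin{cases} \beta_l, & \text{if } x < \varphi(\beta_l)\cdot t \\ \varphi^{(-1)}(x/t), & \text{if } \varphi(\beta_l)\cdot t \le x \le \varphi(\alpha_{l+1})\cdot t \\ \alpha_{l+1}, & \text{if } x > \varphi(\alpha_{l+1})\cdot t \end{cases} \qquad l = 0, 1, \ldots, L - 1$$


and, correspondingly, 


$$G_m\!\left(\frac{x}{t}\right) = \begin{cases} b_m, & \text{if } x < \varphi(b_m)\cdot t \\ \varphi^{(-1)}(x/t), & \text{if } \varphi(b_m)\cdot t \le x \le \varphi(a_{m+1})\cdot t \\ a_{m+1}, & \text{if } x > \varphi(a_{m+1})\cdot t \end{cases} \qquad m = 0, 1, \ldots, M - 1$$


(ii) For any interval $(\alpha_l, \beta_l) \subset s$ and, correspondingly, 
$(a_m, b_m) \subset S$ there exist traveling waves 
$\tilde f_l(x - c_l t)$ for [1] with overfall $(\alpha_l, \beta_l)$ and, 
correspondingly, $\tilde F_m(x - C_m t)$ for [2] with overfall 
$(a_m, b_m)$, where 


$$c_l = \frac{1}{\beta_l - \alpha_l}\int_{\alpha_l}^{\beta_l} \varphi(y)\,dy, \qquad c_l = \varphi(\beta_l),\ l = 0, \ldots, L - 1, \qquad c_l = \varphi(\alpha_l),\ l = 1, \ldots, L$$


and, correspondingly, 


$$\frac{1}{C_m} = \frac{1}{b_m - a_m}\int_{a_m}^{b_m} \frac{dy}{\varphi(y)}, \qquad C_m = \varphi(b_m),\ m = 0, \ldots, M - 1, \qquad C_m = \varphi(a_m),\ m = 1, \ldots, M$$

Conjecture (Henkin and Polterovich 1994, 1999, 
Henkin and Shananin 2004). Let 


$$\tilde f(x, t, \gamma_0, \ldots, \gamma_L) = \sum_{l=0}^{L} \tilde f_l(x - c_l t - \varepsilon\gamma_l(t)) + \sum_{l=0}^{L-1} g_l\!\left(\frac{x}{t}\right) - \sum_{l=0}^{L-1} \beta_l - \sum_{l=1}^{L} \alpha_l, \quad L \ge 1$$

$$\tilde F(n\varepsilon, t, \Gamma_0, \ldots, \Gamma_M) = \sum_{m=0}^{M} \tilde F_m(n\varepsilon - C_m t - \varepsilon\Gamma_m(t)) + \sum_{m=0}^{M-1} G_m\!\left(\frac{n\varepsilon}{t}\right) - \sum_{m=0}^{M-1} b_m - \sum_{m=1}^{M} a_m, \quad M \ge 1$$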

Then under Assumptions 1, 2, the following state- 
ments are valid: 


(i) For any solution f(x, t), xe R,t€ R,, of ini- 
tial problem [1], [3], there exist shift-functions y(t): 


^j In t - O(1) € y(t) € yf In t+ O(1) 


zy €3; €605 19€0,1,...,L 
such that 
sup |f (x,t) — f (x.t, Y0, V1; =- -3 YL)| — 9, 
xcR 
i — oo 


(ii) Moreover, in (i) one can take 


“=U 
i-a) 
E | T CN NN 
“| Fa) a) SPAM 
(ap if l= L»0,v(or) x v(BL) 
(iii) For any solution $F(n\varepsilon, t)$, $n \in \mathbb{Z}$, $t \in \mathbb{R}_+$, of initial 
problem [2], [4], there exist shift-functions $\Gamma_m(t)$: 


T> Int+ O(1) <T,,(t) < TInt + O(1) 


Se PF Aleem, lFe913L 

such that 

Sup Wines) 一 F(ne, t, Do, F'1,..., TM) ^ 0, 
P — O0 


(iv) Moreover, in (iii) one can take 


Dow 
_ Cm 
— (bin — 4m) 
1 
TION if m —0 « M,v(ao) Z v(bo) 
1 1 . 
ý Plam) p (bm) doses 
1 
Flan)’ if m=M>0,y(am) X (bm) 


The main result confirming formulated conjec- 
tures is the following. 


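Theorem 5 (Henkin and Shananin). Conjecture 
(i) for $L = 1$ and the corresponding conjecture (iii) for 
$M = 1$ are true, that is, for a solution of initial problem 
[1], [3] there exist shift functions $\gamma_l(t) = O(\ln t)$ such 
that for $t \to \infty$ we have 


$$f(x,t) \to \begin{cases} \tilde f_0(x - c_0 t - \varepsilon\gamma_0(t)), & \text{if } x < c_0 t \\ \varphi^{(-1)}(x/t), & \text{if } c_0 t \le x \le c_1 t \\ \tilde f_1(x - c_1 t - \varepsilon\gamma_1(t)), & \text{if } x > c_1 t \end{cases}$$


and for a solution of initial problem [2], [4] there exist 
shift functions $\Gamma_m(t) = O(\ln t)$ such that for $t \to \infty$ 
we have 


$$F(n\varepsilon,t) \to \begin{cases} \tilde F_0(n\varepsilon - C_0 t - \varepsilon\Gamma_0(t)), & \text{if } n\varepsilon < C_0 t \\ \varphi^{(-1)}(n\varepsilon/t), & \text{if } C_0 t \le n\varepsilon \le C_1 t \\ \tilde F_1(n\varepsilon - C_1 t - \varepsilon\Gamma_1(t)), & \text{if } n\varepsilon > C_1 t \end{cases}$$


The proof of Theorem 5 is of the same nature as 
the proof of Theorem 4. 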


Remarks 


(i) The proofs of the stronger Conjectures (ii) and (iv) 
for $L = 1$ or $M = 1$ are in preparation. 
(ii) The numerical results of Rykova and Spivak (preprint, 
2004) confirm conjecture (iii) for $M = 2$. 
(iii) The results of Weinberger (1990) and Henkin 
and Polterovich (1999) confirm the convergence 
statements of Conjectures (i), (iii) for all $L$ and 
$M$, but only on the intervals of rarefaction 
profiles: $x \in [\varphi(\beta_l)t, \varphi(\alpha_{l+1})t]$ or, correspondingly, 
$x \in [\varphi(b_m)t, \varphi(a_{m+1})t]$, $t > 0$. 


The problem of finding asymptotics ($t \to \infty$) of 
solutions of (viscous) conservation laws has been 
posed originally not only for generalized Burgers 
equations but also for systems of conservation laws in 
one spatial variable (see Gelfand (1959)). In this 
direction many important results on existence and 
asymptotic stability of viscous shock profiles (con- 
tinuous and discrete) have been obtained and applied 
(see Benzoni-Gavage (2004), Lax (1973), Serre 
(1999), Zumbrun and Howard (1998) and references 
therein). The results of type of Theorems 4,5 have not 
yet been obtained for systems of conservation laws. 

It is also very interesting to study asymptotic 
behavior of scalar (viscous) conservation laws in 
several spatial variables (continuous or discrete), 
basing on the asymptotic properties of Burgers type 
equations. In this direction there have been several 
important results and problems (see Bauman and 
Phillips (1986), Henkin and Polterovich (1991), 
Hoff and Zumbrun (2000), Serre (1999), 
Weinberger (1990), and references therein). 
What is a Cellular Automaton? 


Cellular automata (CAs) were first introduced by 
J von Neumann in his investigation of “complexity,” 
following an inspired suggestion by S Ulam. But in the 
last 50 years they have been investigated and used in a 
number of fields; such widely different terminologies have 
been used by researchers that it is now difficult even 
to give a precise general definition of a CA. Thus, 
some definitions and approximations are in order. 
First a broad definition: 


1. we have a number of cells (boxes); 

2. at any (discrete) time step, any cell can present 
itself in a certain “state” among a finite number 
of different states; 

3. the state of any cell can change (evolve) from a 
time step to the subsequent time step; and 

4. there is a rule (evolution law, EL) which 
determines this transition. 


Note that the number of cells can be finite or infinite; 
the cells can be arranged on a line, on a surface, in the 
ordinary three-dimensional (3D) space, or possibly in a 
hyperspace (in any case, the cells can be numbered); the 
different states of a cell can be denoted by integer 
numbers but, in different contexts of application of 
CAs, different imaginative pictures have also been used 
(e.g., different colors, dead and living cells, number of 
balls in a box, etc.); the evolution of a CA proceeds in 
finite time steps (time is also discrete); the EL, provided 
that it is effective on any possible configuration of a 
given CA (computability), is otherwise completely 
arbitrary (indeed, there are not only deterministic and 
probabilistic ELs, but also those that “evolve” in time — 
following a meta-EL, which in turn can be determinis- 
tic or probabilistic). 

Consider some examples of CAs. 


Example 1 (CA1) Consider a linear array of seven 
boxes (cells; one can number them c(i), i= 1,2,..., 7). 
Each box can be empty or it can contain a ball (so 
there are just two states for each cell). Given a 
configuration of this CA at time t, what happens at 
time t+ 1 (EL)? 


(i) the state of the first box c(1) never changes; 
(ii) for each other box c(i), i = 2, 3, ..., 7:
(iia) if the box is empty and the box on its left is
empty then put a ball in the box; 

(iib) if there is a ball in the box and also there is a ball 
in the box on its left then empty the box. 


An example of the evolution of such a rather trivial 
CA is given in Figure 1. 
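
The rules (i), (iia), and (iib) above can be sketched in a few lines of Python. This is only an illustration: the all-empty initial configuration is an arbitrary choice (the ID of Figure 1 is not reproduced here), and the names `ca1_step` and `evolve` are introduced for this sketch.

```python
def ca1_step(cells):
    """One time step of CA1; cells[0] is the box c(1), 1 = ball, 0 = empty."""
    new = list(cells)                  # rule (i): c(1) never changes
    for i in range(1, len(cells)):
        left, here = cells[i - 1], cells[i]
        if here == 0 and left == 0:    # rule (iia): both empty -> put a ball
            new[i] = 1
        elif here == 1 and left == 1:  # rule (iib): both full -> empty the box
            new[i] = 0
    return new


def evolve(cells, steps):
    """Return the list of configurations at t = 0, 1, ..., steps."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(ca1_step(history[-1]))
    return history


# Hypothetical ID: seven empty boxes.
for t, row in enumerate(evolve([0] * 7, 7)):
    print(t, "".join(".o"[s] for s in row))
```

Since the two rules together set each box to the opposite of its left neighbor's previous state, box c(i) cannot change after time step i − 1, which is why seven boxes always settle by t = 6.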


A more precise notation can now be established. 

First, let us denote the state of a cell at time t by a
“state function,” say S. According to point 2 of the definition
above, the number of possible states is arbitrary but
finite: denote this number by the positive integer M
(M > 1). Then S takes values on a finite field, say
$\mathbb{Z}_M = \mathbb{Z}/M\mathbb{Z} = \{0, 1, 2, \ldots, M-1\}$ (in plain words,
we have denoted the M states for the CA by the
first M non-negative integers). Different cells can be
labeled with a progressive number: c(n), $n = n_1, n_1 + 1, \ldots, n_2 - 1, n_2$;
possibly, in case of an infinite
number of cells, one has $n_1 \to -\infty$ and/or
$n_2 \to +\infty$. In the case of $n_1 = -\infty$, $n_2 = +\infty$, one
speaks of a unidimensional CA. Of course, the field S
depends on n as well as on time (remember that, for a
CA, “time” is a discrete variable: t = 0, 1, 2, ...). The
field S(n, t) describes completely the CA. If the EL is
deterministic, then one can determine (compute)
S(n, t) step by step for t > 0 from the initial
configuration S(n, 0) (initial datum, ID). Consider
only static ELs, namely those that do not change in
time. A further distinction can be made: there are
ELs such that the future state of the generic cell,
S(n, t + 1), depends on the whole current configuration
of the CA (these are called nonlocal ELs) and
there are ELs for which S(n, t + 1) depends only on


Figure 1 A seven time-step evolution of CA1 starting from a
given ID (t = 0). Note that a stable configuration has been
reached at t = 6.

the current state of a finite number, say N, of cells
(local ELs):


$\{S(n + k_i, t)\},\ i = 1, 2, \ldots, N,\ k_i \in \mathbb{Z} \;\Rightarrow\; S(n, t + 1)$ [1]


Note that, in principle, the set of cells that 
determine, according to the EL, the future state of the 
generic cell n, could depend on n, namely one can have 
N = N(n), as well as $k_i = k_i(n)$, $i = 1, 2, \ldots, N(n)$ (see
CA2 below). In any case, such a set of cells is called
the interaction set (IS). Moreover, the distance from
the cell n of the farthest cell in the IS is called
the range R (of the interaction): $R = \max_i |k_i|$. If
IS = {c(n − R), c(n − R + 1), ..., c(n), ..., c(n + R − 1),
c(n + R)}, then this IS is called a neighborhood of
range R. It is, moreover, clear that, for a unidimensional
CA, there exists at least one infinite subset of cells that
have the same state. If there is only one such subset,
then it is called the vacuum set and the state of its
cells is called the vacuum state: let V denote the value of
this state ($0 \le V < M$, $S(n, t) \to V$ as $|n| \to \infty$).


Example 2 (CA2) An example of a CA with
n-dependent IS (M = 2, R = 3, V = 0). This is the
EL: the cell c(n) changes its state (0 → 1, 1 → 0) iff


(i) n is even and at least one of the two cells on its
left is not in the vacuum state;

(ii) n is odd and one or three of the three cells on its
right are not in the vacuum state.


An example of the evolution of such a CA is given 
in Figure 2. 


Usually, only a subclass of ELs is considered for 
which the phenomenon of vacuum excitation 
cannot occur. Namely, during the evolution of 
the CA, an infinite subset of the vacuum set 
cannot change its state in just one time step. In 
other words: if the set of cells starting from the 
first cell and ending with the last one for which 


Figure 2 Three hundred and eighty time steps of CA2, starting
from a randomly chosen initial configuration. Note the left-right
asymmetry due to the asymmetry of its IS and EL.


S(n, t) ≠ V be called the population set (PS), then the PS is
a finite set at each time.

Of course, one can easily devise an EL for which 
this is not true; nevertheless, the EL itself is still 
valid (computable), for instance, 


Example 3 (CA3) This is a unidimensional CA,
namely there are infinitely many cells on a line (n ∈ Z). The
cells have M states and V = 0; the EL reads:


the state of each cell cycles in the set of available states
(0 → 1, 1 → 2, ..., M − 2 → M − 1, M − 1 → 0)


Note that the range R is zero and there is a vacuum
excitation; nevertheless, the EL is effective.


Deterministic, static, and local ELs that do not give 
rise to vacuum excitation are called normal ELs (NELs). 

Since M, N are finite for an NEL, one can give the 
NEL itself as a table, considering every possible 
configuration of the IS and specifying the outcome 
for each configuration (note that there are $M^N$
possible configurations).


Example 4 (CA4) n ∈ Z, M = 2, V = 0, IS = {c(n),
c(n − 1), c(n + 2)}, N = 3, R = 2. The EL is:

S(n, t)       0 0 0 0 1 1 1 1
S(n − 1, t)   0 0 1 1 0 0 1 1      [2]
S(n + 2, t)   0 1 0 1 0 1 0 1
S(n, t + 1)   0 1 1 0 1 1 0 1


An example of the evolution of such a CA is given 
in Figure 3. 


However, these NELs can also be given in an 
alternative representation (more useful in view of the 
extensions of the concept of CA itself, see below). 
Namely, an NEL can be given as a discrete-time 
EL for the state function S(n, t) in the finite field
$\mathbb{Z}_M = \{0, 1, 2, \ldots, M - 1\}$.


Figure 3 Four hundred and sixty-one time steps of CA4, 
starting from a randomly chosen PS of 50 cells.


For example, the NEL above for CA4 can be 
expressed as follows: 


$S(n, t+1) \equiv_M S(n-1, t) + S(n, t) + S(n+2, t)$
$\qquad + S(n, t)\,S(n+2, t)$
$\qquad + S(n-1, t)\,S(n, t)\,S(n+2, t)$ [3]


Here and in the following, the symbol $\equiv_M$ denotes a
congruence mod M.
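
That the table [2] and the polynomial form [3] describe the same NEL can be checked by enumerating the eight configurations of the IS. The sketch below is illustrative only: the dictionary `TABLE` transcribes the columns of [2], and the helper names (`poly`, `step`) and the vacuum padding of a finite row are assumptions made here.

```python
from itertools import product

# Columns of table [2], keyed by (S(n,t), S(n-1,t), S(n+2,t)).
TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 1, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 1,
}


def poly(a, b, c, M=2):
    """Right-hand side of [3] with a = S(n,t), b = S(n-1,t), c = S(n+2,t)."""
    return (b + a + c + a * c + b * a * c) % M


def step(row, rule):
    """Apply a rule to a finite row; cells outside the row are held at vacuum (0)."""
    get = lambda n: row[n] if 0 <= n < len(row) else 0
    return [rule(get(n), get(n - 1), get(n + 2)) for n in range(len(row))]


agree = all(TABLE[k] == poly(*k) for k in product((0, 1), repeat=3))
print("table [2] and polynomial [3] agree:", agree)
```

The table form is convenient as a lookup, while the polynomial form generalizes directly to M > 2.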
Another example is the following. 


Example 5 (CA5) n ∈ Z, M = 3, N = 3, V = 0, R = 1,
IS = {c(n − 1), c(n), c(n + 1)}. The NEL is:


$S(n, t+1) \equiv_M S(n-1, t) + S(n, t) + S(n+1, t)$
$\qquad + 2\,S(n-1, t)\,S(n+1, t)$ [4]


An example of the evolution of such a CA is given 
in Figure 4. 


Classification of ELs 


Considering a CA with given M > 1, N > 1, the 
number L of possible deterministic, static ELs is 


$L(M, N) = M^{(M^N)}$ [5]


Of course, this number can be very large for 
relatively small values of M and N also. Never- 
theless, it is a finite positive integer, so that, for 
given M, N, one could denote every EL by an 
integer number and investigate the typical behavior 
of each EL. A considerable reduction of this 
number is obtained if one limits attention to 
totalistic ELs, namely to those whose outcome 
depends only on the global configuration of the 
IS, often just on 


$\sigma(n, t) = \sum_{i=1}^{N} S(n + k_i, t), \quad k_i \in \mathbb{Z}$ [6]
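
To get a feeling for the size of the reduction, one can compare [5] with the number of ELs whose outcome depends only on the sum [6]. The sketch below assumes the narrow reading of "totalistic" (dependence on the sum alone); since that sum ranges over 0, ..., N(M − 1), such an EL is a table with N(M − 1) + 1 entries. The function names are introduced here for illustration.

```python
def count_els(M, N):
    """Eq. [5]: one of M outcomes for each of the M**N configurations of the IS."""
    return M ** (M ** N)


def count_totalistic(M, N):
    """ELs that depend only on the sum [6], which takes N*(M-1)+1 distinct values."""
    return M ** (N * (M - 1) + 1)


for M, N in [(2, 3), (3, 3), (2, 5)]:
    print(M, N, count_els(M, N), count_totalistic(M, N))
```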


Figure 4 Four hundred and sixty-one time steps of CA5,
starting from a randomly chosen PS of 50 cells.

Figure 5 A class-1 CA: every ID rapidly evolves to
periodic structures; M = 3, V = 0, R = 2, EL: $S(n, t+1) \equiv_M S(n, t) + S(n-1, t)\,S(n+2, t)$.


Deep and extensive computer investigations have
been carried out for unidimensional CAs with small
values of M, N. Surprisingly enough, it seems that
the typical behavior of all these CAs can be (roughly
and heuristically) classified in just four classes
(Wolfram 2002):


• Class 1 (simple): possibly after a complicated
transient, simple patterns emerge.

• Class 2 (fractal): possibly after a transient,
overall regular nested structures are obtained.

• Class 3 (chaotic): complicated and seemingly
random behavior.

• Class 4 (complex): possibly after a transient,
localized structures emerge that interact in complex ways.


Due to the looseness of the above definitions, 
perhaps a better way to distinguish between classes 
is to train one's eye. Consider some examples of 
CAs for each class: the typical behavior of class-1 
CA is shown in Figures 5 and 6, of class-2 CA in 
Figures 7 and 8, of class-3 CA in Figures 4 and 9, 
of class-4 CA in Figures 10 and 11. Note, however, 
that often one has “mixed type" CA: for example, 
CA4 is of class 1 on the right and of class 2 on 
the left (see Figure 3); Figure 12 exhibits a CA 
where the typical behaviors of classes 2 and 3 are 
superimposed. 
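
As a small aid to "training one's eye," the class-2 EL quoted in the caption of Figure 7, S(n, t + 1) = S(n − 1, t) + S(n + 1, t) mod 2, can be run from a single non-vacuum cell. This is a minimal sketch: the single-cell ID, the finite row, and the function names are choices made here, not taken from the figures.

```python
def step_mod(row, M=2):
    """S(n,t+1) = S(n-1,t) + S(n+1,t) mod M on a finite row, vacuum (0) outside."""
    get = lambda n: row[n] if 0 <= n < len(row) else 0
    return [(get(n - 1) + get(n + 1)) % M for n in range(len(row))]


def run_from_seed(steps):
    """Start from one non-vacuum cell in a row wide enough that the border is never crossed."""
    row = [0] * (2 * steps + 1)
    row[steps] = 1
    rows = [row]
    for _ in range(steps):
        rows.append(step_mod(rows[-1]))
    return rows


for row in run_from_seed(15):
    print("".join(" #"[s] for s in row))
```

The printed rows form the nested triangles of Pascal's triangle mod 2; at time t the number of non-vacuum cells is 2 raised to the number of 1s in the binary expansion of t.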


Extensions 


The concept of a CA is so simple that many 
extensions of the above-sketched definition of a 
CA can be easily devised. A (nonexhaustive) survey 
of such extensions follows. 
Figure 6 A class-1 CA: a random ID vanishes after 337
time steps; M = 5, V = 0, R = 2, EL: $S(n, t+1) \equiv_M S(n-1, t)\,S(n-2, t) + S(n+1, t)\,S(n+2, t) + S(n-1, t)\,S(n+1, t) + S(n-2, t)\,S(n+2, t)$.




Figure 7 A class-2 CA: Sierpinski triangles appear; M = 2,
V = 0, R = 1, EL: $S(n, t+1) \equiv_M S(n-1, t) + S(n+1, t)$.


Vector CA 


In this extension, the state function S(n, t) is
considered as a “vector,” namely
$S(n, t) = (S_1(n, t), S_2(n, t), \ldots, S_L(n, t))$, L being a positive integer.
Each component $S_l(n, t)$ ($l = 1, 2, \ldots, L$) takes
values in a finite field, say $\mathbb{Z}_{M_l} = \{0, 1, 2, \ldots, M_l - 1\}$,
and evolves, according to some EL, interacting
with the other components. Of course, one can give 
separately the time evolution for each component; 
however, it is also possible to give a global 
representation of a vector CA, introducing a global 


Figure 8 A class-2 CA: a double fractal structure appears; M = 4,
V = 0, R = 2, EL: $S(n, t+1) \equiv_M S(n-2, t) + S(n, t) + S(n+2, t)$.


Figure 9 A class-3 CA: M = 5, V = 0, R = 2, EL: $S(n, t+1) \equiv_M 2\,S(n-1, t) + S(n+1, t) + S(n, t)\,(S(n+1, t) + S(n+2, t)) + S(n-1, t)\,S(n+1, t)$.


function $\mathcal{S}(n, t)$ that takes values in the finite field
$\mathbb{Z}_M$, $M = M_1 M_2 \cdots M_L$; for example,


$\mathcal{S}(n, t) = S_L(n, t) + \sum_{i=1}^{L-1} \Big( S_i(n, t) \prod_{k>i}^{L} M_k \Big)$ [7]

Thus, in a sense, vector CAs are still usual CAs 
with a complicated EL. 
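
One way to read the global representation [7] is as a mixed-radix number whose digits are the components, which for L = 2 reduces to the form M_2 S_1 + S_2. The sketch below follows that reading; the function names `pack` and `unpack` are introduced here and are not part of the original notation.

```python
def pack(components, moduli):
    """Global state: S_L + sum over i < L of S_i * prod_{k > i} M_k (Horner form)."""
    value = 0
    for s, m in zip(components, moduli):
        assert 0 <= s < m, "each component must lie in its own Z_{M_l}"
        value = value * m + s
    return value


def unpack(value, moduli):
    """Recover (S_1, ..., S_L) from the global state."""
    comps = []
    for m in reversed(moduli):
        value, s = divmod(value, m)
        comps.append(s)
    return tuple(reversed(comps))


print(pack((1, 2), (2, 3)), unpack(5, (2, 3)))
```

Because the encoding is a bijection between component tuples and {0, ..., M − 1}, any EL on the components induces an ordinary (if complicated) EL on the global state.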


Example 6 (CA6) A two-component vector CA:


$S_1(n, t+1) \equiv_{M_1} S_1(n, t)\,S_1(n+1, t) + (M_1 - 1)\,S_2(n-1, t)\,S_2(n, t) + c_1$ [8]


$S_2(n, t+1) \equiv_{M_2} S_1(n-1, t)\,S_2(n, t) + S_1(n, t)\,S_2(n+1, t) + c_2$ [9]
Figure 10 A class-4 CA (Wolfram CA 110): M = 2, V = 0, R = 1,

Figure 11 A class-4 CA. Note the interacting moving structures
on the left and on the right; note also the apparently
chaotic behavior in the center; M = 2, V = 0, R = 2, EL: $S(n, t+1) \equiv_M S(n, t) + S(n+1, t) + S(n-1, t)\,S(n+2, t)$.


The global behavior of this CA can be expressed, 
for example, through the global state function 


$\mathcal{S}(n, t) = M_2\,S_1(n, t) + S_2(n, t)$ [10]


imposed on a chaotic one; M = 4, V = 0, R = 2, EL: $S(n, t+1) \equiv_M S(n, t)\,(S(n-2, t) + S(n+2, t)) + S(n-1, t)\,S(n+1, t)$.


Figure 13 Global behavior of the vector CA6.


Obviously, $\mathcal{S} \in \mathbb{Z}_M$ with $M = M_1 M_2$. Figure 13
represents the global behavior of this CA with
$M_1 = 2$, $M_2 = 3$, $c_1 = 1$, $c_2 = 1$, V = 0.

Note that this CA can be considered as an 
extension of the celebrated quadratic map. 


Multidimensional CA 


Up to now we have considered CAs with a finite number
of cells (finite CAs) or with an infinite number of cells
arranged on a line (unidimensional CAs). Now we
consider CAs with cells arranged on a surface,
usually a plane (bidimensional CAs), or in 3-space
(tridimensional CAs), or even in a hyperspace (multidimensional
CAs). In any case, if the number of cells
is finite, the evolution of such CAs, according to an
NEL, must end up in a final cycle: this is due to the
finiteness of the “phase space” (thus, these CAs should
be classified as class 1; however, note that, if the
“phase space” is large enough, the dynamics of
such CAs can still be very rich). Usually, one
considers an infinite number of cells tessellating
the whole s-space, s = 2, 3, ... (e.g., squares or
hexagons on the plane, cubic cells in 3-space). The
changes in the previous notation and definitions are
plain: for example, for a bidimensional CA, the state
function depends now on two discrete “space”
variables ($S(n_1, n_2, t)$, $n_1 \in \mathbb{Z}$, $n_2 \in \mathbb{Z}$); furthermore,
there is a greater freedom in choosing a neighborhood
of range R. The two most-used neighborhoods of
range 1 are shown below:


Neumann neighborhood 
The most famous (and interesting) bidimensional
CA is “Life,” introduced by J H Conway, which is
discussed next.


Example 7 (CA “Life”; Moore-Conway neighborhood,
V = 0, M = 2). A cell in the vacuum state 0 is
called “dead”; a cell in the state 1 is called “alive.”
The EL is as follows:


(i) If a cell is dead at time t, it comes alive at time
t + 1 if and only if exactly three of its eight
neighbors are alive at time t (reproduction).

(ii) If a cell is alive at time t, it dies at time t + 1 if and
only if fewer than two (loneliness) or more than
three (overcrowding) neighbors are alive at time t.


Clearly, this is a totalistic NEL. Now considering
the explicit form of σ (see [6]):


$\sigma(n_1, n_2, t) = -S(n_1, n_2, t) + \sum_{k_1=-1}^{1} \sum_{k_2=-1}^{1} S(n_1 + k_1, n_2 + k_2, t)$ [12]

the above EL can be simply expressed as:

$S(n_1, n_2, t+1) = \delta_{3,\sigma} + \delta_{2,\sigma}\,S(n_1, n_2, t)$ [13]


where $\delta_{i,j}$ is the Kronecker symbol.
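
The compact form of the Life EL given in [12] and [13] translates almost literally into code. The sketch below stores only the live cells of the infinite plane as a set of coordinate pairs, which is an implementation choice made here; `life_step` and the glider coordinates are likewise illustrative.

```python
from collections import Counter


def life_step(alive):
    """One time step of Life; `alive` is the set of (n1, n2) cells in state 1."""
    # sigma[cell] = number of live cells among the eight neighbors, as in [12].
    sigma = Counter((x + dx, y + dy)
                    for (x, y) in alive
                    for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0))
    # [13]: new state is 1 if sigma == 3, or if sigma == 2 and the cell is alive now.
    return {cell for cell, s in sigma.items()
            if s == 3 or (s == 2 and cell in alive)}


glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
state = glider
for _ in range(4):
    state = life_step(state)
print(sorted(state))
```

After four time steps the glider reappears displaced by one cell along each axis, while a block stays fixed and a blinker returns to itself after two steps.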
Life is a class-4 CA; it exhibits a rich variety of 
interesting structures: stable structures, oscillators 


(periodic structures), gliders and ships (moving 
structures), emitters and absorbers (namely, structures
that, after a time period, reconstitute themselves,
but meanwhile have emitted or absorbed
moving structures). These structures are essential to
prove that Life can be used to construct a universal 
Turing machine (see below). One can get a rough 
idea of such “richness” from Figure 14. 

As in the previous case of vector CA, one could 
object that also multidimensional CAs are not true 
extensions of the unidimensional CAs. Indeed, since 
the whole set of cells is still a countable set, one 
could number the cells with just a discrete “space” 
variable (say n ∈ Z). For example, in the case of a
square tessellation of the plane, we could enumerate
the cells in the plane starting from the origin as
follows:


              22 ...
              21  20  19  18
 -13 -12 -11   4   5   6  17
 -14  -9 -10   3   2   7  16
 -15  -8  -1   0   1   8  15      [14]
 -16  -7  -2  -3  10   9  14
 -17  -6  -5  -4  11  12  13
 -18 -19 ...

Thus, any multidimensional CA could in principle 
be viewed as a unidimensional one. Of course, one 
has to pay a price for this: ISs and ELs that are 
simple for a multidimensional CA become cumber- 
some for its unidimensional version and vice versa. 


Higher Time Derivatives 


Up to now, we have considered CAs whose evolved 
state S(t + 1) depends only on the state S(t), namely 
the state of the CA itself at the previous time step. In 
other words the EL involves just the first (discrete) 
time derivative (1_CA). One can easily extend all the
previous definitions to consider higher-order discrete
time derivatives (K_CA). Of course, the ID and the IS
for such a CA involve the state of the CA at K
subsequent time steps.

An example of a unidimensional 2_CA is given 
below. 


Example 8 (CA7) M=3,V=0,R=1. The EL is: 


$S(n, t+1) \equiv_M S(n-1, t) + S(n, t-1) + S(n+1, t)$ [15]


An example of the evolution of such a CA is given in 
Figure 15. 
Figure 14 CA “Life”: (a) Time 0. Near the lower border, five
stable structures (from left to right: a “block,” a “boat,” a
“ship,” a “loaf,” a “beehive”); near the left border, three “blinkers”
(period-2 oscillators); near the right corner, a symmetric structure
that, in one time step, evolves into a “pulsar” (a period-3
oscillator); in the upper-left corner, a “glider” (a moving structure);
in the upper-right corner, a “medium weight spaceship” (another
moving structure); in the center, a configuration that vanishes in a
few time steps. (b) Time 1. The structures on the lower border are
unchanged; the blinkers, the glider, and the spaceship are in an
intermediate state; on the right border, the pulsar starts to pulse.
(c) Time 2. The three blinkers on the left border are again in their
original configurations (periodic structure with period 2); the
pulsar, the glider, and the spaceship are in another intermediate
state. (d) Time 3. The pulsar is in its second state, the glider and
the spaceship in their third; the structure in the center is going to
vanish. (e) Time 4. The pulsar has completed its pulsation (period-3
oscillator, see Figure 14b); the structure in the center has
vanished; the glider and the spaceship have recovered their
original configurations (see Figure 14a) but meanwhile they have
moved by one cell in four time steps (1/4 of the highest velocity
attainable by a moving structure in a CA of range 1). The glider is
moving downward and to the right, the spaceship horizontally to
the left. (f) Time 60. The spaceship has almost completed its
crossing; the glider has reached the center and is on a collision
course with the pulsar.


It is plain that taking a suitable continuum limit 
of a K_CA one gets a partial differential equation of 
order K for the evolution. However, there are also 
special and interesting CAs, called "filter" CAs, 
that in a suitable continuum limit end up in integral 
evolution equations. For a filter unidimensional 
CA, the evolved state at the cell n, S(n, t + 1),
depends also on the (already) evolved states of the
cells on its left (or right): for example, an NEL of
the type


$S(n, t+1) \equiv_M F(S(n + k_i, t),\ S(n - h_j, t+1))$,
$\qquad i = 1, 2, \ldots, N_1,\ k_i \in \mathbb{Z}$;
$\qquad j = 1, 2, \ldots, N_2,\ h_j \in \mathbb{N}$ [16]


is still valid (computable). Extensions to K_CAs or
vector CAs or multidimensional CA are plain. Very 
often filter CAs exhibit a class-4 behavior with 
particle-like structures moving and interacting in a 
complex way; see the following example and 
examples in the next section. 


Example 9 (CA8) M = 2, V = 0, R = 2. The EL is:


$S(n, t+1) \equiv_M S(n-1, t-1)\,S(n-2, t)$
$\qquad + S(n, t) + S(n+1, t)\,S(n+2, t)$ [17]


An example of the evolution of such a CA is given 
in Figure 16. 


Invertible CA 


For most of the ELs there is a loss of information 
in the course of the evolution (see, e.g., Figures 5 
and 6). Indeed, different definitions of “CA 
entropy" have been introduced to measure the 
“randomness” in the behavior of a given CA. 
However, since CAs are important in physical 
structures moving to the left and to the right and interacting in
complex ways.


modeling as well as in cryptography and data 
compression, there is great interest in a special 
subclass of CAs which are “invertible” (time 
reversible). Namely, for an “invertible” CA fol- 
lowing a given EL and starting from an arbitrary 
ID, there exists an “inverse” EL such that one 
can recover the ID from the evolved states. 
Invertible CAs can be easily devised in the case of
K_CA (K > 1). For example, if K = 2, 3, ..., one can
consider ELs of the form


$S(n, t+1) \equiv_M S(n, t-K+1) + F(S(n + k_i, t - j))$ [18a]


where

$i = 1, 2, \ldots, N,\ k_i \in \mathbb{Z}; \qquad j = 0, 1, \ldots, K-2$ [18b]

and F is an arbitrary polynomial function.
It is then clear that the inverse EL reads


$\tilde{S}(n, t+1) \equiv_M \tilde{S}(n, t-K+1) + (M-1)\,F(\tilde{S}(n + k_i, t + j - K + 2))$ [19]


Indeed, if an arbitrary ID evolves according to 
the EL [18], then applying the inverse EL [19] to K 
subsequent evolved states (taken in reversed order), 
eventually the original ID is recovered (in reversed 
order) (see the following example). 


Example 10 (CA9) A 6_CA: M = 2, V = 0, R = 1.
The EL is:


$S(n, t+1) \equiv_M S(n, t-5) + S(n, t-3) + S(n+1, t-2)$
$\qquad + S(n-1, t-1)$
$\qquad + S(n, t-2)\,S(n+1, t-2)$
$\qquad + S(n, t)\,S(n-1, t)$ [20]

The inverse EL, according to [19], reads
(Figure 17)

$\tilde{S}(n, t+1) \equiv_M \tilde{S}(n, t-5) + \tilde{S}(n, t-1) + \tilde{S}(n+1, t-2)$
$\qquad + \tilde{S}(n-1, t-3)$
$\qquad + \tilde{S}(n, t-2)\,\tilde{S}(n+1, t-2)$
$\qquad + \tilde{S}(n, t-4)\,\tilde{S}(n-1, t-4)$ [21]


Figure 17 CA9, a 6_CA: (a) a 50 time-step evolution from a
peculiar ID; (b) a 50 time-step evolution of the inverse EL, starting 
from the last six configurations of Figure 17a (taken in inverse 
order); the ID of Figure 17a is recovered (in inverse order). 
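
The recovery of the ID described for CA9 can be reproduced numerically. The sketch below reads the forward EL [20] and the inverse EL [21] with spatial arguments n + 1 and n − 1 as reconstructed from the printed formulas, and, as an assumption made only to keep the example finite, places the cells on a ring (periodic boundary) with a random ID; all function names are illustrative.

```python
import random

K = 6  # order of the CA: the EL needs the six most recent configurations


def forward_step(hist):
    """EL [20]; hist[-1] is S(., t), hist[-2] is S(., t-1), and so on."""
    w = len(hist[-1])
    s = lambda j, n: hist[-1 - j][n % w]      # S(n, t - j) on a ring of w cells
    return [(s(5, n) + s(3, n) + s(2, n + 1) + s(1, n - 1)
             + s(2, n) * s(2, n + 1) + s(0, n) * s(0, n - 1)) % 2
            for n in range(w)]


def inverse_step(hist):
    """Inverse EL [21], applied to the configurations taken in reversed order."""
    w = len(hist[-1])
    s = lambda j, n: hist[-1 - j][n % w]
    return [(s(5, n) + s(1, n) + s(2, n + 1) + s(3, n - 1)
             + s(2, n) * s(2, n + 1) + s(4, n) * s(4, n - 1)) % 2
            for n in range(w)]


def run(step, initial, steps):
    hist = [list(row) for row in initial]
    for _ in range(steps):
        hist.append(step(hist))
    return hist


random.seed(1)
initial = [[random.randint(0, 1) for _ in range(31)] for _ in range(K)]
forward = run(forward_step, initial, 50)
backward = run(inverse_step, forward[::-1][:K], 50)
print("ID recovered:", backward[-K:][::-1] == initial)
```

The inverse run is fed the last six forward configurations in reversed order and retraces the whole history, ending on the six initial configurations (again in reversed order).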


Of course, more complicated invertible ELs can be 
devised. Invertible ELs can be also easily devised for 
“filter” CA, for example, if an NEL for a “filter” CA 
reads 


$S(n, t+1) \equiv_M S(n, t) + F(S(n + k_i, t),\ S(n - k_j, t+1))$ [22]


where $k_i$ and $k_j$ are positive integers
(i, j = 1, 2, ..., N) and F is an arbitrary
(polynomial) function, then it is invertible and
the inverse NEL reads


$\tilde{S}(n, t+1) \equiv_M \tilde{S}(n, t) + (M-1)\,F(\tilde{S}(n + k_i, t+1),\ \tilde{S}(n - k_j, t))$ [23]

Note that [22] is computable starting from
n = −∞, whereas [23] is computable starting from
n = +∞.


Example 11 (CA10) A 1.5_CA, M = 2, V = 0, R = 3.


The EL is: 

$S(n, t+1) \equiv_M S(n, t) + S(n-3, t+1)\,S(n-2, t+1)$
$\qquad + S(n+2, t)\,S(n+3, t)$
$\qquad + S(n-2, t+1)\,S(n-1, t+1)$
$\qquad + S(n+1, t)\,S(n+2, t)$ [24]


Note that this EL is of the form [22]; therefore, it 
is invertible (see Figure 18a). According to [23], the 
inverse EL reads: 


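$\tilde{S}(n, t+1) \equiv_M \tilde{S}(n, t) + \tilde{S}(n+3, t+1)\,\tilde{S}(n+2, t+1)$
$\qquad + \tilde{S}(n-2, t)\,\tilde{S}(n-3, t)$
$\qquad + \tilde{S}(n+2, t+1)\,\tilde{S}(n+1, t+1)$
$\qquad + \tilde{S}(n-1, t)\,\tilde{S}(n-2, t)$ [25]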


This CA exhibits a very rich dynamics: any
complex ID rapidly decays into a great variety of coherent
particle-like structures, steady or moving to the right or
to the left with different velocities. The interactions
between different particles may be solitonic (the
particles emerge unchanged but shifted) or annihilation-creation
phenomena can occur (see Figures 18a-d).


Figure 18 CA10: (a) 230 time-step evolution, then the inverse EL is applied for 230 further time steps in order to recover the initial
configuration. (b) Collisions between different kinds of particle-like coherent moving structures. The last collision (on the right) is
a solitonic one: the interaction produces just a phase shift, preserving number, shape, and velocities of the involved “particles.”
(c) “Particles” moving with different velocities and interacting in complex ways (solitonic collisions, particle creations and annihilations).
(d) A particle goes through a nonhomogeneous medium and undergoes refraction by the medium itself.
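
The invertibility of the filter EL of CA10 can also be checked directly. In the sketch below the forward EL is swept from left to right, because each cell needs the already evolved states on its left, and the inverse EL is swept from right to left. As a simplifying assumption, the cells outside a finite window are held in the vacuum state, which differs from the infinite CA whenever a structure reaches the border; the function names are introduced here.

```python
import random


def get(row, n):
    """State of cell n; every cell outside the finite window is held at vacuum (0)."""
    return row[n] if 0 <= n < len(row) else 0


def forward_step(old):
    """Forward EL of CA10: new[n] uses new states on the left and old states on the right."""
    new = [0] * len(old)
    for n in range(len(old)):                 # sweep starting from the left
        new[n] = (old[n]
                  + get(new, n - 3) * get(new, n - 2)
                  + get(old, n + 2) * get(old, n + 3)
                  + get(new, n - 2) * get(new, n - 1)
                  + get(old, n + 1) * get(old, n + 2)) % 2
    return new


def inverse_step(cur):
    """Inverse EL of CA10: recovers the previous configuration from the current one."""
    prev = [0] * len(cur)
    for n in reversed(range(len(cur))):       # sweep starting from the right
        prev[n] = (cur[n]
                   + get(prev, n + 3) * get(prev, n + 2)
                   + get(cur, n - 2) * get(cur, n - 3)
                   + get(prev, n + 2) * get(prev, n + 1)
                   + get(cur, n - 1) * get(cur, n - 2)) % 2
    return prev


random.seed(0)
start = [random.randint(0, 1) for _ in range(60)]
row = start
for _ in range(230):
    row = forward_step(row)
for _ in range(230):
    row = inverse_step(row)
print("initial configuration recovered:", row == start)
```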


Applications of CAs 


CAs as Universal Constructors and 
Turing Machines 


In the 1950s, von Neumann, who contributed to the
development of the first computer (ENIAC), decided
to work out a mathematical theory of automata.
Such a theory was aimed at answering the
following question: is it possible to build an
automaton such that it allows universal computation
(i.e., it embodies a universal Turing machine)
and, moreover, it is able to build (in order of
decreasing generality)


1. an arbitrary automaton (universal constructor);

2. a copy of itself (self-reproducing); and

3. an automaton that is itself a universal Turing
machine (constructor)?


The last question von Neumann intended to
address was whether in the process of automata self-reproduction
(if possible) a process of evolution
could take place, that is, whether a simpler automaton
could generate a more complex one.

In the beginning, the idea of von Neumann was to 
describe, using mathematical axioms, an automaton 
moving inside a warehouse and selecting various 
elementary spare parts (e.g., “muscles,” switches, rigid 
girders) and then assembling them into a new auto- 
maton. While this original idea was very realistic, it was 
also very difficult to pursue, so that von Neumann, 
following a suggestion by Ulam, decided to consider his 
questions in the more abstract framework of CAs. 

The particular CA he considered is an infinite
square CA with 29 possible states. The transition rule
depends upon the cell to update and its north,
east, south, and west neighbor cells (the von Neumann
neighborhood). Among the 29 possible states there is
one state that is “quiescent” (the vacuum state).

von Neumann proved the existence of a configuration
of ≈ 50 000 cells immersed in a sea of quiescent
states that embodies a universal Turing machine and
that is a universal constructor. An infinite one-dimensional
“tape” is used to store a description of
the automaton to build. The universal constructor
reads the description on the tape, develops a
“constructing arm” that builds the configuration
described on the tape in an unoccupied part of the
cellular space, makes a copy of the tape and finally
attaches it to the newly built automaton and retracts
the constructing arm. When the tape stores a
description of the universal constructor itself, the automaton
self-reproduces. The total size of the self-reproducing
automaton amounts to ≈ 200 000 cells. (Some computer
simulations of von Neumann's self-reproducing
automaton are available on the web.)

Since von Neumann's CA is a very complex one, 
it led researchers to think that a CA able to simulate 
a universal Turing machine should also be quite 
complex. The perspective changed completely after 
the introduction of CA Life. Conway was looking
for a simple CA with possibly rich dynamics;
however, it was subsequently realized that Life was
much more complicated than anyone could have
thought. Finally, thanks to the development of faster
computers that allowed visualization of the evolu- 
tion of quite large populations and through the 
contribution of a large number of researchers, it was 
proved that a universal Turing machine could be 
embedded in Life. 

The discovery that even a simple CA such as Life 
could incorporate a universal Turing machine led to 
the question whether it could be possible to build a 
universal Turing machine inside a simple one- 
dimensional CA. This is indeed the case: up to 
now, the simplest CA capable of universal computa- 
tion is the W110 CA (see Figure 10), as proved 
recently by Cook after a conjecture formulated by 
Wolfram in 1985. 


CAs for Computer Simulations 


One of the major applications of CAs is the 
computer simulation of various dynamical pro- 
cesses. Even if CAs were not invented for this 
purpose, they possess peculiarities that make them 
particularly suitable for this task. The main advan- 
tage of using a CA for a dynamical simulation is due 
to their completely discrete nature that allows exact 
simulations on a computer. Thus, any spurious 
effect due to rounding errors is ruled out. Another 
advantage is that the EL of a CA can be seen as a 
function between finite sets. For this reason, one can 
specify the EL through a “lookup table" (see [2]): 
then when running the simulations, the computer 
has only to access the table instead of computing the 
function every time, shortening considerably the 
computation time. Another great advantage of CAs
in computer simulations is that, by their very nature
(at least for local ELs), they can be implemented on
parallel machines. These two concepts are at the
basis of dedicated computers for CA simulations
developed by Toffoli, Margolus, and co-workers
(CAM series). The possibility of efficiently using
parallel computers for CA simulation could prove
to be fundamental when computer speeds approach
saturation. Moreover, CAs themselves can mimic
parallel computations, see, for example, Figure 19,
where a nonlocal CA “computes” very efficiently the
celebrated Collatz-Ulam 3n + 1 map.


Figure 19 A CA that “computes” the 3n + 1 Collatz-Ulam
map. The ID for the CA is the initial number for the iterated map
(binary notation, order 2°°°, randomly chosen, displayed on the
left vertical axis). The CA, according to the Collatz conjecture,
ends up in the final stable configuration (horizontal line on the
right for the CA, 1 → 4 → 2 → 1 for the map).
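
The lookup-table idea mentioned above can be illustrated with the W110 CA (M = 2, R = 1). In the sketch below the table is built from Wolfram's rule number, whose k-th binary digit is the outcome for the neighborhood that reads as the binary number k; the ring of 64 cells (periodic boundary) and the single-cell ID are assumptions made to keep the example finite.

```python
def make_table(rule_number):
    """Lookup table for an M = 2, R = 1 EL: entry k is the outcome for neighborhood k."""
    return [(rule_number >> k) & 1 for k in range(8)]


def step(row, table):
    """One time step on a ring of cells; the EL is only a table access per cell."""
    w = len(row)
    return [table[4 * row[(n - 1) % w] + 2 * row[n] + row[(n + 1) % w]]
            for n in range(w)]


table = make_table(110)
row = [0] * 63 + [1]
for _ in range(20):
    print("".join(" #"[s] for s in row))
    row = step(row, table)
```

Every cell is updated independently from the same table, which is what makes such local ELs natural candidates for parallel hardware.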


CAs in Physics 


Since Newton, physics has been described through 
differential equations and continuous functions. 
However, such a mathematical description is not 
fit for simulation on a computer, and some 
discretizations must be considered. First, one has to 
discretize space and time passing from differential 
equations to (finite systems of) finite difference 
equations; second, one has to round off the values 
of the functions to store them in the memory of the 
computer. The main drawback of this procedure is 
that in chaotic systems such approximations can 
rapidly lead to great differences between the real 
and the simulated behavior. As already noticed, this 
problem does not appear in CA. Thus, one would 
like to use this good characteristic of CAs in physical 
modeling taking due account of the continuous 
nature of the physics involved. This requires atten- 
tion and ingenuity in constructing reliable CA 
models for physical processes. For example, this 
goal has been achieved in the so-called lattice gas 
automata (LGAs). 

LGAs are CA models for the microscopic 
dynamics of fluids and gases. The thermodynamic 
limit of these CAs yields the correct continuous 
functions for the macroscopic quantities (density, 
pressure, viscosity, etc.). 

The first step toward LGAs was the discovery that 
the HPP model developed in the 1970s by Hardy, 
Pomeau and De Pazzis was in fact a CA. The HPP 
model describes the behavior of a fluid (or a gas) in 
a plane. The configuration space is given by a 


bidimensional square lattice and the particles are 
described by arrows lying on the edges of the lattices 
and pointing to some vertex (see Figure 20a). 

The particles are assumed to be all identical and 
with the same velocity, and particles on the same 
edge with the same direction are not allowed 
(exclusion principle). The EL prescribes that particles
move with unit velocity along the edges in
the direction pointed by the arrow (free flight)
unless there are exactly two particles on the edges
connected to a given vertex and they point in
opposite directions (collision); in this case they are
replaced by two arrows pointing outward on the
previously empty edges (see Figure 20b). Clearly,
the EL conserves the number and the momentum of 
the particles. 

The HPP model can be described algebraically. 
The admissible particle velocities are just 


$c_1 = +\hat{x}, \quad c_2 = +\hat{y}, \quad c_3 = -\hat{x}, \quad c_4 = -\hat{y}$ [26]
Figure 20 (a) An example of configuration for the HPP model.
(b) Head-on collisions and three-particle collisions in the HPP
model.

Accordingly, only four bits $n_j(x, t)$, j = 1, 2, 3, 4, are
required to denote the presence (1) or the absence
(0) of a particle with velocity $c_j$ pointing to vertex x at
time t. The dynamical rule for HPP can be written in
the form


$n_j(x + c_j, t+1) = n_j(x, t) + \omega_j(x, t)$ [27]


where the term $n_j(x, t)$ on the right-hand side accounts
for the free flight of particles, while $\omega_j(x, t)$ modifies
the trajectories in the case of collisions. The $\omega_j$ are
determined by the state of the system according to
the following rules:


$\omega_1 = -n_1 (1 - n_2)\, n_3 (1 - n_4) + (1 - n_1)\, n_2 (1 - n_3)\, n_4$ [28a]

$\omega_2 = -n_2 (1 - n_3)\, n_4 (1 - n_1) + (1 - n_2)\, n_3 (1 - n_4)\, n_1$ [28b]

$\omega_3 = -n_3 (1 - n_4)\, n_1 (1 - n_2) + (1 - n_3)\, n_4 (1 - n_1)\, n_2$ [28c]

$\omega_4 = -n_4 (1 - n_1)\, n_2 (1 - n_3) + (1 - n_4)\, n_1 (1 - n_2)\, n_3$ [28d]


It is plain that eqns [27] and [28] can be 
interpreted as the EL for a CA. 
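
A minimal sketch of the HPP rule as a CA follows, using the cyclic structure of the four collision terms (each one is the previous one with the indices shifted by one). The periodic lattice, the random initial filling, and the function names are assumptions of this illustration; the check at the end is that particle number and total momentum do not change.

```python
import random

# The four admissible velocities as (dx, dy): +x, +y, -x, -y.
C = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def omega(n):
    """Collision terms for the occupation bits n = (n1, n2, n3, n4) at one vertex."""
    out = []
    for j in range(4):
        a, b, c, d = (n[(j + k) % 4] for k in range(4))
        out.append(-a * (1 - b) * c * (1 - d) + (1 - a) * b * (1 - c) * d)
    return out


def hpp_step(grid):
    """Collision followed by free flight on a periodic lattice (periodicity is assumed)."""
    h, w = len(grid), len(grid[0])
    new = [[[0] * 4 for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            n = grid[y][x]
            om = omega(n)
            for j, (dx, dy) in enumerate(C):
                new[(y + dy) % h][(x + dx) % w][j] = n[j] + om[j]
    return new


def totals(grid):
    """Particle number and total momentum (px, py) of a configuration."""
    num = px = py = 0
    for row in grid:
        for n in row:
            for j, (dx, dy) in enumerate(C):
                num += n[j]
                px += n[j] * dx
                py += n[j] * dy
    return num, px, py


random.seed(0)
grid = [[[random.randint(0, 1) for _ in range(4)] for _ in range(8)] for _ in range(8)]
before = totals(grid)
for _ in range(20):
    grid = hpp_step(grid)
print(before, totals(grid))
```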

In the thermodynamic limit, the equations govern- 
ing the dynamics of the macroscopic quantities of 
the fluid are given by the continuity equation and by 
anisotropic Navier-Stokes equations. The aniso- 
tropy in the Navier-Stokes equations is due to the 
fact that the invariance group of the square lattice is 
too small. This problem was solved by Frisch,
Hasslacher, and Pomeau in 1986, with the introduction
of the FHP model. It turns out that a hexagonal
lattice has enough symmetries to recover the
isotropic Navier-Stokes equations in the thermodynamic
limit. So, the FHP model is an example of a
model where even if the microscopic dynamics is
almost a caricature of the real dynamics, the 
thermodynamic limit gives rise to the correct 
physical equations. 

CAs have been used to simulate many other 
physical processes (unfortunately, there is no space 
here for a sufficiently elaborate description). The 
principal fields of application are: percolation 
theory, magnetism, diffusion phenomena, sandpiles, 
models of earthquakes, crystal growth, etc. 

The most intriguing aspect of some even simple CAs 
(e.g., CA9, CA10: see Figures 16 and 18) is their very 
rich particle-like dynamics. For instance, the existence 
of solitonic collisions suggested that the techniques 
recently developed to find and treat “integrable” 


nonlinear dynamical systems (nonlinear continuous 
and discrete evolution equations, many-body pro- 
blems) could profitably be extended to find “integr- 
able” CAs. Indeed, many such CAs have been found 
that exhibit “solitons” and are endowed with non- 
trivial conservation laws (of course, this is very 
important in physical modeling). Moreover, the 
above-cited similarity between certain CA behaviors 
and elementary particle physics phenomena suggests 
that the fundamental structure of reality (at the Planck 
level) could indeed be that of a CA (cells of Planck 
length, discrete time flow): attempts to construct this 
underlying CA physics have been pursued. 


Other Applications 


CAs exhibit a great plasticity, which makes them 
well suited to model systems in a wide range of 
fields. This is mainly due to the fact that CAs with 
very simple rules can also simulate universal Turing 
machines, so that they can exhibit a very rich and 
complicated overall dynamics (in principle, one 
could simulate any dynamical system using a simple 
CA). There is another reason for the wide applic- 
ability of CA modeling even outside of physics: 
namely, it is well known that algorithms, not 
differential equations, are better instruments to 
schematize dynamical processes for complex and 
organized systems. Since simple algorithms can be 
naturally implemented on CAs, the latter are very 
useful for realizing simple models and simulations in 
many fields: biology, economics, ecology, neural 
networks, traffic models, etc. 

Moreover, applications of CAs in informatics and 
specifically in cryptography and data compression 
have been investigated. 


See also: Dynamical Systems in Mathematical Physics: 
An Illustration from Water Waves; Generic Properties of 
Dynamical Systems; Integrable Systems: Overview. 


Introduction 


We consider differentiable dynamical systems gen- 
erated by a diffeomorphism or a vector field on a 
manifold. We restrict to the finite-dimensional case, 
although some of the ideas can also be developed in 
the general case (Vanderbauwhede and Iooss 1992). 
We also restrict to the behavior near a stationary 
point or a periodic orbit of a flow. 

Let the origin 0 of $\mathbb{R}^n$ be a stationary point of a $C^1$ 
vector field $X$, that is, $X(0) = 0$. We consider the 
linear approximation $A = dX(0)$ of $X$ at 0 and its 
spectrum $\sigma(A)$, which we decompose as 
$\sigma(A) = \sigma_s \cup \sigma_c \cup \sigma_u$, where $\sigma_s$ resp. $\sigma_c$ resp. $\sigma_u$ 
consists of those eigenvalues with real part $<0$ resp. $=0$ resp. $>0$. If 
$\sigma_c = \emptyset$ then there is no central manifold, and the 
stationary point 0 is called hyperbolic. Let $E_s$, $E_c$, 
and $E_u$ be the linear $A$-invariant subspaces corresponding 
to $\sigma_s$ resp. $\sigma_c$ resp. $\sigma_u$. Then 
$\mathbb{R}^n = E_s \oplus E_c \oplus E_u$. We look for corresponding $X$-invariant 
manifolds in the neighborhood of 0, in the form of 
graphs of maps. More precisely: 


Theorem 1 Let the vector field $X$ above be of class 
$C^r$ ($1 \le r < \infty$). There exist map germs 
$\phi_{ss}: (E_s, 0) \to E_c \oplus E_u$; $\phi_{sc}: (E_s \oplus E_c, 0) \to E_u$; 
$\phi_{uu}: (E_u, 0) \to E_s \oplus E_c$; $\phi_{cu}: (E_c \oplus E_u, 0) \to E_s$; 
and $\phi_c: (E_c, 0) \to E_s \oplus E_u$ of 
class $C^1$ such that the graphs of these maps are 
invariant for the flow of $X$. Moreover, these maps 
are of class $C^r$, and their linear approximation at 0 
is zero, that is, their graphs are tangent to, 
respectively, $E_s$, $E_s \oplus E_c$, $E_u$, $E_c \oplus E_u$, and $E_c$. If $X$ is 
of class $C^\infty$ then $\phi_{ss}$ and $\phi_{uu}$ are also of class $C^\infty$. If 
$X$ is analytic then $\phi_{ss}$ and $\phi_{uu}$ are also analytic. 


The graph of $\phi_c$ is called the (local) central 
(or, center) manifold of $X$ at 0 and it is often 
denoted by $W^c$. Thus, it is an invariant manifold 
of $X$ tangent to the generalized eigenspace of 
$dX(0)$ corresponding to the eigenvalues having zero 
real part. 

(Non) uniqueness, Smoothness 


Most proofs in the literature (Vanderbauwhede 
1989) use a cutoff in order to construct globally 
defined objects, and then obtain the invariant graph 
as the solution of some fixed-point problem of a 
contraction in an appropriate function space. 
Although this solution is unique for the globalized 
problem, this is not the case at the germ level: 
another cutoff may produce a different germ of 
a central manifold. In other words, locally a 
central manifold might not be unique, as is 
easily seen on the planar example 
$x^2\,\partial/\partial x - y\,\partial/\partial y$. On the other hand, the $\infty$-jet of the map 
$\phi_c$, in case of a $C^\infty$ vector field, is unique, so if 
there exists an analytic central manifold then 
it is unique; in the foregoing example, 
it is the $x$-axis. But for the (polynomial) example 
$(x - y^2)\,\partial/\partial x + y^2\,\partial/\partial y$ one can calculate that the 
$\infty$-jet of $x = \phi_c(y)$ is given by $j_\infty \phi_c(y) = \sum_{n \ge 1} n!\, y^{n+1}$, 
which has a vanishing radius of convergence, so 
there is no analytic central manifold. On the other 
hand, by the Borel theorem we can choose a 
$C^\infty$-representative for $\phi_c$. This can be generalized 
in the planar case: 


Proposition 1 If $n = 2$ and if $X$ is $C^\infty$ and if the 
$\infty$-jet of $X$ in the direction of the central manifold 
is nonzero, then this central manifold is $C^\infty$. 
In particular, if $X$ is analytic then the central 
manifold is either an analytic curve of stationary 
points or is a $C^\infty$ curve along which $X$ has a 
nonzero jet. 
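
The divergent jet in the polynomial example $(x - y^2)\,\partial/\partial x + y^2\,\partial/\partial y$ can be checked by a short computation. Substituting $x = \phi_c(y) = \sum_{k \ge 2} a_k y^k$ into the invariance condition $\phi_c'(y)\, y^2 = \phi_c(y) - y^2$ and comparing powers of $y$ gives $a_2 = 1$ and $a_{k+1} = k\, a_k$. The sketch below iterates this recursion; the function name `center_manifold_jet` and the coefficient labels `a_k` are introduced here only for illustration.

```python
from math import factorial


def center_manifold_jet(order):
    """Coefficients a_2..a_order of x = sum_k a_k y^k for the field
    (x - y^2) d/dx + y^2 d/dy, obtained from the invariance condition
    phi'(y) * y^2 = phi(y) - y^2.
    """
    coeffs = {2: 1}                      # y^2 terms: 0 = a_2 - 1
    for k in range(2, order):
        coeffs[k + 1] = k * coeffs[k]    # y^(k+1) terms: k * a_k = a_(k+1)
    return coeffs


jet = center_manifold_jet(8)
# The coefficient of y^(n+1) equals n!, so a_(k+1) / a_k = k is unbounded
# and the series has zero radius of convergence.
ratios = [jet[k + 1] // jet[k] for k in range(2, 8)]
```

Since the ratio of consecutive coefficients grows without bound, the ratio test confirms the vanishing radius of convergence stated above.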


For proofs and additional reading, the reader is 
referred to Aulbach (1992). In general, a central 
manifold is not necessarily $C^\infty$ (van Strien 1979, 
Arrowsmith and Place 1990): for the system in 
$\mathbb{R}^3$ given by 


ð O 0 
à 30 2. 249 (0 
(x wat tx di" De 
one can find a $C^k$ central manifold for every $k$ but 
there is no $C^\infty$ central manifold. Indeed, in this case 
the domain of definition of $\phi_c$ shrinks to zero when 
$k$ tends to infinity. 


Central Manifold Reduction 


The importance of a central manifold lies in the 
principle of central manifold reduction, which 
roughly says that for local bifurcation phenomena 
it is enough to study the behavior on the central 
manifold, that is, if two vector fields, restricted to 
their central manifolds, have homeomorphic integral 
curve portraits, and if the dimensions of $E_s$ and of $E_u$ 
are the same for both fields, then the two vector fields have 
homeomorphic integral curve portraits in $\mathbb{R}^n$, at least 
locally near 0. Let us be more precise: 


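Theorem 2 Let $m$ be the dimension of $E_c$. There 
exists $p$, $0 \le p \le n - m$, such that $X$ is locally 
$C^0$-conjugate to 


$\sum_{i=1}^{m} X_i(z_1,\ldots,z_m)\,\frac{\partial}{\partial z_i} + \sum_{i=m+1}^{m+p} z_i\,\frac{\partial}{\partial z_i} - \sum_{i=m+p+1}^{n} z_i\,\frac{\partial}{\partial z_i}$ 


where $(z_1,\ldots,z_m)$ is a coordinate system on a 
central manifold, $(z_1,\ldots,z_n)$ is a coordinate system 
on $\mathbb{R}^n$ extending $(z_1,\ldots,z_m)$ and $\sum_{i=1}^{m} X_i\,\partial/\partial z_i$ 
is the restriction of $X$ to a central manifold. 
Moreover, if 


$Y = \sum_{i=1}^{m} Y_i(z_1,\ldots,z_m)\,\frac{\partial}{\partial z_i} + \sum_{i=m+1}^{m+p} z_i\,\frac{\partial}{\partial z_i} - \sum_{i=m+p+1}^{n} z_i\,\frac{\partial}{\partial z_i}$ 


and if $\sum_{i=1}^{m} Y_i\,\partial/\partial z_i$ is $C^0$-equivalent (resp. 
$C^0$-conjugate) to $\sum_{i=1}^{m} X_i\,\partial/\partial z_i$ then $X$ is $C^0$-equivalent 
(resp. $C^0$-conjugate) to $Y$. 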


For a proof and further reading (a generalization) 
see Palis and Takens (1977). 

In case that more smoothness than just $C^0$ is 
needed, we have the principle of normal linearization 
along the central manifold. More concretely, let 
$x$ denote a coordinate in the central manifold and 
let $y$ be a complementary variable, that is, let 
$X = X_x\,\partial/\partial x + X_y\,\partial/\partial y$. We define the normally 
linear part along the central manifold by 

$NX := X_x(x, 0)\,\frac{\partial}{\partial x} + \frac{\partial X_y}{\partial y}(x, 0)\, y\,\frac{\partial}{\partial y}$ 

Under certain nonresonance conditions (Takens 
1971, Bonckaert 1997) on the real parts of the 
eigenvalues of $dX(0)$, there exists a $C^r$ local 
conjugacy between $X$ and $NX$ for each $r \in \mathbb{N}$ 
(assuming $X$ to be of class $C^\infty$). If there are 
resonances, then one can conjugate with the 


so-called seminormal or renormal form containing 
higher-order terms (see Bonckaert (1997, 2000) and 
references therein; here one can also find results for 
cases where extra constraints should be respected, 
like symmetry, reversibility, or invariance of some 
given foliation etc.). 


Parameters 


Having an eigenvalue with zero real part is 
ungeneric, so in bifurcation problems we consider 
$p$-parameter families $X_\lambda$ near, say, $\lambda = 0$. With 
respect to the results above, we remark that such a 
family can be considered as a vector field near 
$(0,0) \in \mathbb{R}^n \times \mathbb{R}^p$ tangent to the leaves $\mathbb{R}^n \times \{\lambda\}$. In 
fact, the parameter direction $\mathbb{R}^p$ is contained in $E_c$. 
In all the results mentioned, this structure “of being 
a family” is respected. For example, in Theorem 2 
we replace $X_i(z_1,\ldots,z_m)$ by $X_i(z_1,\ldots,z_m,\lambda)$. Hence, 
if the restriction of $X_\lambda$ to the central manifold is a versal 
unfolding of the restriction of $X_0$, then $X_\lambda$ is a versal 
unfolding of $X_0$. By this, the search for versal 
unfoldings is reduced to the unfolding of singularities 
whose linear approximation at 0 has a purely 
imaginary spectrum. 


Diffeomorphisms, Periodic Orbits 


A completely analogous theory can be developed for 
fixed points of diffeomorphisms $f: (\mathbb{R}^n, 0) \to \mathbb{R}^n$. 
Here we split up the spectrum of the linear part 
$L = df(0)$ at 0 as $\sigma(L) = \sigma_s \cup \sigma_c \cup \sigma_u$, where $\sigma_s$ resp. 
$\sigma_c$ resp. $\sigma_u$ consists of those eigenvalues with 
modulus $<1$ resp. $=1$ resp. $>1$. This theory can be 
applied to the time-$t$ map of a vector field (and will 
give the same invariant manifolds) and to the 
Poincaré map of a transversal section of a periodic 
orbit of a vector field (Chow et al. 1994). 


Normal Forms 


The general idea of a normal form is to put a 
(complicated) system into a form “as simple as 
possible" by means of a change of coordinates. This 
idea was already developed to a great extent by 
H Poincaré. Simple examples are: (1) putting a square 
matrix into Jordan form, (2) the flow box theorem 
(Arrowsmith and Place 1990) near a nonsingular 
point. Depending on the context and on the purpose 
of the simplification, this concept may vary greatly. It 
depends on the kind of changes of coordinates that are 
tolerated (linear, polynomial, formal series, smooth, 
analytic) and on the possible structures that must be 
preserved (e.g., symplectic, volume-preserving, sym- 
metric, reversible etc.). Let us restrict to local normal 
forms, that is, in the vicinity of a stationary point of a 
vector field or a diffeomorphism (the latter can be 


applied to the Poincaré map of a periodic orbit). We 
concentrate on the simplification of the Taylor series. 
The general idea is to apply consecutive polynomial 
changes of variables; at each step we simplify terms of 
a degree higher than in the step before. The ideal 
simplification would be to put all higher-order terms 
to zero, which would (at least at the level of formal 
series) linearize the system. But as soon as there are 
resonances (see below), this is impossible: the planar 
system $x\,\partial/\partial x + (2y + x^2)\,\partial/\partial y$ cannot be formally 
linearized, since its eigenvalues 1 and 2 satisfy $2 \cdot 1 - 2 = 0$. 


Setting 


Let $X$ be a $C^{r+1}$ vector field defined on a neighborhood 
of $0 \in \mathbb{R}^n$, and denote $A = dX(0)$ (its linear 
approximation at 0). The Taylor expansion of $X$ at 
0 takes the form 


$X(x) = A \cdot x + \sum_{k=2}^{r} X_k(x) + O(|x|^{r+1})$ 


where $X_k \in H^k$, the space of vector fields whose 
components are homogeneous polynomials of 
degree $k$. The classical formal normal-form theorem 
is as follows. We define the operator $L_A$ on $H^k$ by 
putting $L_A h(x) = dh(x) \cdot A \cdot x - A \cdot h(x)$; one calls $L_A$ 
the homological operator. One checks that 
$L_A(H^k) \subset H^k$. One also denotes this by $\mathrm{ad}\,A(h)(x)$: 
see further in the Lie algebra setting. Let $R^k$ be the 
range of $L_A$, that is, $R^k = L_A(H^k)$. Let $G^k$ denote any 
complementary subspace to $R^k$ in $H^k$. The formal 
normal-form theorem states, under the above 
settings: 

Theorem 3 (Chow et al. 1994, Dumortier 1991) 
There exists a composition of near identity changes 
of variables of the form 


$x = y + \xi^k(y)$ [1] 


where the components of $\xi^k$ are homogeneous 
polynomials of degree $k$, such that the vector field 
$X$ is transformed into 


$Y(y) = A \cdot y + \sum_{k=2}^{r} g_k(y) + O(|y|^{r+1})$ 


where $g_k \in G^k$, $k = 2, \ldots, r$. 


Sometimes this theorem is applied to the restriction 
of a vector field to its central manifold, for 
reasons explained in the last section. This is the 
reason why we did not assume $X$ to be $C^\infty$; in the 
latter case one can let $r \to \infty$ and obtain a normal 
form on the level of formal Taylor series (also called 
$\infty$-jets). Using a theorem of Borel, we infer the 
existence of a $C^\infty$ change of variables $\phi$ such that 
the Taylor series of $\phi_*(X)$ is $A \cdot y + \sum_{k \ge 2} g_k(y)$. For 
practical computations, it is often appropriate to 
first simplify the linear part $A$ and to diagonalize it 
whenever possible. Hence, it is convenient to use a 
complexified setting and to use complex polynomials 
or power series. One can show that all 
involved changes of variables preserve the property 
of “being a complex system coming from a real 
system,” that is, at the final stage we can return to a 
real system (see, e.g., Arrowsmith and Place (1990) 
for a more precise mathematical description). 
Hence, we can assume that $A$ is an upper 
triangular matrix. Let the eigenvalues be $\lambda_1, \ldots, \lambda_n$. 
It can be calculated that the eigenvalues of $L_A$, as an 
operator $H^k \to H^k$, are then the numbers $\langle \lambda, \alpha \rangle - \lambda_j$ 
where $\alpha \in \mathbb{N}^n$, $\sum_{j=1}^{n} \alpha_j = k$ and $1 \le j \le n$. Hence, if 
these would all be nonzero then $R^k = H^k$, and then 
we have an ideal simplification, that is, all $g_k$ equal 
to zero. However, if such a number is zero, that is, 


$\langle \lambda, \alpha \rangle - \lambda_j = 0$ [2] 


it is called a resonance between the eigenvalues. In 
such a case, we have to choose a complementary 
space $G^k$. From linear algebra it follows that one 
can always choose 


$G^k = \ker(L_{A^*})$ [3] 


where $A^*$ is the adjoint operator. But this choice [3] is 
not unique and is, from the computational point of 
view, not always optimal, especially if there are 
nilpotent blocks. This fact has been exploited by 
many authors. A typical example is the case where 
$A = y\,\partial/\partial x$. On the other hand, if $A$ is semisimple we 
can choose the complementary space to be $\ker(L_A)$, so 
$L_A g_k = 0$; we can assume $A$ to be the (complex) 
diagonal$[\lambda_1, \ldots, \lambda_n]$. In that case we can be more 
explicit as follows. Let $e_j = \partial/\partial x_j$ denote the standard 
basis on $\mathbb{C}^n$. For a monomial one can calculate that 


$L_A(x^\alpha e_j) = (\langle \lambda, \alpha \rangle - \lambda_j)\, x^\alpha e_j$ [4] 


If the latter is zero, then the monomial is called 
resonant. This implies that the normal form can be 
chosen so that it only contains resonant monomials. 
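
For a diagonal linear part, relation [4] reduces the search for resonant monomials to arithmetic on the eigenvalues. The following sketch enumerates them by brute force up to a chosen degree; the function name `resonant_monomials`, the tolerance `tol`, and the 1-based index `j` in the output are conventions chosen here for illustration, not notation from the text.

```python
from itertools import product


def resonant_monomials(eigenvalues, max_degree, tol=1e-12):
    """Return all (alpha, j) with <lambda, alpha> - lambda_j = 0 and
    2 <= |alpha| <= max_degree, i.e. the resonant monomials x^alpha e_j.

    The index j is 1-based, matching e_j in the text.
    """
    n = len(eigenvalues)
    found = []
    for alpha in product(range(max_degree + 1), repeat=n):
        degree = sum(alpha)
        if degree < 2 or degree > max_degree:
            continue
        pairing = sum(lam * a for lam, a in zip(eigenvalues, alpha))
        for j, lam_j in enumerate(eigenvalues, start=1):
            if abs(pairing - lam_j) <= tol:
                found.append((alpha, j))
    return found


# Eigenvalues 1 and 2: the only resonance up to degree 3 is x_1^2 e_2.
node = resonant_monomials([1, 2], 3)
# Eigenvalues i and -i: up to degree 3 the resonant monomials are
# x_1 x_2^2 e_2 and x_1^2 x_2 e_1.
center = resonant_monomials([1j, -1j], 3)
```

The tolerance only matters for eigenvalues that are not exactly representable in floating point; for exact work one could pass `fractions.Fraction` values instead.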

Putting a system into normal form not only 
simplifies the original system, it also gives more 
geometric insight on the Taylor series. To be more 
precise, suppose (for simplicity; this can be generalized 
(Dumortier 1997)) that $A$ is semisimple. One 
can calculate that the condition $L_A g_k = 0$ implies: 
$\exp(-At)\, g_k(\exp(At)x) = g_k(x)$ for all $t \in \mathbb{R}$. This 
means that $g_k$ is invariant for the one-parameter 
group $\exp(At)$. A typical example in the plane 
is: $A$ has eigenvalues $i\lambda$, $-i\lambda$. Note that the (only) 
resonances are $\langle (i\lambda, -i\lambda), (p+1, p) \rangle - i\lambda = 0$ and 
$\langle (i\lambda, -i\lambda), (p, p+1) \rangle + i\lambda = 0$ for all $p \in \mathbb{N}$. We 
suppose that the original system was real, that is, 
on $\mathbb{R}^2$; we can choose linear coordinates such that 
for $z = x + iy$, $\bar z = x - iy$ the linear part is 
$A = \mathrm{diagonal}[i\lambda, -i\lambda]$. Applying the remarks above, 
we conclude that the normal form only contains the 
monomials $(z\bar z)^p z\,\partial/\partial z$ and $(z\bar z)^p \bar z\,\partial/\partial \bar z$. The geometric 
interpretation here is that these monomials 
are invariant for rotations around $(0,0)$. This can 
also be seen on the real variant of this: the Taylor 
series of the (real) normalized system has the 
form $(\lambda + f(x^2 + y^2))(x\,\partial/\partial y - y\,\partial/\partial x) + g(x^2 + y^2)(x\,\partial/\partial x + y\,\partial/\partial y)$ 
and is invariant for rotations. 

Warning: the dynamic behavior of a formal normal 
form in the central manifold can be very different 
from that of the original vector field, since we are 
only looking at the formal level. A trivial example is 
(take $f = g = 0$ in the foregoing example) 
$X(x, y) = \lambda(x\,\partial/\partial y - y\,\partial/\partial x) - \exp(-1/x^2)\, x\,\partial/\partial x$, where orbits 
near $(0,0)$ spiral to $(0,0)$, whereas the normal form 
is just a linear rotation. This difference is due to the 
so-called flat terms, that is, the difference between 
the transformed vector field and a $C^\infty$-realization of 
its normalized Taylor series (or polynomial). In case 
of analyticity of $X$, one can ask for analyticity of the 
normalizing transformation $\phi$. Generically, this is 
not the case in many situations. The precise meaning 
of this “genericity condition” is too elaborate to 
explain in this brief review article. We provide some 
suggestions for further reading in the next section. 
One could roughly say that, in the central manifold, 
the normal form has too much symmetry and is too 
poor to model more complicated dynamics of the 
system, which can be “hidden in the flat terms.” To 
quote Il'yashenko (1981): “In the theory of normal 
forms of analytic differential equations, divergence 
is the rule and convergence the exception ....” 
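
The rotational invariance of the real normal form for eigenvalues $\pm i\lambda$ discussed above becomes explicit in polar coordinates. The following short worked step assumes $x = r\cos\theta$, $y = r\sin\theta$; the symbols $r$ and $\theta$ are introduced here for illustration only.

```latex
\begin{aligned}
x\,\frac{\partial}{\partial y} - y\,\frac{\partial}{\partial x}
  &= \frac{\partial}{\partial \theta},
&\qquad
x\,\frac{\partial}{\partial x} + y\,\frac{\partial}{\partial y}
  &= r\,\frac{\partial}{\partial r},
\\[4pt]
\dot{\theta} &= \lambda + f(r^2),
&\qquad
\dot{r} &= r\, g(r^2).
\end{aligned}
```

Neither right-hand side depends on $\theta$, which is exactly the invariance under rotations; the radial equation alone decides, at the formal level, whether orbits approach or leave the origin.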

In many applications, we want to preserve some 
extra structure, such as a symplectic structure, a 
volume form, some symmetry, reversibility, some 
projection etc.; the case of a projection is important 
since it includes vector fields depending on a para- 
meter. Sometimes a superposition of these structures 
appears (e.g., a family of volume-preserving systems). 
We would like the normal-form procedure to 
respect this structure at each step. One can often 
formulate this in terms of vector fields belonging to 
some Lie subalgebra $\mathcal{L}_0$. The idea is then to use 
changes of variables like [1], where $\xi$ is then generated 
by a vector field in $\mathcal{L}_0$. This will guarantee that all 
changes of variables are “compatible” with the extra 
structure. Unlike the general case where we could 
work with monomials as in [4], we will have to 
consider vector fields $h_k$ in $\mathcal{L}_0$ whose components are 
homogeneous polynomials of degree $k$. If this can be 
done, one says that $\mathcal{L}_0$ respects the grading by the 
homogeneous polynomials. In order to fix ideas, 
suppose that $\mathcal{L}_0$ consists of the divergence-free planar vector 
fields. Note that a monomial $x^p y^q\,\partial/\partial x$ (with $p \ge 1$) is not 
divergence free. We can instead use time mappings of 
homogeneous vector fields of the form 
$a(q+1)\, x^{p+1} y^q\,\partial/\partial x - a(p+1)\, x^p y^{q+1}\,\partial/\partial y$. Up to terms 
of higher order we can use the time-one map of $h_k$ 
instead of $x + h_k(x)$. In case that one asks for a $C^\infty$-realization 
of the normalizing transformation, we need 
an extra assumption on the extra structure, that is, on 
$\mathcal{L}_0$, called the Borel property: denote by $\mathcal{J}_0$ the set of 
formal series such that each truncation is the Taylor 
polynomial of an element of $\mathcal{L}_0$. The extra assumption 
is: each element of $\mathcal{J}_0$ must be the Taylor series of a 
$C^\infty$ vector field in $\mathcal{L}_0$. It can be proved (Broer 1981) 
that the following structures respect the grading and 
satisfy the Borel property: being an $r$-parameter family, 
respecting a volume form on $\mathbb{R}^n$, being a Hamiltonian 
vector field ($n$ even), and being reversible for a linear 
involution. 

One could consider other types of grading of the 
Lie-algebras involved. 

This method, using the framework of the so-called 
filtered Lie algebras, is explained and developed 
systematically in a more general and abstract 
context in Broer (1981). 

In nonlocal bifurcations, such as near a homo- 
clinic loop, for example, it is not enough to perform 
central manifold reduction near the singularity: a 
simplified smooth model in a full neighborhood of 
the singularity is often needed, for example, in order 
to compute Poincaré maps. 

Let us start with the “purely” hyperbolic case (i.e., 
$\dim E_c = 0$). First we compute the formal normal 
form as above. If there are no resonances 
[2] then we can formally linearize the vector field $X$. 
If $X$ is $C^\infty$ then a classical theorem of Sternberg 
(1958) states that this linearization can be realized 
by a $C^\infty$ change of variables (i.e., no more flat terms 
remaining). In case there are resonances, we must 
allow nonlinear terms: the resonant monomials. In 
this case the reduction to this normal form can also be 
realized by a $C^\infty$ change of variables. 
Using the same methods, it is also possible to reduce 
to a polynomial normal form, but this time using 
$C^k$ ($k < \infty$) changes of variables. More precisely, if $k$ 
is a given number and if we write the vector field as 
$X = X_N + R_N$, where $X_N$ is the Taylor polynomial 
up to order $N$ (which can be assumed to be in 
normal form) and where $R_N(x) = O(|x|^{N+1})$, then for 
$N$ sufficiently large there is a $C^k$ change of variables 
conjugating $X$ to $X_N$ near 0. The number $N$ depends 
on the spectrum of $A = dX(0)$. An elegant proof of 
these facts can be found in Il'yashenko and Yakovenko 
(1991). For the case when extra structure must be 
preserved, see Bonckaert (1997), which also deals with 
the partially hyperbolic case ($\dim E_c \ge 1$). As already 
remarked above, the case of a parameter-dependent 
family can be regarded as a partially hyperbolic 
stationary point preserving this extra structure. 

The question of an analytic normal form, also in 
the hyperbolic case, leads to convergence questions 
and calls upon the so-called small-divisor problems. 
The classical results are due to Poincaré and Siegel. 
Let us summarize them; they are formulated in the 
complex analytic setting: 


Theorem 4 


(i) If the convex hull of the spectrum of $A$ does not 
contain $0 \in \mathbb{C}$ then $X$ can locally be put into 
normal form by an analytic change of variables. 
Moreover, this normal form is polynomial. 

(ii) If the spectrum $\{\lambda_1, \ldots, \lambda_n\}$ of $A$ satisfies the 
condition that there exist $C > 0$ and $\mu > 0$ such 
that for any $m \in \mathbb{N}^n$ with $|m| = \sum_j m_j \ge 2$: 


$|\langle (\lambda_1, \ldots, \lambda_n), m \rangle - \lambda_j| \ge \frac{C}{|m|^{\mu}}$ [5] 


for $1 \le j \le n$ then $X$ can be locally linearized by 
an analytic change of variables. 


Note that case (i) contains the case where 0 is a 
hyperbolic source or sink. This case (i) in Theorem 4 
can be extended if there are parameters: if $X$ 
depends analytically on a parameter $\varepsilon \in \mathbb{C}^p$ near 
$\varepsilon = 0$ then the change of variables is also analytic in 
$\varepsilon$; moreover, the normal form is then a polynomial 
in the space variables whose coefficients are analytically 
dependent on the parameter $\varepsilon$. 

For case (ii) this is surely not the case, since the 
condition [5] is fragile: a small distortion of the 
parameter generically causes resonances, be it of a 
high order. To fix ideas, consider $n = 2$ and suppose 
$\lambda_1 < 0 < \lambda_2$. By a generic but arbitrary small 
perturbation, we can have that the ratio of these 
eigenvalues becomes a negative rational number 
$-p/q$, which gives a resonance of the form [2] 
with $j = 1$ and $\alpha = (q+1, p)$, so [5] is violated. 
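
The resonance produced by a rational eigenvalue ratio can be verified with exact arithmetic. The sketch below evaluates the left-hand side of [2] for eigenvalues with ratio $-p/q$; the function names `resonance_defect` and `saddle_resonance` and the sample values are hypothetical choices made for this illustration.

```python
from fractions import Fraction


def resonance_defect(eigenvalues, alpha, j):
    """Left-hand side of [2]: <lambda, alpha> - lambda_j (j is 1-based)."""
    pairing = sum(lam * a for lam, a in zip(eigenvalues, alpha))
    return pairing - eigenvalues[j - 1]


def saddle_resonance(p, q, lambda_2):
    """Build eigenvalues with lambda_1 / lambda_2 = -p/q and evaluate [2]
    for j = 1 and alpha = (q + 1, p)."""
    lambda_1 = -Fraction(p, q) * lambda_2
    alpha = (q + 1, p)
    defect = resonance_defect((lambda_1, lambda_2), alpha, 1)
    return (lambda_1, lambda_2), alpha, defect


# Sample values: ratio -2/3 with lambda_2 = 5/7 (arbitrary positive choice).
eigs, alpha, defect = saddle_resonance(2, 3, Fraction(5, 7))
```

The defect vanishes because $(q+1)\lambda_1 + p\lambda_2 - \lambda_1 = q\lambda_1 + p\lambda_2 = 0$ whenever $\lambda_1/\lambda_2 = -p/q$; the order $p + q + 1$ of the resonance grows with the denominators, which is why it may be of a high order.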

So analytic linearization, or even a polynomial 
analytic normal form, is ungeneric for families of 
such hyperbolic stationary points. The search for 
analytic normal forms, that is, simplified models, for 
families is still under investigation. A first simplification 
is obtained via the stable and unstable manifold 
from Theorem 1, that is, the graphs of $\phi_{ss}$ and $\phi_{uu}$. 
When $X$ is analytic near 0 then these manifolds are 
also analytic. So, up to an analytic change of variables, 
we can assume that $E_s$ and $E_u$ are invariant, which 
gives a simplification of the expression of $X$. Moreover, 
there is analytic dependence on parameters. 


For local diffeomorphisms there are completely 
similar theorems pertaining to all the cases consid- 
ered above. 


Concluding Remarks 


The concept of central manifold can be extended to 
more general invariant sets (see Chow et al. (2000) 
and references therein). It can also be extended to 
the infinite-dimensional case and can be applied to 
partial differential equations (Vanderbauwhede and 
Iooss 1992). 

Concerning the generic divergence of normalizing 
transformations, the reader is referred to Broer and 
Takens (1989), Bruno (1989), Il'yashenko (1981), and 
Il'yashenko and Pyartli (1991). Although the power 
series giving the normalizing transformation generally 
diverges, the study of the dynamics is often performed 
by truncating the normal form at a certain order. 
Recently, Iooss and Lombardi (2005) considered the 
question as to what an optimal truncation is. It is 
shown, in case dX(0) is semisimple, that the order of 
the normal form can be optimized so that the remainder 
satisfies some estimate shrinking exponentially fast to 
zero as a function of the radius of the domain. 

Concerning normal forms preserving the 
Hamiltonian structure, see Birkhoff (1966) and 
Siegel and Moser (1995) for a starting point; this is 
an extended subject on its own, sometimes called 
Birkhoff normal form, and it would require another 
review article. 

Further simplifications of the normal form can 
sometimes be obtained by taking into account 
nonlinear terms (instead of just A) in order to obtain 
reductions of higher-order terms (see Gaeta (2002) 
and especially the references therein). 

Applications of normal forms and central mani- 
folds to bifurcation theory have been explained in 
Dumortier (1991). 


See also: Averaging Methods; Bifurcation Theory; 
Dynamical Systems and Thermodynamics; Dynamical 
Systems in Mathematical Physics: An Illustration from 
Water Waves; Finite Group Symmetry Breaking; 
Korteweg-de Vries Equation and Other Modulation 
Equations; Multiscale Approaches; Normal Forms and 
Semiclassical Approximation; Symmetry and Symmetry 
Breaking in Dynamical Systems. 


Introduction 


Consider a typical quantum system such as a string 
of ions in a trap. To “process” the quantum 
information the ions carry, we have to perform in 
general many steps of a quite different nature. 
Typical examples are: free time evolution (including 
unwanted but unavoidable interactions with the 
environment), controlled time evolution (e.g., the 
application of a “quantum gate” in a quantum 
computer), preparations and measurements. Each 
processing step can be described by a channel which 
transforms input systems into output systems of a 
possibly different type (e.g., a measurement transforms 
quantum systems into classical information). 


Systems, States, and Algebras 


To get a unified mathematical description of systems 
of different physical nature, it is useful to consider 


C*-algebras (which are, in our case, always finite 
dimensional): quantum systems can be represented 
in terms of the algebra $\mathcal{B}(\mathcal{H})$ of (bounded) operators 
on the Hilbert space $\mathcal{H} = \mathbb{C}^d$; for classical information 
we have to choose the set $\mathcal{C}(X)$ of (continuous), 
complex-valued functions on the finite alphabet $X$; 
and the tensor product of both, $\mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$, 
describes hybrid systems which are half-classical 
and half-quantum. Assume now that $\mathcal{A}$ is one of 
these algebras. Effects (i.e., yes/no measurements on 
the system in question) are then described by $A \in \mathcal{A}$ 
satisfying $0 \le A \le 1$, states are positive, normalized 
linear functionals $\omega: \mathcal{A} \to \mathbb{C}$, and the probability to 
get the result “yes” during an $A$ measurement on a 
system in the state $\omega$ is given by $\omega(A)$. Since $\mathcal{A}$ is 
assumed to be finite dimensional, each state $\omega$ on 
$\mathcal{B}(\mathcal{H})$ is represented by a density operator $\rho$, that is, 
$\omega(A) = \mathrm{tr}(\rho A)$. Likewise, a state $\omega$ on $\mathcal{C}(X)$ has the 
form $\omega(A) = \sum_x A(x)\, p_x$, where $(p_x)_{x \in X}$ denotes a 
probability distribution on $X$, and a state $\omega$ on 
$\mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ is described by a sequence $(\rho_x)_{x \in X}$ of 
positive (trace-class) operators in $\mathcal{B}(\mathcal{H})$ with 
$\sum_x \mathrm{tr}(\rho_x) = 1$ such that $\omega(A) = \sum_x \mathrm{tr}(\rho_x A_x)$. Here 
we have used the fact that an element $A \in \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ 
can be represented in a canonical way by a 
sequence $(A_x)_{x \in X}$ of operators on $\mathcal{H}$. The set of 
states will be denoted in the following by $\mathcal{S}(\mathcal{A})$ and 
the set of effects by $\mathcal{E}(\mathcal{A})$. 
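
The three probability formulas of this section (quantum, classical, hybrid) can be evaluated directly for small examples. The sketch below uses plain nested lists for matrices and dictionaries indexed by the alphabet; the function names and the sample qubit data are assumptions made for this illustration.

```python
def trace_of_product(rho, a):
    """tr(rho A) for two square matrices given as nested lists."""
    d = len(rho)
    return sum(rho[i][k] * a[k][i] for i in range(d) for k in range(d))


def quantum_probability(rho, effect):
    """omega(A) = tr(rho A) for a state on B(H)."""
    return trace_of_product(rho, effect)


def classical_probability(p, effect):
    """omega(A) = sum_x A(x) p_x for a state on C(X)."""
    return sum(effect[x] * p[x] for x in p)


def hybrid_probability(rhos, effects):
    """omega(A) = sum_x tr(rho_x A_x) for a state on B(H) (x) C(X)."""
    return sum(trace_of_product(rhos[x], effects[x]) for x in rhos)


# Sample data: projector onto the first basis vector as effect, and the
# equal superposition of both basis vectors as density operator.
P0 = [[1, 0], [0, 0]]
RHO_PLUS = [[0.5, 0.5], [0.5, 0.5]]

q = quantum_probability(RHO_PLUS, P0)
c = classical_probability({"a": 0.25, "b": 0.75}, {"a": 1, "b": 0})
h = hybrid_probability(
    {"a": [[0.25, 0], [0, 0]], "b": [[0.375, 0.375], [0.375, 0.375]]},
    {"a": P0, "b": P0},
)
```

In the hybrid example the operators $\rho_x$ are the density operators weighted by the classical probabilities, so their traces add up to 1 as required.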


Completely Positive Maps 


Our aim is now to get a mathematical object which 
can be used to describe a channel. To this end, 
consider two C*-algebras, $\mathcal{A}$, $\mathcal{B}$, describing the input 
and output system, respectively, and an effect $A \in \mathcal{B}$ 
of the output system. If we invoke first a channel 
which transforms $\mathcal{A}$ systems into $\mathcal{B}$ systems, and 
measure $A$ afterwards on the output systems, we end 
up with a measurement of an effect $T(A)$ on the 
input systems. Hence, we get a map $T: \mathcal{E}(\mathcal{B}) \to \mathcal{E}(\mathcal{A})$ 
which completely describes the channel (note that 
the direction of the mapping arrow is reversed 
compared to the natural ordering of processing). 
Alternatively, we can look at the states and interpret 
a channel as a map $T^*: \mathcal{S}(\mathcal{A}) \to \mathcal{S}(\mathcal{B})$ which transforms 
$\mathcal{A}$ systems in the state $\rho \in \mathcal{S}(\mathcal{A})$ into $\mathcal{B}$ systems 
in the state $T^*(\rho)$. To distinguish between both 
maps, we can say that $T$ describes the channel in the 
Heisenberg picture and $T^*$ in the Schrödinger 
picture. On the level of the statistical interpretation, 
both points of view should coincide of course, that 
is, the probabilities $(T^*\rho)(A)$ and $\rho(TA)$ to get the 
result “yes” during an $A$ measurement on $\mathcal{B}$ systems 
in the state $T^*\rho$, respectively a $TA$ measurement on 
$\mathcal{A}$ systems in the state $\rho$, should be the same. Since 
$(T^*\rho)(A)$ is linear in $A$, we see immediately that $T$ 
must be an affine map, that is, 
$T(\lambda_1 A_1 + \lambda_2 A_2) = \lambda_1 T(A_1) + \lambda_2 T(A_2)$ 
for each convex linear combination 
$\lambda_1 A_1 + \lambda_2 A_2$ of effects in $\mathcal{B}$, and this in turn 
implies that $T$ can be extended naturally to a linear 
map, which we will identify in the following with 
the channel itself, that is, we say that $T$ is the 
channel. 

Let us now change slightly our point of view and 
start with a linear operator $T: \mathcal{A} \to \mathcal{B}$. To be a 
channel, $T$ must map effects to effects, that is, $T$ has 
to be positive: $T(A) \ge 0\ \forall A \ge 0$, and bounded from 
above by 1, that is, $T(1) \le 1$. In addition, it is natural 
to require that two channels in parallel are again a 
channel. More precisely, if two channels $T: \mathcal{A}_1 \to \mathcal{B}_1$ 
and $S: \mathcal{A}_2 \to \mathcal{B}_2$ are given, we can consider the map 
$T \otimes S$ which associates to each $A \otimes B \in \mathcal{A}_1 \otimes \mathcal{A}_2$ the 
tensor product $T(A) \otimes S(B) \in \mathcal{B}_1 \otimes \mathcal{B}_2$. It is natural to 
assume that $T \otimes S$ is a channel which converts 
composite systems of type $\mathcal{A}_1 \otimes \mathcal{A}_2$ into $\mathcal{B}_1 \otimes \mathcal{B}_2$ 
systems. Hence, $S \otimes T$ should be positive as well. 


Definition 1 Consider two observable algebras 
$\mathcal{A}$, $\mathcal{B}$ and a linear map $T: \mathcal{A} \to \mathcal{B} \subset \mathcal{B}(\mathcal{H})$. 

(i) $T$ is called positive if $T(A) \ge 0$ holds for all 
positive $A \in \mathcal{A}$. 

(ii) $T$ is called completely positive (CP) if 
$T \otimes \mathrm{Id}: \mathcal{A} \otimes \mathcal{B}(\mathbb{C}^n) \to \mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathbb{C}^n)$ is positive for 
all $n \in \mathbb{N}$. Here Id denotes the identity map 
on $\mathcal{B}(\mathbb{C}^n)$. 

(iii) $T$ is called unital if $T(1) = 1$ holds. 


Consider now the map $T^*: \mathcal{B}^* \to \mathcal{A}^*$ which is dual 
to $T$, that is, $T^*\rho(A) = \rho(TA)$ for all $\rho \in \mathcal{B}^*$ and $A \in \mathcal{A}$. 
It is called the Schrödinger-picture representation of 
the channel $T$, since it maps states to states provided $T$ 
is unital. (Complete) positivity can be defined in the 
Schrödinger picture as in the Heisenberg picture, and 
we immediately see that $T$ is (completely) positive iff 
$T^*$ is. 

It is natural to ask whether the distinction 
between positivity and complete positivity is 
really necessary, that is, whether there are 
positive maps which are not CP. If at least one 
of the algebras A or B is classical, the answer is 
no: each positive map is CP in this case. If both 
algebras are quantum however, complete positiv- 
ity is not implied by positivity alone. The most 
prominent example for this fact is the transposi- 
tion map. 
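
The failure of complete positivity for the transposition can be seen already for $2 \times 2$ matrices. The sketch below applies a map, tensored with the identity, to the positive operator $\sum_{ij} E_{ij} \otimes E_{ij}$ (an unnormalized projector onto a maximally entangled vector) and evaluates a quadratic form on the result. The helper names and the particular test vector are choices made here for illustration.

```python
def matrix_unit(i, j, d):
    """d x d matrix E_ij with a single 1 in row i, column j."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(d)] for r in range(d)]


def transpose(m):
    """Transposition map A -> A^T."""
    return [list(col) for col in zip(*m)]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    da, db = len(a), len(b)
    return [[a[i][j] * b[k][l] for j in range(da) for l in range(db)]
            for i in range(da) for k in range(db)]


def add(a, b):
    """Entrywise sum of two matrices of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def quadratic_form(m, v):
    """<v, M v> for a real matrix M and a real vector v."""
    size = len(v)
    return sum(v[r] * m[r][c] * v[c] for r in range(size) for c in range(size))


def extend_and_apply(channel, d):
    """Apply (channel tensor Id) to sum_ij E_ij (x) E_ij."""
    total = [[0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            e = matrix_unit(i, j, d)
            total = add(total, kron(channel(e), e))
    return total


omega_in = extend_and_apply(lambda m: m, 2)   # the positive input operator
omega_out = extend_and_apply(transpose, 2)    # image under transposition (x) Id
antisymmetric = [0, 1, -1, 0]
value = quadratic_form(omega_out, antisymmetric)
```

The output operator is the flip (swap) on $\mathbb{C}^2 \otimes \mathbb{C}^2$, whose quadratic form is negative on the antisymmetric vector, so the extended map sends a positive operator to a non-positive one even though the transposition itself preserves positivity.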

If item (ii) holds only for a fixed $n \in \mathbb{N}$, 
the map $T$ is called $n$-positive. This is obviously 
a weaker condition than complete positivity. 
However, $n$-positivity implies $m$-positivity for 
all $m \le n$, and for $\mathcal{A} = \mathcal{B}(\mathbb{C}^d)$ complete positivity 
is implied by $n$-positivity, provided $n \ge d$ holds. 

Let us consider now the question whether a channel should be unital or not. We have already mentioned that $T(\mathbb{1})\le\mathbb{1}$ must hold since effects should be mapped to effects. If $T(\mathbb{1})$ is not equal to $\mathbb{1}$, we get $\rho(T\mathbb{1})=T^*\rho(\mathbb{1})<1$ for the probability to measure the effect $\mathbb{1}$ on systems in the state $T^*\rho$, but this is impossible for channels which produce an output with certainty, because $\mathbb{1}$ is the effect which is always true. In other words, if a CP map is not unital, it describes a channel which sometimes produces no output at all, and $T(\mathbb{1})$ is the effect which measures whether we have got an output. We will assume henceforth that channels are unital if nothing else is explicitly stated.


Quantum Channels 


In this section we will discuss some basic properties 
of CP maps which transform quantum systems into 
quantum systems, in particular the Stinespring 
theorem, which constitutes the most important 
structural result. For a more detailed presentation,
including generalizations to more general input/output
algebras, the reader should consult the
textbook by Paulsen (2002).


The Stinespring Theorem 


Hence consider channels between quantum systems, i.e., $\mathcal{A}=\mathcal{B}(\mathcal{H}_1)$ and $\mathcal{B}=\mathcal{B}(\mathcal{H}_2)$. A fairly simple example (not necessarily unital) is given in terms of an operator $V:\mathcal{H}_1\to\mathcal{H}_2$ by $\mathcal{B}(\mathcal{H}_1)\ni A\mapsto VAV^*\in\mathcal{B}(\mathcal{H}_2)$. A second example is the restriction to a subsystem, which is given in the Heisenberg picture by $\mathcal{B}(\mathcal{H})\ni A\mapsto A\otimes\mathbb{1}_{\mathcal{K}}\in\mathcal{B}(\mathcal{H}\otimes\mathcal{K})$. Finally, the composition $S\circ T=ST$ of two channels is again a channel. The following theorem says that each channel can be represented as a composition of these two examples [7].


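Theorem 2 (Stinespring dilation theorem). Every completely positive map $T:\mathcal{B}(\mathcal{H}_1)\to\mathcal{B}(\mathcal{H}_2)$ has the form


$T(A)=V^*(A\otimes\mathbb{1}_{\mathcal{K}})V$ [1]


with an additional Hilbert space $\mathcal{K}$ and an operator $V:\mathcal{H}_2\to\mathcal{H}_1\otimes\mathcal{K}$. Both (i.e., $\mathcal{K}$ and V) can be chosen such that the span of all $(A\otimes\mathbb{1})V\phi$ with $A\in\mathcal{B}(\mathcal{H}_1)$ and $\phi\in\mathcal{H}_2$ is dense in $\mathcal{H}_1\otimes\mathcal{K}$. This particular decomposition is unique (up to unitary equivalence) and is called the minimal decomposition.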


By introducing a family $|\chi_j\rangle\langle\chi_j|$ of one-dimensional projectors with $\sum_j|\chi_j\rangle\langle\chi_j|=\mathbb{1}$, we can define the “Kraus operators” $V_j$ by $\langle\psi,V_j\phi\rangle=\langle\psi\otimes\chi_j,V\phi\rangle$. In terms of these, we can rewrite eqn [1] in the following form (Kraus 1983):


Corollary 3 (Kraus form). Every CP map $T:\mathcal{B}(\mathcal{H}_1)\to\mathcal{B}(\mathcal{H}_2)$ can be written in the form


$T(A)=\sum_{j=1}^{N}V_j^*AV_j$ [2]


with operators $V_j:\mathcal{H}_2\to\mathcal{H}_1$.
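
The passage from the Stinespring form [1] to the Kraus form [2] can be illustrated numerically. The sketch below assumes NumPy, hypothetical dimensions for $\mathcal{H}_1$, $\mathcal{H}_2$ and $\mathcal{K}$, a randomly generated isometry V (so that T is unital), and the standard basis of $\mathcal{K}$ as the vectors $\chi_j$; none of these choices come from the text. It also checks the duality $\mathrm{tr}(\rho\,T(A))=\mathrm{tr}(T^*(\rho)A)$ between the two pictures.

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2, dk = 2, 3, 4  # hypothetical dimensions of H_1, H_2 and K

# Random isometry V: H_2 -> H_1 (x) K (orthonormal columns), so T is unital.
G = rng.normal(size=(d1 * dk, d2)) + 1j * rng.normal(size=(d1 * dk, d2))
V, _ = np.linalg.qr(G)

def T_stinespring(A):
    """Eqn [1]: T(A) = V* (A (x) 1_K) V."""
    return V.conj().T @ np.kron(A, np.eye(dk)) @ V

# Kraus operators V_j: H_2 -> H_1 with <psi, V_j phi> = <psi (x) chi_j, V phi>,
# where chi_j runs over the standard basis of K.
kraus = [V.reshape(d1, dk, d2)[:, j, :] for j in range(dk)]

def T_kraus(A):
    """Eqn [2]: T(A) = sum_j V_j* A V_j (Heisenberg picture)."""
    return sum(Vj.conj().T @ A @ Vj for Vj in kraus)

def T_dual(rho):
    """Schroedinger picture: T*(rho) = sum_j V_j rho V_j*."""
    return sum(Vj @ rho @ Vj.conj().T for Vj in kraus)

A = rng.normal(size=(d1, d1)) + 1j * rng.normal(size=(d1, d1))
rho = np.eye(d2) / d2  # maximally mixed input state on H_2

same_map = np.allclose(T_stinespring(A), T_kraus(A))
unital = np.allclose(T_kraus(np.eye(d1)), np.eye(d2))
dual_ok = np.isclose(np.trace(rho @ T_kraus(A)), np.trace(T_dual(rho) @ A))
print(same_map, unital, dual_ok)
```

Reshaping V into an array with indices (system, ancilla, input) is only one convenient way to read off the operators $V_j$; any other orthonormal basis of $\mathcal{K}$ gives a different but equivalent Kraus family.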


To get a third representation of channels, consider the Stinespring form [1] of T and a vector $\psi\in\mathcal{K}$ such that $U(\phi\otimes\psi)=V(\phi)$ can be extended to a unitary map $U:\mathcal{H}\otimes\mathcal{K}\to\mathcal{H}\otimes\mathcal{K}$. It is then easy to see that the dual $T^*$ of T can be written as:


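Corollary 4 (Ancilla form). Assume that $T:\mathcal{B}(\mathcal{H})\to\mathcal{B}(\mathcal{H})$ is a channel. Then there is a Hilbert space $\mathcal{K}$, a pure state $\rho_0$, and a unitary map $U:\mathcal{H}\otimes\mathcal{K}\to\mathcal{H}\otimes\mathcal{K}$ such that


$T^*(\rho)=\mathrm{tr}_{\mathcal{K}}\bigl(U(\rho\otimes\rho_0)U^*\bigr)$ [3]


holds.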


This representation of a channel has a (seemingly) very nice physical interpretation, because we can look at eqn [3] as the unitary interaction of the system with an unobservable environment, which is initially in the state $\rho_0$. The problem, however, is that there is a great arbitrariness in the choice of U and $\rho_0$. This is the weakness of the ancilla form compared to the Stinespring representation.

Finally, let us state a related result. It characterizes all decompositions of a given completely positive map into completely positive summands. By analogy with results for states on abelian algebras (i.e., probability measures), we will call it a Radon-Nikodym theorem (see Arveson (1969) for a proof).


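Theorem 5 (Radon-Nikodym theorem). Let $T_x:\mathcal{B}(\mathcal{H}_1)\to\mathcal{B}(\mathcal{H}_2)$, $x\in X$, be a family of CP maps and let $V:\mathcal{H}_2\to\mathcal{H}_1\otimes\mathcal{K}$ be the Stinespring operator of $\bar{T}=\sum_xT_x$; then there are uniquely determined positive operators $F_x$ in $\mathcal{B}(\mathcal{K})$ with $\sum_xF_x=\mathbb{1}$ and


$T_x(A)=V^*(A\otimes F_x)V$ [4]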


The Jamiołkowski Isomorphism


The subject of this section is a relation between CP maps and states of bipartite systems, first discovered by Jamiołkowski (1972), which is very useful in translating properties of bipartite systems into properties of positive maps and vice versa.

The idea is based on the following setup. Alice and Bob share a bipartite system in a maximally entangled state


$\chi=\frac{1}{\sqrt{d}}\sum_{\alpha=1}^{d}e_\alpha\otimes e_\alpha\in\mathcal{H}\otimes\mathcal{H}$ [5]


(where $e_1,\ldots,e_d$ denote an orthonormal basis of $\mathcal{H}$). Alice applies to her subsystem a channel $T:\mathcal{B}(\mathcal{H})\to\mathcal{B}(\mathcal{H}')$ while Bob does nothing. At the end of the processing, the overall system ends up in a state


$R_T=(T\otimes\mathrm{Id})|\chi\rangle\langle\chi|$ [6]


Mathematically, eqn [6] makes sense if T is only linear but not necessarily positive or CP (but then $R_T$ is not positive either). If we denote the space of all linear maps from $\mathcal{B}(\mathcal{H})$ into $\mathcal{B}(\mathcal{H}')$ by $\mathcal{L}$, we get a map


$\mathcal{L}\ni T\mapsto R_T\in\mathcal{B}(\mathcal{H}'\otimes\mathcal{H})$ [7]


which is easily shown to be linear (i.e., $R_{\mu T+\lambda S}=\mu R_T+\lambda R_S$ for all $\lambda,\mu\in\mathbb{C}$ and all $T,S\in\mathcal{L}$). Furthermore, this map is bijective, hence a linear isomorphism.


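Theorem 6 The map defined in eqns [7] and [6] is a linear isomorphism. The inverse map is given by


$\mathcal{B}(\mathcal{H}'\otimes\mathcal{H})\ni\rho\mapsto T_\rho\in\mathcal{L}$ [8]


with

$\langle\epsilon_\beta,T_\rho(\sigma)\epsilon_\alpha\rangle=d\,\mathrm{tr}\bigl(\rho\,|\epsilon_\alpha\rangle\langle\epsilon_\beta|\otimes\sigma^{T}\bigr)$ [9]


where $\epsilon_1,\ldots,\epsilon_{d'}\in\mathcal{H}'$ denote an (arbitrary) orthonormal basis of $\mathcal{H}'$ and the transposition of $\sigma$ is defined with respect to the basis $e_\alpha$, $\alpha=1,\ldots,d$, used to define $\chi$ in [5].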


From the definition of $R_T$ in eqn [6], it is obvious that $R_T$ is positive if T is CP. To see that the converse is also true is not as trivial (because a transposition is involved), but it requires only a short calculation, which is omitted here. Hence, we get:


Corollary 7 The operator $R_T$ is positive iff the map T is CP.
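
The isomorphism and its inverse can be tried out on a small example. The sketch below assumes NumPy, a CP map given by two randomly drawn (hypothetical) Kraus operators, and the standard bases of $\mathcal{H}$ and $\mathcal{H}'$; the helper names are ours. It builds $R_T$ from eqn [6], checks that it is positive, and recovers T from $R_T$ through eqn [9].

```python
import numpy as np

def matrix_unit(n, a, b):
    """The operator |e_a><e_b| on C^n."""
    E = np.zeros((n, n), dtype=complex)
    E[a, b] = 1.0
    return E

def jamiolkowski(T, d):
    """Eqn [6]: R_T = (T (x) Id)|chi><chi| with chi from eqn [5]."""
    return sum(
        np.kron(T(matrix_unit(d, a, b)), matrix_unit(d, a, b))
        for a in range(d) for b in range(d)
    ) / d

def inverse_jamiolkowski(R, d, d_out):
    """Eqn [9]: <eps_b, T_rho(sigma) eps_a> = d tr(rho |eps_a><eps_b| (x) sigma^T)."""
    def T_rho(sigma):
        out = np.zeros((d_out, d_out), dtype=complex)
        for alpha in range(d_out):
            for beta in range(d_out):
                proj = matrix_unit(d_out, alpha, beta)  # |eps_alpha><eps_beta|
                out[beta, alpha] = d * np.trace(R @ np.kron(proj, sigma.T))
        return out
    return T_rho

rng = np.random.default_rng(2)
d, d_out = 2, 3  # dim H and dim H' (illustrative)

# Hypothetical Kraus operators V_j: H' -> H; T(A) = sum_j V_j* A V_j is CP.
kraus = [rng.normal(size=(d, d_out)) + 1j * rng.normal(size=(d, d_out))
         for _ in range(2)]

def T(A):
    return sum(K.conj().T @ A @ K for K in kraus)

R = jamiolkowski(T, d)
min_eig = np.linalg.eigvalsh(R).min()

sigma = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
recovered = inverse_jamiolkowski(R, d, d_out)(sigma)
print(min_eig, np.allclose(recovered, T(sigma)))
```

The Kraus operators here are not normalized, so T is CP but not unital; this is enough for Corollary 7, which does not require unitality.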


Examples 


Let us return now to the general case (i.e., arbitrary 
input and output algebras) and discuss several 
examples. 


Channels Under Symmetry 


It is often useful to consider channels with special symmetry properties. To be more precise, consider a group G and two unitary representations $\pi_1$, $\pi_2$ on the Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. A channel $T:\mathcal{B}(\mathcal{H}_1)\to\mathcal{B}(\mathcal{H}_2)$ is called covariant (with respect to $\pi_1$ and $\pi_2$) if


$T[\pi_1(U)A\pi_1(U)^*]=\pi_2(U)T[A]\pi_2(U)^*\quad\forall A\in\mathcal{B}(\mathcal{H}_1)\ \forall U\in G$ [10]


holds. The general structure of covariant channels is governed by a fairly powerful variant of Stinespring's theorem (Keyl and Werner 1999).


Theorem 8 Let G be a group with finite-dimensional unitary representations $\pi_i:G\to U(\mathcal{H}_i)$ and $T:\mathcal{B}(\mathcal{H}_1)\to\mathcal{B}(\mathcal{H}_2)$ a $\pi_1,\pi_2$-covariant channel.


(i) Then there is a finite-dimensional unitary representation $\tilde{\pi}:G\to U(\mathcal{K})$ and an operator $V:\mathcal{H}_2\to\mathcal{H}_1\otimes\mathcal{K}$ with $V\pi_2(U)=\pi_1(U)\otimes\tilde{\pi}(U)V$ and $T(A)=V^*(A\otimes\mathbb{1})V$.

(ii) If $T=\sum_\alpha T^\alpha$ is a decomposition of T into CP and covariant summands, there is a decomposition $\mathbb{1}=\sum_\alpha F^\alpha$ of the identity operator on $\mathcal{K}$ into positive operators $F^\alpha\in\mathcal{B}(\mathcal{K})$ with $[F^\alpha,\tilde{\pi}(g)]=0$ such that $T^\alpha(X)=V^*(X\otimes F^\alpha)V$.

The most prominent examples of covariant channels arise with $\mathcal{H}_1=\mathcal{H}_2=\mathbb{C}^d$, $G=U(d)$ and $\pi_1(U)=\pi_2(U)=U$. All channels of this type are of the form


$T(A)=(1-\vartheta)A+\vartheta d^{-1}\,\mathrm{tr}(A)\mathbb{1}$ with $\vartheta\in[0,d^2/(d^2-1)]$ [11]


and are known as “depolarizing channels.” They often serve as a standard model for noise. Two particular cases are the ideal channel arising with $\vartheta=0$, and the completely depolarizing channel ($\vartheta=1$) which erases all information. If we choose $\pi_2(U)=\bar{U}$ (where the bar denotes complex conjugation) instead of $\pi_2(U)=U$, we get


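$T(A)=\frac{\vartheta}{d+1}\bigl[\mathrm{tr}(A)\mathbb{1}+A^{T}\bigr]+\frac{1-\vartheta}{d-1}\bigl[\mathrm{tr}(A)\mathbb{1}-A^{T}\bigr],\quad\vartheta\in[0,1]$ [12]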


If we map these channels to states of bipartite systems (using the Jamiołkowski isomorphism from the last section), we get “isotropic states” from eqn [11] and “Werner states” from eqn [12].
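
The parameter range of the depolarizing channels can be read off from the corresponding bipartite operator. The short derivation below is a sketch that assumes the vector $\chi$ of eqn [5] and uses $\mathrm{tr}(|e_\alpha\rangle\langle e_\beta|)=\delta_{\alpha\beta}$; the symbol $\vartheta$ denotes the channel parameter.

```latex
\begin{align*}
R_T &= \frac{1}{d}\sum_{\alpha,\beta=1}^{d}
       T\bigl(|e_\alpha\rangle\langle e_\beta|\bigr)\otimes|e_\alpha\rangle\langle e_\beta| \\
    &= (1-\vartheta)\,|\chi\rangle\langle\chi|
       + \frac{\vartheta}{d^{2}}\,\mathbb{1}\otimes\mathbb{1}, \\[4pt]
\text{eigenvalues of } R_T:\quad
    & 1-\vartheta+\frac{\vartheta}{d^{2}} \ \ (\text{on } \chi), \qquad
      \frac{\vartheta}{d^{2}} \ \ (\text{on } \chi^{\perp},\ d^{2}-1 \text{ times}), \\[4pt]
R_T \ge 0 \iff{}& 0 \le \vartheta \le \frac{d^{2}}{d^{2}-1}.
\end{align*}
```

Since $R_T$ is positive exactly when T is CP, this reproduces the interval for $\vartheta$ stated with the depolarizing channels, and it shows that $R_T$ is a mixture of the maximally entangled projector and the identity, which is the defining form of an isotropic state.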


Classical Channels 


The classical analog to a quantum operation is a channel $T:\mathcal{C}(X)\to\mathcal{C}(Y)$ which describes the transmission or manipulation of classical information. As already mentioned in the subsection “Completely positive maps,” positivity and complete positivity are equivalent in this case. Hence, we have to assume only that T is positive and unital. Obviously, T is characterized by its matrix elements $T_{xy}=\delta_y(Te_x)$, where $\delta_y\in\mathcal{C}^*(X)$ denotes the Dirac measure at $y\in Y$ and $e_x\in\mathcal{C}(X)$ is the canonical basis in $\mathcal{C}(X)$. More precisely, $\delta_y$ and $e_x$ denote, respectively, the probability distribution and the function on X, given by


$\delta_y=(\delta_{xy})_{x\in X}$ and $e_x(y)=\delta_{xy}$ [13]


We will keep this notation up to the end of this article. Positivity and normalization of T imply that $0\le T_{xy}\le 1$ and


$1=\delta_y(\mathbb{1})=\delta_y(T(\mathbb{1}))=\delta_y\Bigl[T\Bigl(\sum_xe_x\Bigr)\Bigr]=\sum_xT_{xy}$ [14]


holds. Hence the family $(T_{xy})_{x\in X}$ is a probability distribution on X, and $T_{xy}$ is, therefore, the transition probability to get the information $x\in X$ at the output side of the channel if $y\in Y$ was sent.
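
A classical channel is thus nothing but a stochastic matrix. The following sketch, assuming NumPy and made-up transition probabilities for a binary channel, stores $T_{xy}$ as an array `T[x, y]` and implements both the Heisenberg action on functions and the dual Schrödinger action on probability distributions.

```python
import numpy as np

# Hypothetical transition probabilities T[x, y]: probability to receive x
# at the output if y was sent (binary channel, |X| = |Y| = 2).
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])

def heisenberg(T, f):
    """(T f)(y) = sum_x T[x, y] f(x): maps functions on X to functions on Y."""
    return T.T @ f

def schroedinger(T, p):
    """(T* p)(x) = sum_y T[x, y] p(y): maps distributions on Y to distributions on X."""
    return T @ p

ones = np.ones(2)
unital = np.allclose(heisenberg(T, ones), ones)   # T(1) = 1, i.e. columns sum to 1

p_in = np.array([1.0, 0.0])        # Dirac measure at y = 0
p_out = schroedinger(T, p_in)      # distribution of the received symbol

f = np.array([3.0, -1.0])          # an arbitrary function on X
dual_ok = np.isclose(p_in @ heisenberg(T, f), p_out @ f)
print(unital, p_out, dual_ok)
```

Unitality of T in the Heisenberg picture is the same statement as normalization of every column of the matrix, which is the content of eqn [14].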


Observables


Let us consider now a channel which transforms quantum information $\mathcal{B}(\mathcal{H})$ into classical information $\mathcal{C}(X)$. Since positivity and complete positivity are again equivalent, we just have to look at a positive and unital map $E:\mathcal{C}(X)\to\mathcal{B}(\mathcal{H})$. With the canonical basis $e_x$, $x\in X$, of $\mathcal{C}(X)$, we get a family $E_x=E(e_x)$, $x\in X$, of positive operators $E_x\in\mathcal{B}(\mathcal{H})$ with $\sum_{x\in X}E_x=\mathbb{1}$. Hence, the $E_x$ form a positive operator valued (POV) measure, i.e., an observable. If, on the other hand, a POV measure $E_x\in\mathcal{B}(\mathcal{H})$, $x\in X$, is given, we can define a quantum-to-classical channel $E:\mathcal{C}(X)\to\mathcal{B}(\mathcal{H})$ by


$E(f)=\sum_{x\in X}f(x)E_x$ [15]


This shows that the observable $E_x$, $x\in X$, and the channel E can be identified.
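
As an illustration of this identification, the sketch below builds the channel E of eqn [15] from a concrete POV measure. It assumes NumPy and uses the three-outcome “trine” measure on a qubit, which is our choice of example and does not appear in the text.

```python
import numpy as np

# Trine POV measure on C^2: E_x = (2/3)|psi_x><psi_x| with three real unit
# vectors at mutual angles of 120 degrees (illustrative example).
angles = 2 * np.pi * np.arange(3) / 3
vectors = [np.array([np.cos(t), np.sin(t)]) for t in angles]
E_ops = [(2.0 / 3.0) * np.outer(v, v) for v in vectors]

def channel_E(f):
    """Eqn [15]: E(f) = sum_x f(x) E_x, a map from C(X) to B(H)."""
    return sum(fx * Ex for fx, Ex in zip(f, E_ops))

unital = np.allclose(channel_E(np.ones(3)), np.eye(2))   # sum_x E_x = 1

# Outcome statistics in the state rho = |e_1><e_1|: p(x) = tr(rho E_x).
rho = np.array([[1.0, 0.0],
                [0.0, 0.0]])
probabilities = np.array([np.trace(rho @ Ex) for Ex in E_ops])
print(unital, probabilities)
```

The expectation value of a function f on X in the state $\rho$ is then $\mathrm{tr}(\rho E(f))=\sum_xf(x)\,\mathrm{tr}(\rho E_x)$, which is the Heisenberg-picture reading of the same channel.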


Preparations 


Let us now exchange the role of $\mathcal{C}(X)$ and $\mathcal{B}(\mathcal{H})$; in other words, let us consider a channel $R:\mathcal{B}(\mathcal{H})\to\mathcal{C}(X)$ with a classical input and a quantum output algebra. In the Schrödinger picture, we get a family of density matrices $\rho_x:=R^*(\delta_x)\in\mathcal{B}^*(\mathcal{H})$, $x\in X$, where $\delta_x\in\mathcal{C}^*(X)$ denotes again the Dirac measure on X. Hence, we get a parameter-dependent preparation that can be used to encode the classical information $x\in X$ into the quantum information $\rho_x\in\mathcal{B}^*(\mathcal{H})$.


Instruments 


An observable describes only the statistics of measuring results, but does not contain information about the state of the system after the measurement. To get a description which fills this gap, we have to consider channels which operate on quantum systems and produce hybrid systems as output, that is, $T:\mathcal{B}(\mathcal{H})\otimes\mathcal{C}(X)\to\mathcal{B}(\mathcal{K})$. Following Davies (1976), we will call such an object an instrument. From T we can derive the subchannel


$\mathcal{C}(X)\ni f\mapsto T(\mathbb{1}\otimes f)\in\mathcal{B}(\mathcal{K})$ [16]


which is the observable measured by T, that is, $\mathrm{tr}(T(\mathbb{1}\otimes e_x)\rho)$ is the probability to measure $x\in X$ on systems in the state $\rho$. On the other hand, we get for each $x\in X$ a quantum channel (which is not unital)


$\mathcal{B}(\mathcal{H})\ni A\mapsto T_x(A)=T(A\otimes e_x)\in\mathcal{B}(\mathcal{K})$ [17]


It describes the operation performed by the instrument T if $x\in X$ was measured. More precisely, if a measurement on systems in the state $\rho$ gives the result $x\in X$, we get (up to normalization) the state $T_x^*(\rho)$ after the measurement, while


$\mathrm{tr}(T_x^*(\rho))=\mathrm{tr}(T_x^*(\rho)\mathbb{1})=\mathrm{tr}(\rho\,T(\mathbb{1}\otimes e_x))$ [18]


is (again) the probability to measure $x\in X$ on $\rho$. The instrument T can be expressed in terms of the operations $T_x$ by


$T(A\otimes f)=\sum_xf(x)T_x(A)$ [19]


Hence, we can identify T with the family $T_x$, $x\in X$. Finally, we can consider the second marginal of T


$\mathcal{B}(\mathcal{H})\ni A\mapsto T(A\otimes\mathbb{1})=\sum_{x\in X}T_x(A)\in\mathcal{B}(\mathcal{K})$ [20]


It describes the operation we get if the outcome of the measurement is ignored.

The best-known example of an instrument is a von Neumann-Lüders measurement associated with a PV measure given by a family of projections $E_x$, $x=1,\ldots,d$; for example, the eigenprojections of a self-adjoint operator $A\in\mathcal{B}(\mathcal{H})$. It is defined as the channel


$T:\mathcal{B}(\mathcal{H})\otimes\mathcal{C}(X)\to\mathcal{B}(\mathcal{H})$ with $X=\{1,\ldots,d\}$ and $T_x(A)=E_xAE_x$ [21]


Hence, we get the final state $\mathrm{tr}(E_x\rho)^{-1}E_x\rho E_x$ if we measure the value $x\in X$ on systems initially in the state $\rho$; this is well known from quantum mechanics.
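
The Lüders instrument of eqn [21] is easy to write down explicitly. The sketch below assumes NumPy, a qubit, the projections onto the two standard basis vectors, and the input state $|+\rangle\langle+|$; all of these are illustrative choices. It computes the outcome probabilities of eqn [18], the normalized post-measurement states, and the marginal of eqn [20].

```python
import numpy as np

# Projections of a PV measure on C^2 (eigenprojections of a diagonal operator).
E_ops = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def T_x(x, A):
    """Heisenberg-picture operation of eqn [21]: T_x(A) = E_x A E_x."""
    return E_ops[x] @ A @ E_ops[x]

def T_x_dual(x, rho):
    """Schroedinger picture: unnormalized post-measurement state E_x rho E_x."""
    return E_ops[x] @ rho @ E_ops[x]

plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus)            # input state |+><+|

probabilities = []
post_states = []
for x in range(2):
    unnormalized = T_x_dual(x, rho)
    p = np.trace(unnormalized)        # eqn [18]: probability of outcome x
    probabilities.append(p)
    post_states.append(unnormalized / p)

# Second marginal, eqn [20]: ignoring the outcome gives a unital channel.
marginal_of_identity = sum(T_x(x, np.eye(2)) for x in range(2))
print(probabilities, np.allclose(marginal_of_identity, np.eye(2)))
```

Each single operation $T_x$ maps the identity to $E_x$ and is therefore not unital, while their sum is, in agreement with the discussion of the two marginals above.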


Parameter-Dependent Operations 


Let us change now the role of $\mathcal{B}(\mathcal{H})\otimes\mathcal{C}(X)$ and $\mathcal{B}(\mathcal{K})$; in other words, consider a channel $T:\mathcal{B}(\mathcal{K})\to\mathcal{B}(\mathcal{H})\otimes\mathcal{C}(X)$ with hybrid input and quantum output. It describes a device which changes the state of a system depending on the additional classical information. As for an instrument, T decomposes into a family of (unital!) channels $T_x:\mathcal{B}(\mathcal{K})\to\mathcal{B}(\mathcal{H})$ such that we get $T^*(\rho\otimes p)=\sum_xp_xT_x^*(\rho)$ in the Schrödinger picture. Physically, T describes a parameter-dependent operation: depending on the classical information $x\in X$, the quantum information $\rho\in\mathcal{B}(\mathcal{K})$ is transformed by the operation $T_x$.

Finally, we can consider a channel $T:\mathcal{B}(\mathcal{H})\otimes\mathcal{C}(X)\to\mathcal{B}(\mathcal{K})\otimes\mathcal{C}(Y)$ with hybrid input and output to get a parameter-dependent instrument: similarly to the above discussion, we can define a family of instruments $T_y:\mathcal{B}(\mathcal{H})\otimes\mathcal{C}(X)\to\mathcal{B}(\mathcal{K})$, $y\in Y$, by the equation $T^*(\rho\otimes p)=\sum_yp_yT_y^*(\rho)$. Physically, T describes the following device: it receives the classical information $y\in Y$ and a quantum system in the state $\rho\in\mathcal{B}^*(\mathcal{K})$ as input. Depending on y, a measurement with the instrument $T_y$ is performed, which in turn produces the measuring value $x\in X$ and leaves the quantum system in the state (up to normalization) $T_{y,x}^*(\rho)$, with $T_{y,x}$ given as in eqn [17] by $T_{y,x}(A)=T_y(A\otimes e_x)$.


See also: Capacities Enhanced by Entanglement; 
Capacity for Quantum Information; Entanglement; 
Optimal Cloning of Quantum States; Positive Maps on 
C*-Algebras; Quantum Channels: Classical Capacity; 
Quantum Dynamical Semigroups; Quantum Entropy; 
Quantum Spin Systems; Source Coding in Quantum 
Information Theory.


Introduction


Chaos is a type of behavior that can be exhibited by a large class of physical systems and their mathematical models. These systems are deterministic. They are modeled by sets of coupled nonlinear ordinary differential equations (ODEs):


$\dot{x}_i=\frac{dx_i}{dt}=f_i(x;c)$ [1]


called dynamical systems. The coordinates x designate points in a state space or phase space. Typically, $x\in\mathbb{R}^n$ or some n-dimensional manifold for some $n\ge 3$, and $c\in\mathbb{R}^k$ are called control parameters. They describe parameters that can be controlled in physical systems, such as pumping rates in lasers or flow rates in chemical mixing reactions. The most important mathematical property of dynamical systems is the uniqueness theorem, which states that there is a unique trajectory through every point at which $f(x;c)$ is continuous and Lipschitz and $f(x;c)\ne 0$. In particular, two distinct periodic orbits cannot have any points in common.

The properties of dynamical systems are governed, in lowest order, by the number, stability, and distribution of their fixed points, defined by $\dot{x}_i=f_i(x;c)=0$. It can happen that a dynamical system has no stable fixed points and no stable limit cycles ($x(t)=x(t+T)$, some $T>0$, all t). In such cases, if the solution is bounded and recurrent but not periodic, it represents an unfamiliar type of attractor. If the system exhibits “sensitivity to initial conditions” ($|x(t)-y(t)|\sim e^{\lambda t}|x(0)-y(0)|$ for $|x(0)-y(0)|=\epsilon$ and $\lambda>0$ for most $x(0)$), the solution set is called a “chaotic attractor.” If the attractor has fractal structure, it is called a “strange attractor.”

Tools to study strange attractors have been 
developed that depend on three types of mathe- 
matics: geometry, dynamics, and topology. 

Geometric tools attempt to study the metric 
relations among points in a strange attractor. 
These include a spectrum of fractal dimensions. 
These real numbers are difficult to compute, require 
very long, very clean data sets, provide a number 
without error estimates for which there is no 
underlying statistical theory, and provide very little 
information about the attractor. 

Dynamical tools include estimation of Lyapunov exponents and a Lyapunov dimension. They include globally averaged exponents and local Lyapunov exponents. These are eigenvalues related to the different stretching ($\lambda>0$) and squeezing ($\lambda<0$) eigendirections in the phase space. To each globally averaged Lyapunov exponent $\lambda_i$, $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$, there corresponds a “partial dimension” $\epsilon_i$, $0\le\epsilon_i\le 1$, with $\epsilon_i=1$ if $\lambda_i\ge 0$. The Lyapunov dimension is the sum of the partial dimensions, $d_L=\sum_{i=1}^{n}\epsilon_i$. That the partial dimension $\epsilon_i=1$ for $\lambda_i\ge 0$ indicates that the flow is smooth in the stretching ($\lambda_i>0$) and flow directions and fractal in the squeezing ($\lambda_i<0$) directions with $\epsilon_i<1$. Dynamical indices provide some useful information about a strange attractor. In particular, they can be used to estimate some fractal properties of a strange attractor, but not vice versa.

Topological tools are very powerful for a restricted class of dynamical systems. These are dynamical systems in three dimensions ($n=3$). For such systems there are three Lyapunov exponents $\lambda_1>\lambda_2>\lambda_3$, with $\lambda_1>0$ describing the stretching direction and responsible for “sensitivity to initial conditions,” $\lambda_2=0$ describing the direction of the flow, and $\lambda_3<0$ describing the squeezing direction and responsible for “recurrence.” Strange attractors are generated by dissipative dynamical systems, which satisfy the additional condition $\lambda_1+\lambda_2+\lambda_3<0$. For such attractors, $\epsilon_1=\epsilon_2=1$ and $\epsilon_3=\lambda_1/|\lambda_3|$ by the Kaplan-Yorke conjecture, so that $d_L=2+\epsilon_3=2+\lambda_1/|\lambda_3|$.
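
The formula for the Lyapunov dimension of a three-dimensional dissipative flow is simple enough to evaluate by hand, but a small helper makes the assumptions explicit. The sketch below is plain Python; the function name and the numerical exponents in the usage lines are illustrative values chosen by us, not measurements quoted in the text.

```python
def lyapunov_dimension_3d(l1, l2, l3, tol=1e-9):
    """Kaplan-Yorke estimate d_L = 2 + l1/|l3| for a dissipative flow in R^3.

    Expects l1 > 0 (stretching), l2 = 0 (flow direction), l3 < 0 (squeezing)
    and l1 + l2 + l3 < 0 (dissipation); raises ValueError otherwise.
    """
    if not (l1 > 0 and abs(l2) <= tol and l3 < 0):
        raise ValueError("expected l1 > 0, l2 = 0, l3 < 0")
    if l1 + l2 + l3 >= 0:
        raise ValueError("flow is not dissipative")
    eps3 = l1 / abs(l3)          # partial dimension of the squeezing direction
    return 2.0 + eps3

# Illustrative exponents only (assumed, not taken from the text).
d_L = lyapunov_dimension_3d(0.9, 0.0, -14.5)
print(d_L)
```

Because dissipation forces $\lambda_1<|\lambda_3|$, the partial dimension $\epsilon_3$ lies strictly between 0 and 1 and the estimate always satisfies $2<d_L<3$.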

A number of tools from classical topology have 
been exploited to probe the structure of strange 
attractors in three dimensions. These include the 
Gauss linking number, the Euler characteristic, the 
Poincaré-Hopf index theorem, and braid theory. 
More recent topological contributions include sev- 
eral definitions for entropy, the development of a 
theory for knot holders or braid holders (also called 
branched manifolds), the Birman-Williams theorem 
for these objects, and relative rotation rates, a 
topological index for individual periodic orbits and 
orbit pairs. 

Three-dimensional strange attractors are remarkably well understood; those in higher dimensions are not. As a result, the description that follows is largely restricted to strange attractors with $d_L<3$ that exist in $\mathbb{R}^3$ or other three-dimensional manifolds (e.g., $\mathbb{R}^2\times S^1$). The obstacle to progress in higher dimensions is the lack of a higher-dimensional analog of the Gauss linking number for orbit pairs in $\mathbb{R}^3$.


Overview


The program described below has two objectives:


1. classify the global topological structure of strange attractors in $\mathbb{R}^3$; and

2. determine the “perestroikas” (changes) that such attractors can undergo as experimental conditions or control parameters change.


Four levels of structure are required to complete 
this program. Each is topological and discretely 
quantifiable. This provides a beautiful interaction 
between a rigidity of structure, demanded by 
topological constraints, and freedom within this 
rigidity. These four levels of structure are: 


1. basis sets of orbits, 

2. branched manifolds or knot holders, 
3. bounding tori, and 

4. embeddings of bounding tori. 


Branched Manifolds: Stretching and Squeezing


A strange attractor is generated by the repetition of two mechanisms: stretching and squeezing. Stretching occurs in the directions identified by the positive Lyapunov exponents and squeezing occurs in the directions identified by the negative Lyapunov exponents. In $\mathbb{R}^3$ there is one stretching direction and one squeezing direction.

A simple stretch-and-squeeze mechanism that nature appears to be very fond of is illustrated in Figure 1. In this illustration, a cube of initial conditions at (a) is advected by the flow in a short time to (b). During this process, the cube is deformed by being stretched ($\lambda_1>0$). It also shrinks in a transverse direction ($\lambda_3<0$). During the initial phase of this deformation, two nearby points typically separate exponentially in time. If they were to continue to separate exponentially for all times, the invariant set would not be bounded. Therefore, this separation cannot continue indefinitely, and in fact it must somehow reverse itself after some time because the motion is recurrent. The mechanism shown in Figure 1 involves folding, which begins between (b) and (c) and continues through to (d). Squeezing occurs where points from distant parts of the attractor approach each other exponentially, as at (d). Finally, the cube, shown deformed at (d), returns to the neighborhood of initial conditions (a). This process repeats itself and builds up the strange attractor. As can be inferred from this figure, the strange attractor constructed by the repetitive process is smooth in the expanding ($\lambda_1$) and flow ($\lambda_2=0$) directions but fractal in the squeezing ($\lambda_3$) direction. The attractor's fractal dimension is $\epsilon_1+\epsilon_2+\epsilon_3=2+\epsilon_3=2+\lambda_1/|\lambda_3|$.

Figure 1 summarizes the boundedness and recurrence conditions that were introduced to define strange attractors, and illustrates one stretching and squeezing mechanism that occurs repetitively to build up the fractal structure of the strange attractor and to organize all the (unstable) periodic orbits in it in a unique way. The particular mechanism shown in Figure 1 is called a stretch-and-fold mechanism. Other mechanisms involve stretch and roll, and tear and squeeze.

Figure 1 A common stretch-and-fold mechanism generates many experimentally observed strange attractors. The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

The stretch-and-squeeze mechanisms are well summarized by the cartoons shown in Figure 2. On the left, a cube of initial conditions (top) is deformed under the flow. The flow is downward. Stretching occurs in one direction (horizontal) and shrinking occurs in a transverse direction (perpendicular to the page). In the limit of extreme shrinking ($\lambda_3\to-\infty$), the dynamics of the stretching part of the flow is represented by the two-dimensional surface shown on the bottom left. This surface fails to be a manifold because of the singularity, called a splitting point. This singularity represents an initial condition that flows to an unstable fixed point with at least one stable direction. On the right (squeezing), two distant cubes of initial conditions (top) in the flow are deformed and brought to each other's proximity under the flow (middle). In the limit of extreme dissipation, two two-dimensional surfaces representing inflows are joined at a branch line to a single surface representing an outflow. This surface fails to be a manifold because of the branch line, which is a singularity of a different kind. Points below the branch line in this representation of the flow (on the outflow side of the branch line) have two preimages above the branch line, one in each inflow sheet. This structure generates positive entropy.

Figure 2 Left: The stretch mechanism is modeled by a two-dimensional surface with a splitting point singularity. Right: The squeeze mechanism is modeled by a two-dimensional surface with a branch line singularity. The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

A beautiful theorem of Birman and Williams justifies the use of the two cartoons shown at the bottom of Figure 2 to characterize strange attractors in $\mathbb{R}^3$. As preparation for the theorem, Birman and Williams introduced an important identification for the nongeneric or atypical points that “are not sensitive to initial conditions”:


$x\sim y$ if $|x(t)-y(t)|\xrightarrow{t\to\infty}0$ [2]


That is, two points in a strange attractor are identified if they have asymptotically the same future. In practice, this amounts to projecting the flow down along the stable ($\lambda_3<0$) direction onto a two-dimensional surface described by the stretching ($\lambda_1>0$) and the flow ($\lambda_2=0$) directions. This surface is not a manifold because of lower-dimensional singularities: splitting points and branch lines. The two-dimensional surface has many names, for example, knot holder (because it holds the periodic orbits that exist in abundance in strange attractors), braid holder, template, and branched manifold. The flow, restricted to this surface, is called a semiflow. Under the semiflow, points in the branched manifold have a unique future but do not have a unique past. The degree of nonuniqueness is measured by the topological entropy of the dynamical system. The Birman-Williams theorem is:


Theorem Assume that a flow $\Phi_t$


(i) on $\mathbb{R}^3$ is dissipative ($\lambda_1>0$, $\lambda_2=0$, $\lambda_3<0$ and $\lambda_1+\lambda_2+\lambda_3<0$);

(ii) generates a hyperbolic strange attractor (the eigenvectors of the local Lyapunov exponents $\lambda_1,\lambda_2,\lambda_3$ span $\mathbb{R}^3$ everywhere on the attractor).


Then the projection [2] maps the strange attractor $\mathcal{SA}$ to a branched manifold $\mathcal{BM}$ and the flow $\Phi_t$ on $\mathcal{SA}$ to a semiflow $\hat{\Phi}_t$ on $\mathcal{BM}$ in $\mathbb{R}^3$. The periodic orbits in $\mathcal{SA}$ under $\Phi_t$ correspond 1:1 with the periodic orbits in $\mathcal{BM}$ under $\hat{\Phi}_t$, with perhaps one or two specified exceptions. On any finite subset of orbits the correspondence can be taken via isotopy.


The beauty of this theorem is that it guarantees that a flow $\Phi_t$ that generates a (fractal) strange attractor $\mathcal{SA}$ can be continuously deformed to a new flow $\hat{\Phi}_t$ on a simple two-dimensional structure $\mathcal{BM}$. During this deformation, periodic orbits are neither created nor destroyed. The uniqueness theorem for ODEs is satisfied during the deformation, so orbit segments do not pass through each other. As a result, the topological organization of all the unstable periodic orbits in the strange attractor is the same as the topological organization of all the unstable periodic orbits in the branched manifold. In fact, the branched manifold (knot holder) defines the topological organization of all the unstable periodic orbits that it supports. Topological organization is defined by the Gauss linking number and the relative rotation rates, another braid index.

The significance of this theorem is that strange 
attractors can be characterized — in fact classified — 
by their branched manifolds. Figure 3 shows a 
branched manifold “for a figure-8 knot" as well as 
the figure-8 knot itself (dark curve). If a constant 
current is sent through a conducting wire tied into 
the shape of a figure-8 knot, a discrete countable set 
of magnetic field lines will be closed. These closed 
field lines can be deformed onto the two-dimen- 
sional surface shown in Figure 3. Each of the eight 
branches of this branched manifold can be named. 
One way to do this specifies the two branch lines that are joined by the branch in the sense of the flow (e.g., $(a\alpha)$ and $(\beta a)$ but not $(\alpha\beta)$). Every closed field line can be labeled by a symbol sequence that is unique up to cyclic permutation. This symbol sequence provides a symbolic name for the orbit. For example, $(\alpha a)(a\beta)(\beta b)(b\alpha)$ is a period-4 orbit. The structure of a branched manifold is determined in part by a transition matrix T. The matrix element $T_{ij}$ is 1 if the transition from branch i to branch j is allowed, 0 otherwise. The transition matrix for the figure-8 branched manifold is shown in Figure 3.

Figure 3 Figure-8 knot (dark curve) and the figure-8 branched manifold. Transition matrix for the eight branches of the figure-8 branched manifold is also shown. Flow direction is shown by arrows. The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

The Birman-Williams theorem is stronger than its 
statement suggests. More systems satisfy the state- 
ment of the theorem than do the assumptions of the 
theorem. The figure-8 knot, and its attendant 
magnetic field, is not dissipative — in fact, it is not 
even a dynamical system, yet the closed loops can be 
isotoped to the figure-8 knot holder. There are other 
ways in which the Birman-Williams theorem is 
stronger than its statement suggests. 

It is apparent from Figure 3 that the figure-8 branched manifold can be built up Lego fashion from the two basic building blocks shown in Figure 2. This is more generally true. Every branched manifold can be built up, Lego fashion, from the stretch (with a splitting point singularity) and the squeeze (with a branch line singularity) building blocks, subject to the following two conditions:


1. outputs flow to inputs and 
2. there are no free ends. 


The figure-8 branched manifold is built up from 
four stretch and four squeeze building blocks. As a 
result, there are eight branches and four branch 
lines. 

Two often-studied strange attractors are shown in Figures 4 and 5. Figure 4 shows the details of the Rössler dynamical system. A similar spectrum of features is shown in Figure 5 for the Lorenz equations. The knot holder in Figure 5e is obtained from the caricature in Figure 5d by twisting the right-hand lobe by $\pi$ radians.

Branched manifolds can be used to characterize 
all three-dimensional strange attractors. Branched 
manifolds that classify the strange attractors gener- 
ated by four familiar sets of equations (for some 
control parameter values) are shown in Figure 6. 
The sets of equations, and one set of parameter 
values that generate strange attractors, are presented 
in Table 1. 
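
To see how one of these sets of equations generates a bounded but sensitive trajectory, the Lorenz equations can be integrated numerically. The sketch below assumes NumPy and uses the Lorenz equations and parameter values $(R,\sigma,b)=(26.0,10.0,8/3)$ listed in Table 1; the fourth-order Runge-Kutta integrator, the step size, and the initial conditions are our own illustrative choices.

```python
import numpy as np

def lorenz(state, R=26.0, sigma=10.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz equations as listed in Table 1."""
    x, y, z = state
    return np.array([-sigma * x + sigma * y,
                     R * x - y - x * z,
                     -b * z + x * y])

def rk4(f, state, dt, n_steps):
    """Classical fourth-order Runge-Kutta integration; returns all states."""
    traj = np.empty((n_steps + 1, state.size))
    traj[0] = state
    for i in range(n_steps):
        s = traj[i]
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        traj[i + 1] = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return traj

# Two initial conditions that differ by 1e-9 in z (assumed values).
traj_a = rk4(lorenz, np.array([1.0, 1.0, 1.0]), dt=0.01, n_steps=4000)
traj_b = rk4(lorenz, np.array([1.0, 1.0, 1.0 + 1e-9]), dt=0.01, n_steps=4000)

separation = np.linalg.norm(traj_a - traj_b, axis=1)
print(np.abs(traj_a).max(), separation[0], separation[-1])
```

The trajectory stays in a bounded region while the distance between the two solutions grows by many orders of magnitude, which is the combination of boundedness and sensitivity to initial conditions used above to define a chaotic attractor.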

The beauty of this topological classification of strange attractors is that it is apparent, just by inspection, that there is no smooth change of variables that will map any of these systems to any of the others for the parameter values shown.

Branched manifolds can be described algebraically. In Figure 7
we provide the algebraic description of two branched manifolds.
Figure 7a shows the branched manifold that describes experimental
data generated by many physical systems. The mechanism is a simple
stretch-and-fold deformation with zero global torsion that
generates a typical Smale horseshoe. There are two branches. The
diagonal elements of the matrix identify the local torsion of the
flow through the corresponding branch, measured in units of $\pi$.
Branch 0 has no local torsion, and branch 1 shows a half-twist and
has local torsion +1. The off-diagonal matrix elements are twice
the linking number of the period-1 orbits in the corresponding pair
of branches. Since the period-1 orbits in these two branches do not
link, the off-diagonal matrix elements are 0. The period-1 orbits
in the branches labeled 1 and 2 in Figure 7b have linking number
+1, so the off-diagonal matrix elements are
$T(1,2) = T(2,1) = 2 \times (+1)$. The array identifies the order
(above, below) in which the two branches are joined at the branch
line: the smaller the value, the closer to the viewer. These two
pieces of information, four integers in Figure 7a and eight in
Figure 7b, serve to determine the topological organization of all
the unstable periodic orbits in any strange attractor with either
branched manifold.

Figure 4 The Rössler dynamical system. (a) Rössler equations. (b) Time series z(t) and x(t) generated by these equations, and
(c) projection of the strange attractor onto the x-y plane. (d) Caricature of the flow and (e) knot holder derived directly from the
caricature. Control parameter values (a, b, c) = (2.0, 4.0, 0.398). The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002,
Wiley. This material is used by permission of John Wiley & Sons, Inc.

Figure 5 (a) Lorenz equations. (b) Time series x(t) and z(t) generated by these equations, and (c) projection of the strange attractor
onto the x-y plane. (d) Caricature of the flow and (e) knot holder derived directly from the caricature by rotating the right-hand lobe by $\pi$
radians. Control parameter values $(R, \sigma, b) = (26.0, 10.0, 8/3)$. The Topology of Chaos; R Gilmore and M Lefranc; Copyright © 2002,
Wiley. This material is used by permission of John Wiley & Sons, Inc.

Figure 6 Branched manifolds for four standard sets of
equations: (a) Rössler equations, (b) periodically driven Duffing
equations, (c) periodically driven van der Pol equations, and
(d) Lorenz equations. The Topology of Chaos; R Gilmore and
M Lefranc; Copyright © 2002, Wiley. This material is used by
permission of John Wiley & Sons, Inc.


Table 1 Four sets of equations that generate strange attractors

Dynamical
system        ODEs                                   Parameter values

Rössler       dx/dt = -y - z                         (a, b, c) = (2.0, 4.0, 0.398)
              dy/dt = x + a y
              dz/dt = b + z (x - c)

Duffing       dx/dt = y                              (δ, A, ω) = (0.4, 0.4, 1.0)
              dy/dt = -δ y - x^3 + x + A sin(ω t)

van der Pol   dx/dt = b y + (c - d y^2) x            (b, c, d, A, ω) =
              dy/dt = -x + A sin(ω t)                (0.7, 1.0, 10.0, 0.25, π/2)

Lorenz        dx/dt = -σ x + σ y                     (R, σ, b) = (26.0, 10.0, 8/3)
              dy/dt = R x - y - x z
              dz/dt = -b z + x y
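
As an illustration of how the systems in Table 1 are explored in practice, the following minimal sketch integrates the Lorenz equations at the parameter values $(R, \sigma, b) = (26.0, 10.0, 8/3)$ quoted there. The fixed-step fourth-order Runge-Kutta integrator, the step size, the initial condition, and all function names are assumptions made for this example; they are not taken from the original text.

```python
def lorenz(state, R=26.0, sigma=10.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz equations as listed in Table 1."""
    x, y, z = state
    return (-sigma * x + sigma * y, R * x - y - x * z, -b * z + x * y)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step (integrator choice is an assumption)."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt * (a + 2.0 * b2 + 2.0 * c + d) / 6.0
        for s, a, b2, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(f, state, dt, n_steps):
    """Return the list of states visited, including the initial one."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(f, state, dt)
        trajectory.append(state)
    return trajectory


# Hypothetical initial condition and step size, chosen only for illustration.
trajectory = integrate(lorenz, (1.0, 1.0, 1.0), 0.01, 5000)
x_series = [p[0] for p in trajectory]  # scalar time series x(t)
z_series = [p[2] for p in trajectory]  # scalar time series z(t)
```

Plotting `x_series` against the corresponding y values would give a projection of the attractor onto the x-y plane of the kind described in the Figure 5 caption.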

The periodic orbits are identified by a repeating symbol
sequence of least period p, which is unique up to cyclic
permutation. The symbol sequence consists of a string of integers,
sequentially identifying the branches through which the orbit
passes. For a branched manifold with two branches, there are two
symbols. The number of orbits of period p, N(p), obeys the
recursion relation

$$ p\,N(p) = 2^p - \sum_{k \mid p,\ 1 \le k \le p/2} k\,N(k) \qquad [3] $$

Figure 7 Branched manifolds are described algebraically. The
diagonal matrix elements describe the twist of each branch.
The off-diagonal matrix elements are twice the linking number of
the period-1 orbits in each of the two branches. The array
describes the order in which the branches are connected at the
branch line. (a) Smale horseshoe branched manifold. (b) Beginning
of a "gateau roulé" (jelly roll) branched manifold.

Table 2 shows the number of orbits of period $p \le 20$ for the
branched manifolds with two and three branches shown in Figure 7.
The number of orbits of period p grows exponentially with p, and
the limit $h_T = \lim_{p \to \infty} \log(N(p))/p$ defines the
topological entropy $h_T$ for the branched manifold. The limits
are $\ln 2$ and $\ln 3$ for the branched manifolds with two and
three branches, respectively. The linking numbers of orbits up to
period 5 in the Smale horseshoe branched manifold are shown in
Table 3, which identifies each of the orbits by its symbol
sequence (e.g., 00111).
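
The recursion relation [3] is easy to evaluate directly. The sketch below computes N(p) for a branched manifold with a given number of branches and, as a cross-check for short periods, enumerates symbol sequences of least period p up to cyclic permutation. The function names are chosen for this example only, and the brute-force enumeration is an illustrative check rather than part of the original text.

```python
from itertools import product
from math import log


def orbit_counts(n_symbols, p_max):
    """N(p) for p = 1..p_max from p N(p) = n^p - sum_{k|p, k<=p/2} k N(k)."""
    counts = {}
    for p in range(1, p_max + 1):
        shorter = sum(k * counts[k] for k in range(1, p // 2 + 1) if p % k == 0)
        counts[p] = (n_symbols ** p - shorter) // p
    return counts


def enumerate_orbits(n_symbols, p):
    """Symbol sequences of least period p, one representative per cyclic class."""
    orbits = set()
    for word in product(range(n_symbols), repeat=p):
        rotations = [word[i:] + word[:i] for i in range(p)]
        if len(set(rotations)) == p:  # least period is exactly p
            orbits.add(min(rotations))
    return orbits


two = orbit_counts(2, 20)
three = orbit_counts(3, 20)

# Finite-p estimates of the topological entropy log(N(p))/p; they approach
# ln 2 and ln 3 only slowly as p grows.
entropy_two = log(two[20]) / 20
entropy_three = log(three[20]) / 20
```

For two symbols and p = 5 the enumeration returns six sequences, matching the six period-5 orbits given by the recursion.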


Table 2 Number of orbits of period p on the branched manifolds
with two and three branches, shown in Figure 7. The integers
N3(p) are constructed by replacing $2^p$ by $3^p$ in eqn [3]

         Two        Three                  Two        Three
Period   branches   branches      Period   branches   branches
p        N2(p)      N3(p)         p        N2(p)      N3(p)
1        2          3             11       186        16 104
2        1          3             12       335        44 220
3        2          8             13       630        122 640
4        3          18            14       1 161      341 484
5        6          48            15       2 182      956 576
6        9          116           16       4 080      2 690 010
7        18         312           17       7 710      7 596 480
8        30         810           18       14 532     21 522 228
9        56         2 184         19       27 594     61 171 656
10       99         5 880         20       52 377     174 336 264


Table 3 Linking numbers of orbits to period 5 in the Smale horseshoe branched manifold with zero global torsion

Orbit   Symbols    0    1    2_1   3_1   3_1   4_1

0       0          0    0    0     0     0     0
1       1          0    0    1     1     1     2
2_1     01         0    1    1     2     2     3
3_1     011        0    1    2     2     3     4
3_1     001        0    1    2     3     2     4
4_1     0111       0    2    3     4     4     5
4_2     0011       0    1    2     3     3     4
4_2     0001       0    1    2     3     3     4
5_1     01111      0    2    4     5     5     8
5_1     01101      0    2    4     5     5     8
5_2     00111      0    2    3     5     4     7
5_2     00101      0    2    3     5     4     7
5_3     00011      0    1    2     3     3     4
5_3     00001      0    1    2     3     3     4


Tables of linking numbers have been used successfully to
identify mechanisms that nature uses to generate chaotic data.
This analysis procedure is called topological analysis. Segments
of data are identified that closely approximate unstable periodic
orbits existing in the strange attractor. These data segments are
then embedded in $\mathbb{R}^3$. Each orbit is given a trial
identification (symbol sequence). Their pairwise linking numbers
are computed either by counting signed crossings or by using the
time-parametrized data segments and estimating the integers
numerically using the Gauss linking integral

$$ \mathrm{Link}(A, B) = \frac{1}{4\pi} \oint \oint
   \frac{\mathbf{r}_A(t_1) - \mathbf{r}_B(t_2)}
        {|\mathbf{r}_A(t_1) - \mathbf{r}_B(t_2)|^3}
   \cdot d\mathbf{r}_A(t_1) \times d\mathbf{r}_B(t_2) $$

This table of experimental integers is compared with the table of
linking numbers for orbits with the same symbolic name on a trial
branched manifold. This procedure serves to identify the branched
manifold and refine the symbolic identifications of the
experimental orbits, if necessary. The procedure is vastly
overdetermined. For example, the linking numbers of only three
low-period orbits serve to identify the four pieces of information
required to specify a branched manifold with two branches. Since
six or more surrogate periodic orbits can typically be extracted
from experimental data, providing $\binom{6}{2} = 15$ or more
linking numbers, this topological analysis procedure has built-in
self-consistency checks, unlike analysis procedures based on
geometric and dynamical tools.
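
A numerical estimate of the Gauss linking integral for two sampled closed curves can be sketched as follows. The midpoint discretization, the function names, and the two test circles are assumptions made for illustration; they stand in for the time-parametrized data segments described above and are not the procedure of any particular published analysis.

```python
import math


def sample_closed_curve(f, n):
    """Sample a closed curve f(t), 0 <= t < 2*pi, at n equally spaced parameters."""
    return [f(2.0 * math.pi * i / n) for i in range(n)]


def _segments(points):
    """Midpoints and chord vectors of the closed polygon through `points`."""
    n = len(points)
    segs = []
    for i in range(n):
        p, q = points[i], points[(i + 1) % n]
        mid = tuple((a + b) / 2.0 for a, b in zip(p, q))
        d = tuple(b - a for a, b in zip(p, q))
        segs.append((mid, d))
    return segs


def gauss_linking(curve_a, curve_b):
    """Midpoint-rule estimate of (1/4pi) oint oint (rA-rB).(drA x drB)/|rA-rB|^3."""
    segs_a, segs_b = _segments(curve_a), _segments(curve_b)
    total = 0.0
    for ma, da in segs_a:
        for mb, db in segs_b:
            r = (ma[0] - mb[0], ma[1] - mb[1], ma[2] - mb[2])
            cross = (
                da[1] * db[2] - da[2] * db[1],
                da[2] * db[0] - da[0] * db[2],
                da[0] * db[1] - da[1] * db[0],
            )
            dist3 = (r[0] ** 2 + r[1] ** 2 + r[2] ** 2) ** 1.5
            total += (r[0] * cross[0] + r[1] * cross[1] + r[2] * cross[2]) / dist3
    return total / (4.0 * math.pi)


# Two unit circles that are linked once, and a pair that is not linked.
circle_a = sample_closed_curve(lambda t: (math.cos(t), math.sin(t), 0.0), 200)
circle_b = sample_closed_curve(lambda t: (1.0 + math.cos(t), 0.0, math.sin(t)), 200)
circle_c = sample_closed_curve(lambda t: (3.0 + math.cos(t), 0.0, math.sin(t)), 200)

linked = gauss_linking(circle_a, circle_b)
unlinked = gauss_linking(circle_a, circle_c)
```

The estimates are close to an integer of magnitude 1 for the linked pair and close to 0 for the unlinked pair; the sign depends on the orientations chosen for the two curves. In practice the result is rounded to the nearest integer.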


Basis Sets of Orbits 


A branched manifold determines the topological organization of
all the periodic orbits that it supports. Whenever a
low-dimensional strange attractor is subjected to topological
analysis, it is always the case that fewer periodic orbits are
present and identified than are allowed by the branched manifold
that classifies it. This is the case for strange attractors
generated by experimental data as well as strange attractors
generated by ODEs. The full spectrum occurs only in the hyperbolic
limit, which has never been seen.

The orbits that are present are organized exactly 
as in the hyperbolic limit — that is, as determined by 
the underlying branched manifold. As control para- 
meters change, the strange attractor undergoes 
perestroikas. New orbits are created and/or old 
orbits are annihilated in direct or inverse period- 
doubling and saddle-node bifurcations. The orbits 
that are present are always organized as determined 
by the branched manifold. Orbits are not created or 
annihilated independently of each other. Rather, 
there is a partial order (“forcing order”) involved in 
orbit creation and annihilation. This partial order is 
poorly understood for general branched manifolds. 
It is much better understood for the two-branch 
Smale horseshoe branched manifold. 

The forcing diagram for this branched manifold is shown in
Figure 8 for orbits up to period 8. It is typically the case that
the existence of one orbit in a strange attractor forces the
presence of a spectrum of additional orbits. Forcing is
transitive: if orbit A forces orbit B ($A \Rightarrow B$) and B
forces C ($B \Rightarrow C$), then A forces C
($A \Rightarrow C$). For this reason, it is sufficient to show
only the first-order forcing in this figure. The orbits shown are
labeled by their period and the order in which they are created in
a particular highly dissipative limit of the dynamics: the
logistic map (U-sequence order in Figure 8). For example, $5_2$
describes the second (pair) of period-5 orbits created in the
logistic map in the transition from simple, nonchaotic behavior to
fully chaotic (hyperbolic) behavior.

Figure 8 (a) Forcing diagram for orbits up to period 8 in the Smale horseshoe branched manifold. (b) The sequence (“universal
order”) in which orbits are created in the highly dissipative limit, which is the logistic map. The Topology of Chaos; R Gilmore and
M Lefranc; Copyright © 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

The orbits in the forcing diagram are organized according to
their one-dimensional entropy (horizontal axis, U-sequence order)
and their two-dimensional entropy (vertical axis). Nonchaotic
(“laminar”) behavior occurs at the lower left of this figure,
where both entropies are zero. Fully chaotic behavior occurs at
the upper right, where both entropies are $\ln 2$. As control
parameters change, a dynamical system that can exhibit chaos
generated by a stretch-and-fold mechanism follows a path in the
forcing diagram from the lower left to the upper right. Each such
path is a “route to chaos.” The Smale horseshoe mechanism exhibits
many different routes to chaos: each follows a different path in
the forcing diagram.

The state of a strange attractor at any stage in its route to
chaos can be specified by a “basis set of orbits.” This is a set
of orbits whose presence forces the existence of all other orbits
that can concurrently be found in the attractor, up to any finite
period. The basis set of orbits can be constructed
algorithmically. The algorithm is as follows:


1. Write down all the orbits that are present in 
order of increasing two-dimensional entropy 
from left to right. 

2. For orbits with the same two-dimensional entropy, 
order by increasing one-dimensional entropy. 

3. Remove the “highest” (rightmost) orbit from this 
list, together with all the orbits that it forces. 
This is the first basis orbit. 

4. Of the orbits remaining, again remove the right- 
most and all the orbits that it forces. This is the 
second basis orbit. 

5. Continue until all orbits have been removed. 


For any finite period, the above algorithm 
terminates because there is only a finite number of 
orbits. For example, if the orbit 55 is present as well 
as all orbits with lower one-dimensional entropy, 
the basis set is 87; R, 76, 74F, 86F,88,55. As control 
parameters change, a strange attractor undergoes 
perestroikas that are quantitatively determined by 
changes in the basis sets of orbits. 
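
The algorithm above can be sketched in a few lines. The code assumes that the orbits present have already been sorted as in steps 1 and 2, and that the first-order forcing relation is available as a dictionary; both inputs in the example are hypothetical placeholders (labels "A" to "E"), not orbits read from Figure 8.

```python
def forced_by(orbit, first_order):
    """All orbits forced by `orbit`, following first-order forcing transitively."""
    forced = set()
    stack = [orbit]
    while stack:
        current = stack.pop()
        for nxt in first_order.get(current, ()):
            if nxt not in forced:
                forced.add(nxt)
                stack.append(nxt)
    return forced


def basis_set(ordered_orbits, first_order):
    """Steps 3-5: repeatedly remove the rightmost orbit and everything it forces."""
    remaining = list(ordered_orbits)
    basis = []
    while remaining:
        top = remaining.pop()  # the "highest" (rightmost) remaining orbit
        basis.append(top)
        forced = forced_by(top, first_order)
        remaining = [orbit for orbit in remaining if orbit not in forced]
    return basis


# Hypothetical example: five orbits already ordered left to right, with a
# made-up first-order forcing relation E => C => A and D => B.
ordered = ["A", "B", "C", "D", "E"]
first_order_forcing = {"E": ["C"], "C": ["A"], "D": ["B"]}
basis = basis_set(ordered, first_order_forcing)
```

In this toy case E is removed first together with C and A, then D together with B, so two basis orbits suffice. Because forcing is transitive, only the first-order relation needs to be stored.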


Bounding Tori 


As experimental conditions or control parameters 
change, strange attractors can undergo “grosser” 
perestroikas than those that can be described by a 
change in the basis set of orbits. This occurs when new 
orbits are created that cannot be contained on the initial 
branched manifold — for example, when orbits are 
created that must be described by a new symbol. This is 
seen experimentally in the transition from horseshoe 
type dynamics to gateau roulé type dynamics. This 
involves the addition of a third branch to the branched 
manifold with two branches, as shown in Figures 7a 
and 7b. Strange attractors can undergo perestroikas 
described by the addition of new branches to, or 
deletion of old branches from, a branched manifold. 
These perestroikas are in a very real sense “grosser” 
than the perestroikas that can be described by changes 
in the basis sets of orbits on a fixed branched manifold. 

There is a structure that provides constraints on 
the allowed bifurcations of branched manifolds 
(creation/annihilation of branches), which is analo- 
gous to the constraints that a branched manifold 
provides on the bifurcations and topological organi- 
zation of the periodic orbits that can exist on it. This 
structure is called a bounding torus. 

Bounding tori are constructed as follows. The semiflow on a
branched manifold is “inflated” or “blown up” to a flow on a thin
open set in $\mathbb{R}^3$ containing this branched manifold. The
boundary of this open set is a two-dimensional surface. Such
surfaces have been classified. They are uniquely tori of genus g:
g = 0 (sphere), g = 1 (tire tube), g = 2, 3, .... The torus of
genus g has Euler characteristic $\chi = 2 - 2g$. The flow is into
this surface. The flow, restricted to the surface, exhibits a
singularity wherever it is normal to the surface. At such
singularities the stability is determined by the local Lyapunov
exponents $\lambda_1 > 0$ and $\lambda_3 < 0$, since the flow
direction ($\lambda_2 = 0$) is normal to the surface. As a result,
all singularities are saddles; so, by the Poincaré-Hopf theorem,
the number of singularities is strongly related to the genus. The
number is $2(g - 1)$.
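
The count can be written out in one line. The only ingredient not stated above is the standard fact that a saddle singularity of a vector field on a surface has index $-1$; with $n$ denoting the number of singularities (a symbol introduced here for the example), the Poincaré-Hopf theorem gives:

```latex
\sum_{\text{singularities}} \mathrm{index}
  \;=\; \chi \;=\; 2 - 2g ,
\qquad
n \cdot (-1) \;=\; 2 - 2g
\;\;\Longrightarrow\;\;
n \;=\; 2(g - 1) .
```

For g = 1 there are no singularities, and for the genus-3 torus there are four.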

The flow, restricted to the genus-g surface, can be 
put into canonical form and these canonical forms can 
be classified. The classification involves projection of 
the genus-g torus onto a two-dimensional surface. The 
planar projection consists of a disk with outer 
boundary and g interior holes. All singularities can be 
placed on the interior holes. The flow on the interior 
holes without singularities is in the same direction as 
the flow on the exterior boundary. Interior holes with 
singularities have an even number, 4,6,.... Some 
canonical forms are shown in Figure 9. 

Poincaré sections have been used to simplify the 
study of flows in low-dimensional spaces by effec- 
tively reducing the dimension of the dynamics. In 
three dimensions, a Poincaré surface of section for a 
strange attractor is a minimal two-dimensional sur- 
face with the property that all points in the attractor 
intersect this surface transversally an infinite number 
of times under the flow. The Poincaré surface need 
not be connected and in fact is often not connected. 

Figure 9 Three inequivalent canonical forms of genus 8 are shown. Each is identified by a “period-7 orbit” and its dual. Reprinted
figure with permission from Physical Review E, 69, 056206, 2004. Copyright (2004) by the American Physical Society.

The Poincaré section for the flow in a genus-g torus consists of
the union of $g - 1$ disjoint disks ($g \ge 3$) or is a single
disk ($g = 1$). The locations of the disks are determined
algorithmically, as shown in Figure 9. The interior circles
without singularities are labeled by capital letters A, B, C, ...
and those with singularities are labeled with lowercase letters
a, b, c, .... The components of the global Poincaré surface of
section are numbered sequentially $1, 2, \ldots, g - 1$, in the
order they are encountered when traversing the outer boundary in
the direction of the flow, starting from any point on that
boundary. Each component of the global Poincaré surface of section
connects (in the projection) an interior circle without
singularities to the exterior boundary. There is one component
between each successive encounter of the flow with holes that have
singularities. Heavy lines are used to show the location of the
seven components of the global Poincaré surface of section for
each of the three inequivalent genus-8 canonical forms shown in
Figure 9. The structure of the flow is summarized by a transition
matrix. For the canonical form shown in Figure 9c the transition
matrix is

$$ T = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} $$

where $T_{ij} = 1$ if the flow can proceed directly from
component i to component j, and 0 otherwise.
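
A transition matrix of this kind can be used directly to test whether a sequence of components is an allowed itinerary and to count itineraries of a given length. The sketch below is illustrative only: the matrix entries are transcribed as well as they can be read from the printed 7 × 7 matrix for Figure 9c and should be checked against the original figure, and the function names are assumptions made for this example.

```python
# Entries as read from the printed matrix (an assumption; verify against Figure 9c).
T = [
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
]


def successors(matrix, i):
    """Components j (numbered from 1) reachable directly from component i."""
    return [j + 1 for j, entry in enumerate(matrix[i - 1]) if entry == 1]


def is_allowed(matrix, itinerary):
    """True if every consecutive pair (i, j) in the itinerary has T_ij = 1."""
    return all(matrix[a - 1][b - 1] == 1 for a, b in zip(itinerary, itinerary[1:]))


def mat_mul(a, b):
    """Product of two square integer matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def count_itineraries(matrix, n_transitions):
    """Number of allowed itineraries with the given number of transitions."""
    n = len(matrix)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(n_transitions):
        power = mat_mul(power, matrix)
    return sum(sum(row) for row in power)
```

With the entries above, every component has exactly two successors, so the number of itineraries doubles with each additional transition.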
Bounding tori, dressed with flows, can be labeled. In fact, two
dual labeling schemes are possible. Following the outer boundary
in the direction of the flow, one encounters the $g - 1$
components of the global Poincaré surface of section sequentially,
the interior holes without singularities at least once each, and
the interior holes with singularities at least twice each. The
canonical form (genus-g torus dressed with a flow) on the genus-8
bounding torus shown in Figure 9a can be labeled by the sequence
in which the holes without singularities are encountered (ABCBDED)
or the order in which the holes with singularities are encountered
(abbacca). Both sequences contain $g - 1$ symbols. These labels
are unique up to cyclic permutation. Symbol sequences for
canonical forms for bounding tori act in many ways like symbol
sequences for periodic orbits on branched manifolds. Although
there is a 1:1 correspondence between bounded closed
two-dimensional surfaces in $\mathbb{R}^3$ and genus g, the number
of canonical forms grows rapidly with g, as shown in Table 4. In
fact, the number, N(g), grows exponentially and can even be
assigned an entropy:

$$ \lim_{g \to \infty} \frac{\log N(g)}{g - 1} = \ln 3 $$


Table 4 Number of canonical bounding tori as a function of genus g

g     N(g)       g     N(g)       g     N(g)

3     1          9     15         15    2 211
4     1          10    28         16    5 549
5     2          11    67         17    14 290
6     2          12    145        18    36 824
7     5          13    368        19    96 347
8     6          14    870        20    252 927

In some sense, canonical forms that constrain 
branched manifolds within them behave like branched 
manifolds that constrain periodic orbits on them. 

Every strange attractor that has been studied in $\mathbb{R}^3$ 
has been described by a canonical bounding torus that 
contains it. This classification is shown in Table 5. 

Branched manifold perestroikas are constrained 
by bounding tori as follows. Each branch line of any 
branched manifold can be moved into one of the 
g — 1 components of the global Poincaré surface of 
section. Any branched manifold contained in a 
genus-g bounding torus ($g \ge 3$) must have at least 
one branch between each pair of components of the 
global Poincaré surface of section between which the 
flow is allowed, as summarized by the canonical 
form's transition matrix. New branches can only be 
added in a way that is consistent with the canonical 
form's transition matrix, continuity requirements, 
and the no intersection condition. 


Table 5 All known strange attractors of dimension $d_L < 3$ are bounded by one of the standard dressed tori. Dual labels for the
bounding tori depend on $g - 1$ symbols describing holes with or without singularities

                                              Holes w/o               Holes with
Strange attractor                             singularities           singularities           Genus

Rössler, Duffing, Burke, and Shaw             A                                               1
Various lasers, gateau roulé                  A                                               1
Neuron with subthreshold oscillations         A                                               1
Shaw-van der Pol                              A                                               1
Lorenz, Shimizu-Morioka, Rikitake             AB                      aa                      3
C_2 covers of Rössler                         AB                      a^2                     3
C_2 cover of Lorenz (a)                       ABCD                    a^4                     5
C_2 cover of Lorenz (b)                       ABCB                    abba                    5
2 -> 1 image of figure-8 branched manifold    ABCB                    ab(ab)^{-1}             5
Figure-8 branched manifold                    AEBECEDE                a^2 b^2 c^2 d^2         9
C_n covers of Rössler                         AB...N                  a^n                     n + 1
C_n cover of Lorenz (a)                       AB...(2N)               a^{2n}                  2n + 1
C_n cover of Lorenz (b)                       (AZ)(BZ)...(NZ)         a^2 b^2 ... n^2         2n + 1
Multispiral attractors                        A(B...M)N(B...M)^{-1}   (ab...m)(ab...m)^{-1}   2m + 1

(a) Rotation axis through origin.
(b) Rotation axis through one focus.

In the simplest case, $g = 1$, a third branch can be added to a
branched manifold with two branches only if its local torsion
differs by $\pm 1$ from the adjacent branch. In addition, the
ordering of the new branch must be consistent with the continuity
and no intersection (ODE uniqueness theorem) requirements.


Embeddings of Bounding Tori 


The last level of topological structure needed for the
classification of strange attractors in $\mathbb{R}^3$ describes
their embeddings in $\mathbb{R}^3$. The classification using
genus-g bounding tori is intrinsic; that is, the canonical form
shows how the flow looks from inside the torus. Strange
attractors, and the tori that bound them, are actually embedded in
$\mathbb{R}^3$. For a complete classification, we must specify not
only the canonical form but also how this form sits in
$\mathbb{R}^3$. This program has not yet been completed, but we
illustrate it with the genus-1 bounding torus in Figure 10.
Figure 10a shows the canonical form, and Figures 10b and 10c show
two different embeddings of it in $\mathbb{R}^3$. The embedding in
Figure 10b is unknotted. The embedding in Figure 10c is knotted
like a figure-8 knot. Extrinsic embeddings of genus-1 tori are
described by tame knots in $\mathbb{R}^3$, and tame knots can be
used as “centerlines” for extrinsically embedded genus-1 tori.
Higher-genus ($g \ge 3$) canonical forms (intrinsic genus-g tori
dressed with a canonical flow) have a larger (but discrete)
variety of extrinsic embeddings in $\mathbb{R}^3$.

Figure 10 (a) Canonical form for genus-1 bounding torus.
Extrinsic embeddings of the torus into $\mathbb{R}^3$ that are (b) unknotted
and (c) knotted like the figure-8 knot.


The Embedding Question 


The mechanism that nature uses to generate chaotic 
behavior in physical systems is not directly observable, 
and must be deduced by examining the data that are 
generated. Typically, the data consist of a single scalar 
time series that is discretely recorded: $x_i$, $i = 1, 2, \ldots$. 
In order to exhibit a strange attractor, a mapping of the 
data into $\mathbb{R}^N$ must also be constructed. If the attractor 
is low dimensional ($d_L < 3$), one can hope that a 
mapping into $\mathbb{R}^3$ can be constructed that exhibits no 
self-intersections or other degeneracies. Such a map is 
called an embedding. Once an embedding in $\mathbb{R}^3$ is 
available, a topological analysis can be carried out. The 
analysis reveals the mechanism that underlies the 
creation of the embedded strange attractor. 

But how do you know that the mechanism that 
generates the observed, embedded strange attractor 
has anything to do with the mechanism nature used 
to generate the experimental data? 

If the embedding is contained in a genus-1 bounding torus, then
the topological mechanism that generates the data, as defined by
some unknown branched manifold $BM_{\mathrm{exp}}$, and the
topological mechanism that is identified from the embedded strange
attractor, $BM_{\mathrm{emb}}$, are identical up to three degrees
of freedom: parity, global torsion, and the knot type. As a
result, in this case (genus-1) a topological analysis of embedded
data does reveal nature's hidden secrets.


See also: Ergodic theory; Fractal dimensions in 
dynamics; Generic Properties of Dynamical Systems; 
Gravitational N-body Problem (Classical); 
Homeomorphisms and Diffeomorphisms of the Circle; 
Homoclinic phenomena; Inviscid Flows; Lyapunov 
Exponents and Strange Attractors; Nonequilibrium 
Statistical Mechanics (Stationary): Overview; Random 
Algebraic Geometry, Attractors and Flux Vacua; Random 
Matrix Theory in Physics; Regularization for Dynamical 
Zeta Functions; Singularity and Bifurcation Theory; 
Symmetry and Symmetry Breaking in Dynamical 
Systems; Synchronization of Chaos. 


Vector Bundles 


Let $\mathrm{Vect}_k(M, F)$ be the set of isomorphism classes of
real ($F = \mathbb{R}$) or complex ($F = \mathbb{C}$) vector
bundles of rank k over a smooth connected m-dimensional manifold
M. Let

$$ \mathrm{Vect}(M, F) = \bigcup_k \mathrm{Vect}_k(M, F) $$


Principal Bundles - Examples 


Let H be a Lie group. A fiber bundle

$$ p : P \to M $$

with fiber H is said to be a principal bundle if there is a right
action of H on P which acts transitively on the fibers, that is,
if $P/H = M$. If H is a closed subgroup of a Lie group G, then the
natural projection $G \to G/H$ is a principal H bundle over the
homogeneous space $G/H$. Let $O(k)$ and $U(k)$ denote the
orthogonal and unitary groups, respectively. Let $S^k$ denote the
unit sphere in $\mathbb{R}^{k+1}$. Then we have natural principal
bundles:

$$ O(k) \subset O(k+1) \to S^k $$
$$ U(k) \subset U(k+1) \to S^{2k+1} $$

Let $\mathbb{RP}^k$ and $\mathbb{CP}^k$ denote the real and
complex projective spaces of lines through the origin in
$\mathbb{R}^{k+1}$ and $\mathbb{C}^{k+1}$, respectively. Let

$$ \mathbb{Z}_2 = \{\pm \mathrm{Id}\} \subset O(k) $$
$$ S^1 = \{\lambda \cdot \mathrm{Id} : |\lambda| = 1\} \subset U(k) $$

One has $\mathbb{Z}_2$ and $S^1$ principal bundles:

$$ \mathbb{Z}_2 \to S^{k-1} \to \mathbb{RP}^{k-1} $$
$$ S^1 \to S^{2k-1} \to \mathbb{CP}^{k-1} $$



Frames 


A frame $s := (s_1, \ldots, s_k)$ for $V \in \mathrm{Vect}_k(M, F)$
over an open set $O \subset M$ is a collection of k smooth
sections to $V|_O$, so that $\{s_1(P), \ldots, s_k(P)\}$ is a
basis for the fiber $V_P$ of V over any point $P \in O$. Given
such a frame s, we can construct a local trivialization which
identifies $O \times F^k$ with $V|_O$ by the mapping

$$ (P; \lambda_1, \ldots, \lambda_k) \to \lambda_1 s_1(P) + \cdots + \lambda_k s_k(P) $$

Conversely, given a local trivialization of V, we can take the
coordinate frame

$$ s_i(P) = P \times (0, \ldots, 0, 1, 0, \ldots, 0) $$


Thus, frames and local trivializations of V are 
equivalent notions. 


Simple Covers 


An open cover $\{O_\alpha\}$ of M, where $\alpha$ ranges over some
indexing set A, is said to be a simple cover if any finite
intersection $O_{\alpha_1} \cap \cdots \cap O_{\alpha_k}$ is
either empty or contractible.

Simple covers always exist. Put a Riemannian metric on M. If M is
compact, then there exists a uniform $\delta > 0$ so that any
geodesic ball of radius $\delta$ is geodesically convex. The
intersection of geodesically convex sets is either geodesically
convex (and hence contractible) or empty. Thus, covering M by a
finite number of balls of radius $\delta$ yields a simple cover.
The argument is similar even if M is not compact, where an
infinite number of geodesic balls is used and the radii are
allowed to shrink near $\infty$.


Transition Cocycles 


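Let $\mathrm{Hom}(F, k)$ be the set of linear transformations of
$F^k$ and let $GL(F, k) \subset \mathrm{Hom}(F, k)$ be the group
of all invertible linear transformations.

Let $\{s_\alpha\}$ be frames for a vector bundle V over some open
cover $\{O_\alpha\}$ of M. On the intersection
$O_\alpha \cap O_\beta$, one may express
$s_\alpha = \psi_{\alpha\beta} s_\beta$, that is

$$ s_{\alpha,i}(P) = \sum_{1 \le j \le k} \psi_{\alpha\beta,i}^{\;j}(P)\, s_{\beta,j}(P) $$

The maps $\psi_{\alpha\beta} : O_\alpha \cap O_\beta \to GL(F, k)$
satisfy

$$ \psi_{\alpha\alpha} = \mathrm{Id} \ \text{on}\ O_\alpha, \qquad
   \psi_{\alpha\beta} = \psi_{\alpha\gamma} \psi_{\gamma\beta}
   \ \text{on}\ O_\alpha \cap O_\beta \cap O_\gamma \qquad [1] $$

Let G be a Lie group. Maps belonging to a collection
$\{\phi_{\alpha\beta}\}$ of smooth maps from
$O_\alpha \cap O_\beta$ to G which satisfy eqn [1] are said to be
transition cocycles with values in G; if $G \subset GL(F, k)$,
they can be used to define a vector bundle by making appropriate
identifications.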


Reducing the Structure Group 


If G is a subgroup of $GL(F, k)$, then V is said to have a
G-structure if we can choose frames so the transition cocycles
belong to G; that is, we can reduce the structure group to G.

Denote the subgroup of orientation-preserving linear maps by

$$ GL^+(\mathbb{R}, k) := \{\psi \in GL(\mathbb{R}, k) : \det(\psi) > 0\} $$

If $V \in \mathrm{Vect}_k(M, \mathbb{R})$, then V is said to be
orientable if we can choose the frames so that

$$ \psi_{\alpha\beta} \in GL^+(\mathbb{R}, k) $$

Not every real vector bundle is orientable; the first
Stiefel-Whitney class $sw_1(V) \in H^1(M; \mathbb{Z}_2)$, which is
defined later, vanishes if and only if V is orientable. In
particular, the Möbius line bundle over the circle is not
orientable.
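
The Möbius example can be checked by hand in a toy model. The sketch below makes several simplifying assumptions that are not in the text: the circle is covered by three arcs $O_0, O_1, O_2$ with only pairwise overlaps, the line-bundle transition functions are taken to be the constants $\pm 1$ on each overlap, and changes of frame are restricted to sign flips $s_\alpha \to g_\alpha s_\alpha$ with $g_\alpha = \pm 1$. Under these assumptions a bundle is orientable exactly when some choice of signs makes every transition function positive.

```python
from itertools import product

# Transition functions psi[(a, b)] on the overlap of arcs a and b (toy model).
mobius = {(0, 1): 1, (1, 2): 1, (2, 0): -1}
trivial = {(0, 1): 1, (1, 2): 1, (2, 0): 1}
disguised_trivial = {(0, 1): -1, (1, 2): -1, (2, 0): 1}


def change_frames(cocycle, signs):
    """New cocycle after s_a -> signs[a] * s_a: psi_ab -> g_a * psi_ab * g_b^{-1}.

    For signs +1/-1 the inverse of g_b is g_b itself.
    """
    return {(a, b): signs[a] * psi * signs[b] for (a, b), psi in cocycle.items()}


def is_orientable(cocycle, n_sets=3):
    """True if some sign change of frames makes all transition functions positive."""
    return any(
        all(value > 0 for value in change_frames(cocycle, signs).values())
        for signs in product((1, -1), repeat=n_sets)
    )
```

The product of the three transition functions around the circle is unchanged by any change of frames, which is why the Möbius cocycle (product −1) can never be made positive while the other two (product +1) can.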

Similarly, a real (resp. complex) bundle V is said to be
Riemannian (resp. Hermitian) if we can reduce the structure group
to the orthogonal group $O(k) \subset GL(\mathbb{R}, k)$ (resp. to
the unitary group $U(k) \subset GL(\mathbb{C}, k)$).

We can use a partition of unity to put a positive-definite
symmetric (resp. Hermitian symmetric) fiber metric on V. Applying
the Gram-Schmidt process then constructs orthonormal frames and
shows that the structure group can always be reduced to $O(k)$
(resp. to $U(k)$); if V is a real vector bundle, then the
structure group can be reduced to the special orthogonal group
$SO(k)$ if and only if V is orientable.
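
At a single point P the reduction to $O(k)$ is just Gram-Schmidt applied to the frame vectors. The following minimal sketch works in one real fiber and assumes, purely for illustration, that the fiber metric is the standard Euclidean inner product; the function names are hypothetical.

```python
import math


def dot(u, v):
    """Standard Euclidean inner product (stands in for the fiber metric)."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(frame):
    """Turn a frame (list of linearly independent vectors) into an orthonormal frame."""
    orthonormal = []
    for vector in frame:
        w = list(vector)
        for e in orthonormal:
            coeff = dot(w, e)
            w = [wi - coeff * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("frame vectors are not linearly independent")
        orthonormal.append([wi / norm for wi in w])
    return orthonormal


frame = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
orthonormal_frame = gram_schmidt(frame)
```

Each new vector is a combination of the earlier frame vectors with a positive coefficient on the current one, so the change of frame is triangular with positive diagonal. This is why the process preserves orientation and depends smoothly on the input frame.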


Lifting the Structure Group 


Let $\tau$ be a representation of a Lie group H to $GL(F, k)$. One
says that the structure group of V can be lifted to H if there
exist frames $\{s_\alpha\}$ for V and smooth maps
$\phi_{\alpha\beta} : O_\alpha \cap O_\beta \to H$, so
$\tau \circ \phi_{\alpha\beta} = \psi_{\alpha\beta}$, where eqn [1]
holds for $\phi$.


Spin Structures 


For $k \ge 3$, the fundamental group of $SO(k)$ is $\mathbb{Z}_2$.
Let $\mathrm{Spin}(k)$ be the universal cover of $SO(k)$ and let

$$ \tau : \mathrm{Spin}(k) \to SO(k) $$

be the associated double cover; set $\mathrm{Spin}(2) = S^1$ and
let $\tau(\lambda) = \lambda^2$. An oriented bundle V is said to
be spin if the transition functions can be lifted from $SO(k)$ to
$\mathrm{Spin}(k)$; this is possible if and only if the second
Stiefel-Whitney class of V, which is defined later, vanishes.
There can be inequivalent spin structures, which are parametrized
by the cohomology group $H^1(M; \mathbb{Z}_2)$.


The Tangent Bundle of Projective Space 


The tangent bundle $T\mathbb{RP}^m$ of real projective space is
orientable if and only if m is odd; $T\mathbb{RP}^m$ is spin if
and only if $m \equiv 3 \bmod 4$. If $m \equiv 3 \bmod 4$, there
are two inequivalent spin structures on this bundle as
$H^1(\mathbb{RP}^m; \mathbb{Z}_2) = \mathbb{Z}_2$.

The tangent bundle $T\mathbb{CP}^m$ of complex projective space is
always orientable; $T\mathbb{CP}^m$ is spin if and only if m is
odd.


Principal and Associated Bundles 


Let H be a Lie group and let

$$ \phi_{\alpha\beta} : O_\alpha \cap O_\beta \to H $$

be a collection of smooth functions satisfying the compatibility
conditions given in eqn [1]. We define a principal bundle P by
gluing $O_\alpha \times H$ to $O_\beta \times H$ using $\phi$:

$$ (P, h)_\alpha \sim (P, \phi_{\alpha\beta}(P) h)_\beta \quad \text{for } P \in O_\alpha \cap O_\beta $$

Because right multiplication and left multiplication commute,
right multiplication gives a natural action of H on P:

$$ (P, h)_\alpha \cdot \tilde h := (P, h \cdot \tilde h)_\alpha $$

The natural projection $P \to P/H = M$ is an H fiber bundle.

Let $\tau$ be a representation of H to $GL(F, k)$. For
$\xi \in P$, $\lambda \in F^k$, and $h \in H$, define a gluing

$$ (\xi, \lambda) \sim (\xi \cdot h^{-1}, \tau(h) \lambda) $$

The associated vector bundle is then given by

$$ P \times_\tau F^k := P \times F^k / \sim $$

Clearly, $\{\tau \circ \phi_{\alpha\beta}\}$ are the transition
cocycles of the vector bundle $P \times_\tau F^k$.


Frame Bundles 


If V is a vector bundle, the associated principal $GL(F, k)$
bundle is the bundle of all frames; if V is given an inner product
on each fiber, then the associated principal $O(k)$ or $U(k)$
bundle is the bundle of orthonormal frames. If V is an oriented
Riemannian vector bundle, the associated principal $SO(k)$ bundle
is the bundle of oriented orthonormal frames.


Direct Sum and Tensor Product 


Fiber-wise direct sum (resp. tensor product) defines the direct
sum (resp. tensor product) of vector bundles:

$$ \oplus : \mathrm{Vect}_k(M, F) \times \mathrm{Vect}_n(M, F) \to \mathrm{Vect}_{k+n}(M, F) $$
$$ \otimes : \mathrm{Vect}_k(M, F) \times \mathrm{Vect}_n(M, F) \to \mathrm{Vect}_{kn}(M, F) $$

The transition cocycles of the direct sum (resp. tensor product)
of two vector bundles are the direct sum (resp. tensor product) of
the transition cocycles of the respective bundles.

The set of line bundles $\mathrm{Vect}_1(M, F)$ is a group under
$\otimes$. The unit in the group is the trivial line bundle
$1 := M \times F$; the inverse of a line bundle L is the dual line
bundle $L^* := \mathrm{Hom}(L, F)$ since

$$ L \otimes L^* = 1 $$


Pullback Bundle 


Let $p : V \to M$ be the projection associated with
$V \in \mathrm{Vect}_k(M, F)$. If f is a smooth map from N to M,
then the pullback bundle $f^*V$ is the vector bundle over N which
is defined by setting

$$ f^*V := \{(P, v) \in N \times V : f(P) = p(v)\} $$

The fiber of $f^*V$ over P is the fiber of V over $f(P)$. Let
$\{s_\alpha\}$ be local frames for V over an open cover
$\{O_\alpha\}$ of M. For $P \in f^{-1}(O_\alpha)$, define

$$ \{f^* s_\alpha\}(P) := (P, s_\alpha(f(P))) $$

This gives a collection of frames for $f^*V$ over the open cover
$\{f^{-1}(O_\alpha)\}$ of N. Let

$$ f^* \psi_{\alpha\beta} := \psi_{\alpha\beta} \circ f $$

be the pullback of the transition functions. Then

$$ \{f^* s_\alpha\}(P) = (P, \psi_{\alpha\beta}(f(P))\, s_\beta(f(P)))
   = \{(f^* \psi_{\alpha\beta})(f^* s_\beta)\}(P) $$

This shows that the pullback of the transition functions for V are
the transition functions of the pullback $f^*(V)$.


Homotopy 


Two smooth maps $f_0$ and $f_1$ from N to M are said to be
homotopic if there exists a smooth map $F : N \times I \to M$ so
that $f_0(P) = F(P, 0)$ and so that $f_1(P) = F(P, 1)$. If $f_0$
and $f_1$ are homotopic maps from N to M, then $f_0^* V$ is
isomorphic to $f_1^* V$.

Let $[N, M]$ be the set of all homotopy classes of smooth maps
from N to M. The association $V \to f^*V$ induces a natural map

$$ [N, M] \times \mathrm{Vect}_k(M, F) \to \mathrm{Vect}_k(N, F) $$

If M is contractible, then the identity map is homotopic to the
constant map c. Consequently, $V = \mathrm{Id}^* V$ is isomorphic
to $c^* V = M \times F^k$. Thus, any vector bundle over a
contractible manifold is trivial. In particular, if
$\{O_\alpha\}$ is a simple cover of M and if
$V \in \mathrm{Vect}(M, F)$, then $V|_{O_\alpha}$ is trivial for
each $\alpha$. This shows that a simple cover is a trivializing
cover for every $V \in \mathrm{Vect}(M, F)$.


Stabilization 


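Let $1 \in \mathrm{Vect}_1(M, F)$ denote the isomorphism class of
the trivial line bundle $M \times F$ over an m-dimensional
manifold M. The map $V \to V \oplus 1$ induces a stabilization map

$$ s : \mathrm{Vect}_k(M, F) \to \mathrm{Vect}_{k+1}(M, F) $$

which induces an isomorphism

$$ \mathrm{Vect}_k(M, \mathbb{R}) = \mathrm{Vect}_{k+1}(M, \mathbb{R}) \quad \text{for } k > m $$
$$ \mathrm{Vect}_k(M, \mathbb{C}) = \mathrm{Vect}_{k+1}(M, \mathbb{C}) \quad \text{for } 2k > m \qquad [2] $$

These values of k comprise the stable range.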


The K-Theory 


The direct sum $\oplus$ and tensor product $\otimes$ make $\mathrm{Vect}(M,F)$ into a semiring; we denote the associated ring defined by the Grothendieck construction by $KF(M)$. If $V \in \mathrm{Vect}(M,F)$, let $[V] \in KF(M)$ be the corresponding element of K-theory; $KF(M)$ is generated by formal differences $[V_1] - [V_2]$; such formal differences are called virtual bundles.

The Grothendieck construction (see K-theory) introduces nontrivial relations. Let $S^m$ denote the standard sphere in $\mathbb{R}^{m+1}$. Since

\[ T(S^m) \oplus 1 = (m+1)1 \]

we can easily see that $[TS^m] = m[1]$ in $K\mathbb{R}(S^m)$, despite the fact that $T(S^m)$ is not isomorphic to $m1$ for $m \neq 1, 3, 7$.

Let $L$ denote the nontrivial real line bundle over $\mathbb{RP}^k$. Then $T\mathbb{RP}^k \oplus 1 = (k+1)L$, so

\[ [T\mathbb{RP}^k] = (k+1)[L] - [1] \]


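The map $V \to \mathrm{Rank}(V)$ extends to a surjective map from $KF(M)$ to $\mathbb{Z}$. We denote the associated ideal of virtual bundles of virtual rank 0 by

\[ \widetilde{KF}(M) := \ker(\mathrm{Rank}) \]

In the stable range, $V \to [V] - k[1]$ identifies

\[ \mathrm{Vect}_k(M,\mathbb{R}) = \widetilde{K\mathbb{R}}(M) \quad \text{if } k > m \]
\[ \mathrm{Vect}_k(M,\mathbb{C}) = \widetilde{K\mathbb{C}}(M) \quad \text{if } 2k > m \qquad [3] \]

These groups contain nontrivial torsion. Let $L$ be the nontrivial real line bundle over $\mathbb{RP}^k$. Then

\[ \widetilde{K\mathbb{R}}(\mathbb{RP}^k) = \mathbb{Z}\cdot\{[L] - [1]\} \,/\, 2^{\nu(k)}\mathbb{Z}\cdot\{[L] - [1]\} \]

where $\nu(k)$ is the Adams number.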


Classifying Spaces 


Let $\mathrm{Gr}_k(F,n)$ be the Grassmannian of $k$-dimensional subspaces of $F^n$. By mapping a $k$-plane $\pi$ in $F^n$ to the corresponding orthogonal projection on $\pi$, we can identify $\mathrm{Gr}_k(F,n)$ with the set of orthogonal projections of rank $k$:

\[ \{\xi \in \mathrm{Hom}(F^n) : \xi^2 = \xi,\ \xi^* = \xi,\ \mathrm{tr}(\xi) = k\} \]

There is a natural associated tautological $k$-plane bundle

\[ V_k(F,n) \in \mathrm{Vect}_k(\mathrm{Gr}_k(F,n), F) \]

whose fiber over a $k$-plane $\pi$ is the $k$-plane itself:

\[ V_k(F,n) := \{(\xi,x) \in \mathrm{Hom}(F^n) \times F^n : \xi x = x\} \]
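
The identification of a $k$-plane with its orthogonal projection can be checked numerically. The following is a minimal sketch, assuming a real 2-plane in $\mathbb{R}^3$ given by an orthonormal basis; the helper names `projection`, `matmul` and `close` are hypothetical and are not part of the original text. It verifies $\xi^2 = \xi$, $\xi^* = \xi$, $\mathrm{tr}(\xi) = k$ and the fiber condition $\xi x = x$.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def projection(basis):
    """Orthogonal projection xi = sum_j u_j u_j^T onto span(basis).

    `basis` is assumed to be an orthonormal list of real vectors.
    """
    n = len(basis[0])
    return [[sum(u[i] * u[j] for u in basis) for j in range(n)]
            for i in range(n)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# An orthonormal basis of a 2-plane in R^3 (k = 2, n = 3).
u1 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
u2 = [0.0, 0.0, 1.0]
xi = projection([u1, u2])

transpose = [list(row) for row in zip(*xi)]
trace = sum(xi[i][i] for i in range(3))
is_idempotent = close(matmul(xi, xi), xi)
is_symmetric = close(transpose, xi)

# A vector in the plane is fixed by xi, so the pair (xi, x) lies in the
# tautological bundle.
x = [u1[i] + 2 * u2[i] for i in range(3)]
xi_x = [sum(xi[i][j] * x[j] for j in range(3)) for i in range(3)]
fixes_x = all(abs(p - q) < 1e-12 for p, q in zip(xi_x, x))
```

In the complex case the transpose would be replaced by the conjugate transpose; the real case is used here only to keep the sketch short.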
Let $[M, \mathrm{Gr}_k(F,n)]$ denote the set of homotopy equivalence classes of smooth maps $f$ from $M$ to $\mathrm{Gr}_k(F,n)$. Since $[f_1] = [f_2]$ implies that $f_1^*V$ is isomorphic to $f_2^*V$, the association

\[ f \to f^*V_k(F,n) \in \mathrm{Vect}_k(M,F) \]

induces a map

\[ [M, \mathrm{Gr}_k(F,n)] \to \mathrm{Vect}_k(M,F) \]


This map defines a natural equivalence of functors 
in the stable range: 


\[ [M, \mathrm{Gr}_k(\mathbb{R},\nu+k)] = \mathrm{Vect}_k(M,\mathbb{R}) \quad \text{for } \nu > m \]
\[ [M, \mathrm{Gr}_k(\mathbb{C},\nu+k)] = \mathrm{Vect}_k(M,\mathbb{C}) \quad \text{for } 2\nu > m \qquad [4] \]

The natural inclusion of $F^n$ in $F^{n+1}$ induces natural inclusions

\[ \mathrm{Gr}_k(F,n) \subset \mathrm{Gr}_k(F,n+1) \quad \text{and} \quad V_k(F,n) \subset V_k(F,n+1) \qquad [5] \]

Let $\mathrm{Gr}_k(F,\infty)$ and $V_k(F,\infty)$ be the direct limit spaces under these inclusions; these are the infinite-dimensional Grassmannians and classifying bundles, respectively. The topology on these spaces is the weak or inductive topology. The Grassmannians are called classifying spaces. The isomorphisms of eqn [4] are compatible with the inclusions of eqn [5] and we have

\[ [M, \mathrm{Gr}_k(F,\infty)] = \mathrm{Vect}_k(M,F) \qquad [6] \]


Spaces with Finite Covering Dimension 


A metric space $X$ is said to have covering dimension at most $m$ if, given any open cover $\{U_\alpha\}$ of $X$, there exists a refinement $\{O_\beta\}$ of the cover so that any intersection of more than $m+1$ of the $\{O_\beta\}$ is empty. For example, any manifold of dimension $m$ has covering dimension at most $m$. More generally, any $m$-dimensional cell complex has covering dimension at most $m$.

The isomorphisms of [2]-[4], and [6] continue to hold under the weaker assumption that $M$ is a metric space with covering dimension at most $m$.


Characteristic Classes of Vector 
Bundles 


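The Cohomology of $\mathrm{Gr}_k(F,\infty)$


The cohomology algebras of the Grassmannians are polynomial algebras on suitably chosen generators:

\[ H^*(\mathrm{Gr}_k(\mathbb{R},\infty); \mathbb{Z}_2) = \mathbb{Z}_2[sw_1, \ldots, sw_k] \]
\[ H^*(\mathrm{Gr}_k(\mathbb{C},\infty); \mathbb{Z}) = \mathbb{Z}[c_1, \ldots, c_k] \qquad [7] \]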


The Stiefel-Whitney Classes 


Let $V \in \mathrm{Vect}_k(M,\mathbb{R})$. We use eqn [6] to find $\Psi_V: M \to \mathrm{Gr}_k(\mathbb{R},\infty)$ which classifies $V$; the map $\Psi_V$ is uniquely determined up to homotopy and, using eqn [7], one sets

\[ sw_i(V) := \Psi_V^* sw_i \in H^i(M; \mathbb{Z}_2) \]

The total Stiefel-Whitney class is then defined by

\[ sw(V) = 1 + sw_1(V) + \cdots + sw_k(V) \]

The Stiefel-Whitney class has the properties:

1. If $f: X_1 \to X_2$, then $f^*(sw(V)) = sw(f^*V)$.

2. $sw(V \oplus W) = sw(V)\,sw(W)$.

3. If $L$ is the Möbius bundle over $S^1$, then $sw_1(L)$ generates $H^1(S^1; \mathbb{Z}_2) = \mathbb{Z}_2$.


The cohomology algebra of real projective space is a truncated polynomial algebra:

\[ H^*(\mathbb{RP}^k; \mathbb{Z}_2) = \mathbb{Z}_2[x]/(x^{k+1} = 0) \]

Since $T\mathbb{RP}^k \oplus 1 = (k+1)L$, one has

\[ sw(T\mathbb{RP}^k) = (1+x)^{k+1} = 1 + (k+1)x + \frac{(k+1)k}{2}x^2 + \cdots \qquad [8] \]


Orientability and Spin Structures 


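The Stiefel-Whitney classes have real geometric meaning. For example, $sw_1(V) = 0$ if and only if $V$ is orientable; if $sw_1(V) = 0$, then $sw_2(V) = 0$ if and only if $V$ admits a spin structure. With reference to the discussion on the tangent bundle of projective space, eqn [8] yields

\[ sw_1(T\mathbb{RP}^k) = \begin{cases} 0 & \text{if } k \equiv 1 \bmod 2 \\ x & \text{if } k \equiv 0 \bmod 2 \end{cases} \]

Thus, $\mathbb{RP}^k$ is orientable if and only if $k$ is odd. Furthermore,

\[ sw_2(T\mathbb{RP}^k) = \begin{cases} 0 & \text{if } k \equiv 3 \bmod 4 \\ x^2 & \text{if } k \equiv 1 \bmod 4 \end{cases} \]

Thus, for odd $k > 1$, $T\mathbb{RP}^k$ is spin if and only if $k \equiv 3 \bmod 4$ (for $k = 1$ the class $x^2$ vanishes because $x^{k+1} = 0$).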
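
The orientability and spin criteria above reduce to the parity of binomial coefficients. The following is a small sketch of that arithmetic, assuming only eqn [8] and the truncation $x^{k+1} = 0$; the function names are hypothetical. Note the degenerate case $k = 1$, where $x^2 = 0$ in $H^*(\mathbb{RP}^1; \mathbb{Z}_2)$, so $sw_2$ vanishes automatically.

```python
from math import comb

def sw_tangent_rp(k):
    """Coefficients of sw(TRP^k) = (1 + x)^(k+1) in Z_2[x]/(x^(k+1)).

    Entry i is the coefficient of x^i reduced mod 2; degrees above k
    are dropped because x^(k+1) = 0.
    """
    return [comb(k + 1, i) % 2 for i in range(k + 1)]

def is_orientable(k):
    """sw_1 = 0 (valid for k >= 1)."""
    return sw_tangent_rp(k)[1] == 0

def is_spin(k):
    """sw_1 = 0 and sw_2 = 0 (valid for k >= 1)."""
    sw = sw_tangent_rp(k)
    sw2 = sw[2] if k >= 2 else 0   # x^2 = 0 in H^*(RP^1)
    return sw[1] == 0 and sw2 == 0

orientable = [k for k in range(1, 12) if is_orientable(k)]
spin = [k for k in range(1, 12) if is_spin(k)]
```

For $1 \le k \le 11$ this gives the odd values of $k$ as orientable and $k = 1, 3, 7, 11$ as spin.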


Chern Classes 


Let $V \in \mathrm{Vect}_k(M,\mathbb{C})$. We use eqn [6] to find $\Psi_V: M \to \mathrm{Gr}_k(\mathbb{C},\infty)$ which classifies $V$; the map $\Psi_V$ is uniquely determined up to homotopy and, using eqn [7], one sets

\[ c_i(V) := \Psi_V^* c_i \in H^{2i}(M; \mathbb{Z}) \]

The total Chern class is then defined by

\[ c(V) = 1 + c_1(V) + \cdots + c_k(V) \]

The Chern class has the properties:

1. If $f: X_1 \to X_2$, then $f^*(c(V)) = c(f^*V)$.

2. $c(V \oplus W) = c(V)\,c(W)$.

3. Let $L$ be the classifying line bundle over $S^2 = \mathbb{CP}^1$. Then $\int_{S^2} c_1(L) = -1$.


The cohomology algebra of complex projective space is also a truncated polynomial algebra

\[ H^*(\mathbb{CP}^k; \mathbb{Z}) = \mathbb{Z}[x]/(x^{k+1} = 0) \]

where $x = c_1(L)$ and $L$ is the complex classifying line bundle over $\mathbb{CP}^k = \mathrm{Gr}_1(\mathbb{C}, k+1)$. If $T_c\mathbb{CP}^k$ is the complex tangent bundle, then

\[ c(T_c\mathbb{CP}^k) = (1+x)^{k+1} \]


The Pontrjagin Classes 


Let $V$ be a real vector bundle over a topological space $X$ of rank $r = 2k$ or $r = 2k+1$. The Pontrjagin classes $p_i(V) \in H^{4i}(X; \mathbb{Z})$ are characterized by the properties:

1. $p(V) = 1 + p_1(V) + \cdots + p_k(V)$.

2. If $f: X_1 \to X_2$, then $f^*(p(V)) = p(f^*V)$.

3. $p(V \oplus W) = p(V)\,p(W)$ mod elements of order 2.

4. $\int_{\mathbb{CP}^2} p_1(T\mathbb{CP}^2) = 3$.


We can complexify a real vector bundle V to 
construct an associated complex vector bundle Vc. 


We have

\[ p_i(V) := (-1)^i c_{2i}(V_C) \]

Conversely, if $V$ is a complex vector bundle, we can construct an underlying real vector bundle $V_r$ by forgetting the underlying complex structure. Modulo elements of order 2, we have

\[ p(V_r) = c(V)\,c(V^*) \]

Let $T\mathbb{CP}^k$ be the real tangent bundle of complex projective space. Then

\[ p(T\mathbb{CP}^k) = (1 + x^2)^{k+1} \]
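
The formula for $p(T\mathbb{CP}^k)$ can be recovered from the Chern class by polynomial arithmetic. The sketch below is an illustration under stated assumptions, not the article's own derivation: it assumes the standard fact that the complexification of the real bundle underlying a complex bundle $V$ is $V \oplus \bar{V}$, that conjugation replaces $x$ by $-x$, and it applies $p_i = (-1)^i c_{2i}$ to that complexification. Function names are hypothetical.

```python
from math import comb

def truncated_mul(p, q, k):
    """Multiply polynomials in x (coefficient lists) modulo x^(k+1)."""
    out = [0] * (k + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= k:
                out[i + j] += a * b
    return out

def chern_tangent_cp(k, sign=1):
    """Coefficients of (1 + sign*x)^(k+1) modulo x^(k+1)."""
    return [comb(k + 1, i) * sign ** i for i in range(k + 1)]

def pontrjagin_tangent_cp(k):
    """p(TCP^k) via p_i = (-1)^i c_{2i}(V_C), with c(V_C) = c(V) c(V-bar).

    Returns [p_0, p_1, ...], where p_i is the coefficient of x^(2i).
    """
    c = chern_tangent_cp(k, +1)
    c_bar = chern_tangent_cp(k, -1)      # conjugate bundle: x -> -x
    c_complexified = truncated_mul(c, c_bar, k)
    return [(-1) ** i * c_complexified[2 * i] for i in range(k // 2 + 1)]

c_cp2 = chern_tangent_cp(2)          # 1 + 3x + 3x^2
p_cp2 = pontrjagin_tangent_cp(2)     # 1 + 3x^2
p_cp4 = pontrjagin_tangent_cp(4)     # (1 + x^2)^5 truncated at x^4
```

With the normalization $\int_{\mathbb{CP}^2} x^2 = 1$, the coefficient 3 of $x^2$ in `p_cp2` agrees with property 4 of the Pontrjagin classes.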


Line Bundles 


Tensor product makes $\mathrm{Vect}_1(M,F)$ into an abelian group. One has natural equivalences of functors which are group homomorphisms:

\[ sw_1: \mathrm{Vect}_1(M,\mathbb{R}) \to H^1(M; \mathbb{Z}_2) \]
\[ c_1: \mathrm{Vect}_1(M,\mathbb{C}) \to H^2(M; \mathbb{Z}) \]

A real line bundle $L$ is trivial if and only if it is orientable or, equivalently, if $sw_1(L)$ vanishes. A complex line bundle $L$ is trivial if and only if $c_1(L) = 0$. There are nontrivial vector bundles with vanishing Stiefel-Whitney classes of rank $k > 1$. For example, $sw_i(TS^k) = 0$ for $i > 0$ despite the fact that $TS^k$ is trivial if and only if $k = 1, 3, 7$.


Curvature and Characteristic Classes 
de Rham Cohomology 


We can replace the coefficient group $\mathbb{Z}$ by $\mathbb{C}$ at the cost of losing information concerning torsion. Thus, we may regard $p_i(V) \in H^{4i}(M; \mathbb{C})$ if $V$ is real or $c_i(V) \in H^{2i}(M; \mathbb{C})$ if $V$ is complex. Let $M$ be a smooth manifold. Let $C^\infty\Lambda^p M$ be the space of smooth $p$-forms and let

\[ d: C^\infty\Lambda^p M \to C^\infty\Lambda^{p+1} M \]

be the exterior derivative. The de Rham cohomology groups are then defined by

\[ H^p_{deR}(M) := \frac{\ker(d: C^\infty\Lambda^p M \to C^\infty\Lambda^{p+1} M)}{\mathrm{im}(d: C^\infty\Lambda^{p-1} M \to C^\infty\Lambda^p M)} \]

The de Rham theorem identifies the topological cohomology groups $H^p(M; \mathbb{C})$ with the de Rham cohomology groups $H^p_{deR}(M)$ which are given differential geometrically.

Given a connection on $V$, the Chern-Weil theory enables us to compute Pontrjagin and Chern classes in de Rham cohomology in terms of curvature.


Connections 
Let $V$ be a vector bundle over $M$. A connection

\[ \nabla: C^\infty(V) \to C^\infty(T^*M \otimes V) \]

on $V$ is a first-order partial differential operator which satisfies the Leibniz rule, that is, if $s$ is a smooth section to $V$ and if $f$ is a smooth function on $M$,

\[ \nabla(fs) = df \otimes s + f\nabla s \]

If $X$ is a tangent vector field, we define

\[ \nabla_X s = \langle X, \nabla s \rangle \]

where $\langle\cdot,\cdot\rangle$ denotes the natural pairing between the tangent and cotangent spaces. This generalizes to the bundle setting the notion of a directional derivative and has the properties:

1. $\nabla_{fX} s = f\nabla_X s$.

2. $\nabla_X(fs) = X(f)s + f\nabla_X s$.

3. $\nabla_{X_1 + X_2} s = \nabla_{X_1} s + \nabla_{X_2} s$.

4. $\nabla_X(s_1 + s_2) = \nabla_X s_1 + \nabla_X s_2$.


The Curvature 2-Form 
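Let $\omega_p$ be a smooth $p$-form. Then

\[ \nabla: C^\infty(\Lambda^p M \otimes V) \to C^\infty(\Lambda^{p+1} M \otimes V) \]

can be extended by defining

\[ \nabla(\omega_p \otimes s) = d\omega_p \otimes s + (-1)^p \omega_p \wedge \nabla s \]

In contrast to ordinary exterior differentiation, $\nabla^2$ need not vanish. We set

\[ \Omega(s) := \nabla^2 s \]

This is not a second-order partial differential operator; it is a zeroth-order operator, that is,

\[ \Omega(fs) = ddf \otimes s - df \wedge \nabla s + df \wedge \nabla s + f\nabla^2 s = f\Omega(s) \]

The curvature operator $\Omega$ can also be computed locally. Let $(s_i)$ be a local frame. Expand

\[ \nabla s_i = \sum_j \omega_i^j \otimes s_j \]

to define the connection 1-form $\omega$. One then has

\[ \nabla^2 s_i = \sum_k \Big( d\omega_i^k - \sum_j \omega_i^j \wedge \omega_j^k \Big) \otimes s_k \]

and so

\[ \Omega_i^k = d\omega_i^k - \sum_j \omega_i^j \wedge \omega_j^k \]

If $\tilde{s}_i = \sum_j g_i^j s_j$ is another local frame, we compute

\[ \tilde\omega = dg\, g^{-1} + g\omega g^{-1} \quad \text{and} \quad \tilde\Omega = g\Omega g^{-1} \]

Although the connection 1-form $\omega$ is not tensorial, the curvature is an invariantly defined 2-form-valued endomorphism of $V$.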


Unitary Connections 


Let $(\cdot,\cdot)$ be a nondegenerate Hermitian inner product on $V$. We say that $\nabla$ is a unitary connection if

\[ (\nabla s_1, s_2) + (s_1, \nabla s_2) = d(s_1, s_2) \]

Such connections always exist and, relative to a local orthonormal frame, the curvature is skew-symmetric, that is,

\[ \Omega + \Omega^* = 0 \]

Thus, $\Omega$ can be regarded as a 2-form-valued element of the Lie algebra of the structure group, $O(V)$ in the real setting or $U(V)$ in the complex setting.


Projections 


We can always embed $V$ in a trivial bundle $1^n$ of dimension $n$; let $\pi_V$ be the orthogonal projection on $V$. We project the flat connection to $V$ to define a natural connection on $V$. For example, if $M$ is embedded isometrically in the Euclidean space $\mathbb{R}^n$, this construction gives the Levi-Civita connection on the tangent bundle $TM$. The curvature of this connection is then given by

\[ \Omega = \pi_V\, d\pi_V\, d\pi_V \]

Let $V_P$ be the fiber of $V$ over a point $P \in M$. The inclusion $i: V \subset \mathbb{R}^n$ defines the classifying map $f: M \to \mathrm{Gr}_k(\mathbb{R},n)$ where we set


\[ f(P) = i(V_P) \]

Chern-Weil Theory


Let $\nabla$ be a Riemannian connection on a real vector bundle $V$ of rank $k$. We set

\[ p(\Omega) := \det\Big(1 + \frac{1}{2\pi}\Omega\Big) \]

Let $\Omega^t$ denote the transpose matrix of differential forms. Since $\Omega + \Omega^t = 0$, the polynomials of odd degree in $\Omega$ vanish and we may expand

\[ p(\Omega) = 1 + p_1(\Omega) + \cdots + p_r(\Omega) \]

where $k = 2r$ or $k = 2r+1$ and the differential forms $p_i(\Omega) \in C^\infty\Lambda^{4i}(M)$ are forms of degree $4i$.

Changing the gauge (i.e., the local frame) replaces $\Omega$ by $g\Omega g^{-1}$ and hence $p(\Omega)$ is independent of the local frame chosen. One can show that $dp_i(\Omega) = 0$; let $[p_i(\Omega)]$ denote the corresponding element of de Rham cohomology. This is independent of the particular connection chosen and $[p_i(\Omega)]$ represents $p_i(V)$ in $H^{4i}(M; \mathbb{C})$.

Similarly, let $V$ be a complex vector bundle of rank $k$ with a Hermitian connection $\nabla$. Set

\[ c(\Omega) := \det\Big(1 + \frac{\sqrt{-1}}{2\pi}\Omega\Big) = 1 + c_1(\Omega) + \cdots + c_k(\Omega) \]

Again $c_i(\Omega)$ is independent of the local gauge and $dc_i(\Omega) = 0$. The de Rham cohomology class $[c_i(\Omega)]$ represents $c_i(V)$ in $H^{2i}(M; \mathbb{C})$.


The Chern Character 


The total Chern character is defined by the formal sum

\[ ch(\Omega) := \mathrm{tr}\big(e^{\sqrt{-1}\,\Omega/2\pi}\big) = \sum_\nu \frac{(\sqrt{-1})^\nu}{(2\pi)^\nu\, \nu!}\, \mathrm{tr}(\Omega^\nu) = ch_0(\Omega) + ch_1(\Omega) + \cdots \]

Let $ch(V) = [ch(\Omega)]$ denote the associated de Rham cohomology class; it is independent of the particular connection chosen. We then have the relations

\[ ch(V \oplus W) = ch(V) + ch(W) \]
\[ ch(V \otimes W) = ch(V)\,ch(W) \]

The Chern character extends to a ring isomorphism from $KU(M) \otimes \mathbb{Q}$ to $H^{\mathrm{even}}(M; \mathbb{Q})$, which is a natural equivalence of functors; modulo torsion, K-theory and even-degree cohomology are the same functors.


Other Characteristic Classes 


The Chern character is defined by the exponential function. Other characteristic classes which appear in the index theorem are defined using other generating functions. Let $x := (x_1, \ldots)$ be a collection of indeterminates. Let $s_\nu(x)$ be the $\nu$th elementary symmetric function;

\[ \prod_\nu (1 + x_\nu) = 1 + s_1(x) + s_2(x) + \cdots \]

For a diagonal matrix $A := \mathrm{diag}(\lambda_1, \ldots)$, denote the normalized eigenvalues by $x_j := \sqrt{-1}\,\lambda_j/2\pi$. Then

\[ c(A) = \prod_\nu (1 + x_\nu) = 1 + s_1(x) + \cdots \]

Thus, the Chern class corresponds in a certain sense to the elementary symmetric functions.

Let $f(x)$ be a symmetric polynomial or more generally a formal power series which is symmetric. We can express $f(x) = F(s_1(x), \ldots)$ in terms of the elementary symmetric functions and define $f(\Omega) = F(c_1(\Omega), \ldots)$ by substitution. For example, the Chern character is defined by the generating function

\[ f(x) := \sum_{\nu=1}^{k} e^{x_\nu} \]
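
The additivity and multiplicativity of the Chern character stated earlier follow from this generating function once the $x_\nu$ are treated as formal roots. The sketch below is a numerical illustration only: it assumes (as in the usual splitting argument) that the roots of $V \oplus W$ are the union of the roots of $V$ and $W$ and that the roots of $V \otimes W$ are the pairwise sums, and it substitutes hypothetical real numbers for the formal roots.

```python
import math

def ch(roots):
    """Chern character sum_j exp(x_j), evaluated at numerical roots."""
    return sum(math.exp(x) for x in roots)

# Hypothetical numerical values standing in for the formal roots.
v_roots = [0.3, -0.1, 0.25]
w_roots = [0.2, -0.4]

direct_sum_roots = v_roots + w_roots                      # roots of V (+) W
tensor_roots = [x + y for x in v_roots for y in w_roots]  # roots of V (x) W

additive_gap = abs(ch(direct_sum_roots) - (ch(v_roots) + ch(w_roots)))
multiplicative_gap = abs(ch(tensor_roots) - ch(v_roots) * ch(w_roots))
```

Both gaps vanish up to rounding, reflecting the identity $e^{x+y} = e^x e^y$ behind $ch(V \otimes W) = ch(V)\,ch(W)$.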


The Todd class is defined using a different generating function:

\[ td(x) := \prod_\nu x_\nu (1 - e^{-x_\nu})^{-1} = 1 + td_1(x) + \cdots \]

If $V$ is a real vector bundle, we can define some additional characteristic classes similarly. Let $\{\pm\sqrt{-1}\,\lambda_1, \ldots\}$ be the nonzero eigenvalues of a skew-symmetric matrix $A$. We set $x_j = -\lambda_j/2\pi$ and define the Hirzebruch polynomial $L$ and the $\hat{A}$ genus by

\[ L(x) := \prod_\nu \frac{x_\nu}{\tanh(x_\nu)} = 1 + L_1(x) + L_2(x) + \cdots \]

\[ \hat{A}(x) := \prod_\nu \frac{x_\nu}{2\sinh((1/2)x_\nu)} = 1 + \hat{A}_1(x) + \hat{A}_2(x) + \cdots \]

The generating functions

\[ \frac{x}{\tanh(x)} \quad \text{and} \quad \frac{x}{2\sinh((1/2)x)} \]

are even functions of $x$, so the ambiguity in the choice of sign in the eigenvalues plays no role. This defines characteristic classes

\[ L_i(V) \in H^{4i}(M; \mathbb{C}) \quad \text{and} \quad \hat{A}_i(V) \in H^{4i}(M; \mathbb{C}) \]
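
The low-order terms of these genera come from the Taylor coefficients of the one-variable generating functions. The following sketch computes those coefficients exactly with rational arithmetic; it is an illustration with hypothetical helper names, using ordinary power-series division.

```python
from fractions import Fraction
from math import factorial

N = 6  # keep terms up to x^5

def series_div(num, den):
    """Quotient of power series (coefficient lists); den[0] must be nonzero."""
    out = []
    for n in range(N):
        acc = num[n] - sum(out[i] * den[n - i] for i in range(n))
        out.append(acc / den[0])
    return out

zero = Fraction(0)
# Coefficients of (1 - e^{-x})/x, sinh(x)/x, cosh(x) and 2 sinh(x/2)/x.
one_minus_exp = [Fraction((-1) ** n, factorial(n + 1)) for n in range(N)]
sinh_over_x = [Fraction(1, factorial(n + 1)) if n % 2 == 0 else zero
               for n in range(N)]
cosh = [Fraction(1, factorial(n)) if n % 2 == 0 else zero for n in range(N)]
sinh_half = [Fraction(1, factorial(n + 1) * 2 ** n) if n % 2 == 0 else zero
             for n in range(N)]
one = [Fraction(1)] + [zero] * (N - 1)

todd = series_div(one, one_minus_exp)        # x / (1 - e^{-x})
hirzebruch = series_div(cosh, sinh_over_x)   # x / tanh(x)
a_hat = series_div(one, sinh_half)           # x / (2 sinh(x/2))
```

The resulting expansions are $1 + x/2 + x^2/12 - x^4/720$, $1 + x^2/3 - x^4/45$ and $1 - x^2/24 + 7x^4/5760$ through degree 4; only the Todd series has an odd term, in line with the evenness remark above.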


Summary of Formulas 


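We summarize below some of the formulas in terms of characteristic classes:

1. $c_1(\Omega) = \frac{\sqrt{-1}}{2\pi}\,\mathrm{tr}(\Omega)$,

2. $c_2(\Omega) = \frac{1}{8\pi^2}\{\mathrm{tr}(\Omega^2) - \mathrm{tr}(\Omega)^2\}$,

3. $p_1(\Omega) = -\frac{1}{8\pi^2}\,\mathrm{tr}(\Omega^2)$,

4. $ch(V) = k + c_1 + \frac{c_1^2 - 2c_2}{2} + \cdots$,

5. $td(V) = 1 + \frac{c_1}{2} + \frac{c_1^2 + c_2}{12} + \frac{c_1 c_2}{24} + \cdots$,

6. $\hat{A}(V) = 1 - \frac{p_1}{24} + \frac{7p_1^2 - 4p_2}{5760} + \cdots$,

7. $L(V) = 1 + \frac{p_1}{3} + \frac{7p_2 - p_1^2}{45} + \cdots$,

8. $td(V \oplus W) = td(V)\,td(W)$,

9. $\hat{A}(V \oplus W) = \hat{A}(V)\,\hat{A}(W)$,

10. $L(V \oplus W) = L(V)\,L(W)$.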


The Euler Form 


So far, this article has dealt with the structure groups $O(k)$ in the real setting and $U(k)$ in the complex setting. There is one final characteristic class which arises from the structure group $SO(k)$. Suppose $k = 2n$ is even. While a real antisymmetric matrix $A$ of shape $2n \times 2n$ cannot be diagonalized, it can be put in block diagonal form with $2 \times 2$ blocks,

\[ \begin{pmatrix} 0 & \lambda_\nu \\ -\lambda_\nu & 0 \end{pmatrix} \]

The top Pontrjagin class $p_n(A) = x_1^2 \cdots x_n^2$ is a perfect square. The Euler class

\[ e_{2n}(A) := x_1 \cdots x_n \]

is the square root of $p_n$. If $V$ is an oriented vector bundle of dimension $2n$, then

\[ e_{2n}(V) \in H^{2n}(M; \mathbb{C}) \]

is a well-defined characteristic class satisfying $e_{2n}(V)^2 = p_n(V)$.

If $V$ is the underlying real oriented vector bundle of a complex vector bundle $W$,

\[ e_{2n}(V) = c_n(W) \]


If $M$ is an even-dimensional manifold, let $e_m(M) := e_m(TM)$. If we reverse the local orientation of $M$, then $e_m(M)$ changes sign. Consequently, $e_m(M)$ is a measure rather than an $m$-form; we can use the Riemannian measure on $M$ to regard $e_m(M)$ as a scalar. Let $R_{ijkl}$ be the components of the curvature of the Levi-Civita connection with respect to some local orthonormal frame field; we adopt the convention that $R_{1221} = 1$ on the standard sphere $S^2$ in $\mathbb{R}^3$. If $\varepsilon^I_J := g(e^I, e^J)$ is the totally antisymmetric tensor, then

\[ e_{2n} := \frac{1}{(8\pi)^n\, n!} \sum_{I,J} \varepsilon^I_J\, R_{i_1 i_2 j_2 j_1} \cdots R_{i_{2n-1} i_{2n} j_{2n} j_{2n-1}} \]

Let $R := R_{ijji}$ and $\rho_{ij} := R_{ikkj}$ be the scalar curvature and the Ricci tensor, respectively. Then

\[ e_2 = \frac{1}{4\pi} R \quad \text{and} \quad e_4 = \frac{1}{32\pi^2}\big(R^2 - 4|\rho|^2 + |R|^2\big) \]
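
As a sanity check on the sign convention $R_{1221} = 1$, the standard Gauss-Bonnet integrands $e_2 = R/4\pi$ and $e_4 = (R^2 - 4|\rho|^2 + |R|^2)/32\pi^2$ can be integrated over $S^2$ and $S^2 \times S^2$, where the curvature is constant. The sketch below is an illustration under those assumptions (unit round spheres, product metric, hypothetical function names); the totals should be the Euler characteristics 2 and 4.

```python
import math
from itertools import product

def curvature_product_of_spheres(n_factors):
    """Curvature components R[(i, j, k, l)] of a product of unit 2-spheres.

    Uses the sign convention R_1221 = +1 on each factor, in an
    orthonormal frame adapted to the product.
    """
    dim = 2 * n_factors
    R = {idx: 0.0 for idx in product(range(dim), repeat=4)}
    for f in range(n_factors):
        a, b = 2 * f, 2 * f + 1
        R[(a, b, b, a)] = R[(b, a, a, b)] = 1.0
        R[(a, b, a, b)] = R[(b, a, b, a)] = -1.0
    return dim, R

def invariants(dim, R):
    """Scalar curvature R_ijji, |Ricci|^2 with rho_ij = R_ikkj, and |R|^2."""
    rng = range(dim)
    scalar = sum(R[(i, j, j, i)] for i in rng for j in rng)
    ricci = [[sum(R[(i, k, k, j)] for k in rng) for j in rng] for i in rng]
    ricci_sq = sum(ricci[i][j] ** 2 for i in rng for j in rng)
    riem_sq = sum(v ** 2 for v in R.values())
    return scalar, ricci_sq, riem_sq

# S^2: integrate e_2 = R / (4 pi) over the area 4 pi.
dim, R = curvature_product_of_spheres(1)
scalar, _, _ = invariants(dim, R)
euler_s2 = scalar / (4 * math.pi) * (4 * math.pi)

# S^2 x S^2: integrate e_4 over the volume (4 pi)^2.
dim, R = curvature_product_of_spheres(2)
scalar, ricci_sq, riem_sq = invariants(dim, R)
e4 = (scalar ** 2 - 4 * ricci_sq + riem_sq) / (32 * math.pi ** 2)
euler_s2xs2 = e4 * (4 * math.pi) ** 2
```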


Characteristic Classes of Principal 
Bundles 


Let $\mathfrak{g}$ be the Lie algebra of a compact Lie group $G$. Let $\pi: P \to M$ be a principal $G$ bundle over $M$. For $\xi \in P$, let

\[ \mathcal{V}_\xi := \ker \pi_* : T_\xi P \to T_{\pi\xi} M \quad \text{and} \quad \mathcal{H}_\xi := \mathcal{V}_\xi^\perp \]

be the vertical and horizontal distributions of the projection $\pi$, respectively. We assume that the metric on $P$ is chosen to be $G$-invariant and such that $\pi_*: \mathcal{H}_\xi \to T_{\pi\xi}M$ is an isometry; thus, $\pi$ is a Riemannian submersion. If $F$ is a tangent vector field on $M$, let $\mathcal{H}F$ be the corresponding horizontal lift. Let $\rho_{\mathcal{V}}$ be orthogonal projection on the distribution $\mathcal{V}$. The curvature is defined by

\[ \Omega(F_1, F_2) = \rho_{\mathcal{V}}[\mathcal{H}(F_1), \mathcal{H}(F_2)] \]

The horizontal distribution $\mathcal{H}$ is integrable if and only if the curvature vanishes. Since the metric is $G$-invariant, $\Omega(F_1, F_2)$ is invariant under the group action. We may use a local section $s$ to $P$ over a contractible coordinate chart $O$ to split $\pi^{-1}O = O \times G$. This permits us to identify $\mathcal{V}$ with $TG$ and to regard $\Omega$ as a $\mathfrak{g}$-valued 2-form. If we replace the section $s$ by a section $\tilde{s}$, then $\tilde\Omega = g\Omega g^{-1}$ changes by the adjoint action of $G$ on $\mathfrak{g}$.
If $V$ is a real or complex vector bundle over $M$, we can put a fiber metric on $V$ to reduce the structure group to the orthogonal group $O(r)$ in the real setting or the unitary group $U(r)$ in the complex setting. Let $P_V$ be the associated frame bundle. A Riemannian connection $\nabla$ on $V$ induces an invariant splitting of $TP_V = \mathcal{V} \oplus \mathcal{H}$ and defines a natural metric on $P_V$; the curvature $\Omega$ of the connection $\nabla$ defined here agrees with the definition given previously.

Let $\mathcal{Q}(G)$ be the algebra of all polynomials on $\mathfrak{g}$ which are invariant under the adjoint action. If $Q \in \mathcal{Q}(G)$, then $Q(\Omega)$ is well defined. One has $dQ(\Omega) = 0$. Furthermore, the de Rham cohomology class $Q(P) := [Q(\Omega)]$ is independent of the particular connection chosen. We have

\[ \mathcal{Q}(U(k)) = \mathbb{C}[c_1, \ldots, c_k] \]
\[ \mathcal{Q}(SU(k)) = \mathbb{C}[c_2, \ldots, c_k] \]
\[ \mathcal{Q}(O(2k)) = \mathbb{C}[p_1, \ldots, p_k] \]
\[ \mathcal{Q}(O(2k+1)) = \mathbb{C}[p_1, \ldots, p_k] \]
\[ \mathcal{Q}(SO(2k)) = \mathbb{C}[p_1, \ldots, p_k, e_{2k}]/(e_{2k}^2 = p_k) \]
\[ \mathcal{Q}(SO(2k+1)) = \mathbb{C}[p_1, \ldots, p_k] \]

Thus, for this category of groups, no new characteristic classes ensue. Since the invariants are Lie-algebra theoretic in nature,

\[ \mathcal{Q}(\mathrm{Spin}(k)) = \mathcal{Q}(SO(k)) \]

Other groups, of course, give rise to different
characteristic rings of invariants.

challenges for mathematicians. Most of the tremendous amount of mathematical activity generated by Witten's discovery has been concerned primarily with issues that arise after one has accepted the functional integral as a formal object. This has left, as an important challenge, the task of giving rigorous meaning to the functional integrals themselves and of rigorously deriving their relation to topological invariants. The present article will discuss efforts to put the functional integral itself on a rigorous basis.


Chern-Simons Functional Integrals 


We shall describe here the typical Chern-Simons functional integral. For the purposes of this article, we will confine ourselves to a simpler setting rather than the most general possible one. In fact, we shall work with fields over three-dimensional Euclidean space $\mathbb{R}^3$ (instead of a general 3-manifold).

The typical Chern-Simons functional integral is of the form

\[ \int_{\mathcal{A}} e^{(ik/4\pi) S_{CS}(A)}\, W_{C_1,R_1}(A) \cdots W_{C_n,R_n}(A)\, DA \qquad [1] \]

Our objective in this section will be to specify what the terms in this formal integral mean. Very briefly, the integration is with respect to a formal "Lebesgue measure" on $\mathcal{A}$, an infinite-dimensional space of geometric objects $A$ called connections over $\mathbb{R}^3$ with values in the Lie algebra $LG$ of a group $G$. In the first term in the integrand, in the exponent, $k$ is a real number, and $S_{CS}(A)$ is the Chern-Simons action for the connection $A$. Each term $W_{C_i,R_i}(A)$ is a Wilson loop observable, the trace in some representation $R_i$ of the holonomy of the connection $A$ around the loop $C_i$. The entire integral, formal though it may be, provides an invariant associated with the system of loops $C_1, \ldots, C_n$.

Let $G$ be a compact Lie group; for ease of exposition, let us take $G$ to be a closed, connected subgroup of $U(n)$. Thus, each element of $G$ is an $n \times n$ complex matrix $g$ with $g^*g = I$, the identity. The Lie algebra $LG$ consists of all $n \times n$ matrices $A$ which are skew-Hermitian, that is, satisfy $A^* = -A$, and for which $e^{tA} \in G$ for all real numbers $t$. On $LG$ there is a convenient inner product given by

\[ \langle A, B \rangle = \mathrm{tr}(AB^*) \]

This inner product is invariant under the conjugation action of the group $G$ on its Lie algebra $LG$.

By a connection over $\mathbb{R}^3$ we shall mean a $C^\infty$ 1-form with values in $LG$. The set of all connections is an affine (in our case, actually a linear) space $\mathcal{A}$. If $A \in \mathcal{A}$, then define

\[ S_{CS}(A) = \int_{\mathbb{R}^3} \mathrm{tr}\Big(A \wedge dA + \frac{2}{3} A \wedge A \wedge A\Big) \qquad [2] \]

This is, up to constant multiple, the Chern-Simons action functional.

Let $A$ be a connection and consider a piecewise smooth path

\[ C: [0,1] \to \mathbb{R}^3 \]

With this one can associate a $G$-valued path $[0,1] \to G: t \mapsto g(t) \in G$ satisfying the differential equation

\[ g'(t)\, g(t)^{-1} = -A(C'(t)) \]

subject to the initial condition $g(0) = I$, the identity. The path $t \mapsto g(t)$ describes parallel transport along $C$ by the connection $A$. If $C$ is a loop then the final value $g(1)$ is the holonomy of $A$ around $C$. If $R$ is a representation of $G$ on some finite-dimensional vector space then the trace of $R(g(1))$ is the Wilson loop observable:

\[ W_{C,R}(A) = \mathrm{tr}(R(g(1))) \qquad [3] \]
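
For a concrete feel for the parallel transport equation, the simplest case is $G = U(1)$, where $LG$ consists of purely imaginary numbers and the equation can be integrated step by step. The following is a minimal numerical sketch under stated assumptions: the connection $A = (i/2)(-x^1\,dx^0 + x^0\,dx^1)$, the unit circle loop, the defining representation and all function names are hypothetical choices made for illustration. By Stokes' theorem $\oint A = i\pi$ for this loop, so one expects $g(1) = e^{-i\pi} = -1$.

```python
import cmath
import math

def connection(point, velocity):
    """A(C'(t)) for the u(1)-valued 1-form A = (i/2)(-x1 dx0 + x0 dx1).

    This particular A is a hypothetical example; its values are
    purely imaginary, i.e. skew-Hermitian 1x1 matrices.
    """
    x0, x1, _ = point
    v0, v1, _ = velocity
    return 0.5j * (-x1 * v0 + x0 * v1)

def holonomy(path, velocity, steps=2000):
    """Integrate g'(t) g(t)^{-1} = -A(C'(t)), g(0) = 1, on [0, 1]."""
    g = 1.0 + 0.0j
    dt = 1.0 / steps
    for n in range(steps):
        t = (n + 0.5) * dt                      # midpoint rule
        g = cmath.exp(-connection(path(t), velocity(t)) * dt) * g
    return g

def circle(t):
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t), 0.0)

def circle_velocity(t):
    return (-2 * math.pi * math.sin(2 * math.pi * t),
            2 * math.pi * math.cos(2 * math.pi * t), 0.0)

# In the defining representation the Wilson loop is the trace of a 1x1 matrix.
wilson_loop = holonomy(circle, circle_velocity)
```

For nonabelian $G$ the same loop would multiply matrix exponentials in path order; the scalar case avoids that complication.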


Thus, we have specified the meaning of the terms appearing in the formal integral [1], where $C_1, \ldots, C_n$ of eqn [1] form a link (a family of nonintersecting, imbedded loops) in $\mathbb{R}^3$ and $R_1, \ldots, R_n$ are finite-dimensional representations of $G$. Witten showed that, at least for suitable values of $k$, integrals of this form ought to produce topological invariants, which he identified, for the link.

The integral [1] is problematic for several reasons. 
First, there is no reasonable and useful analog of 
Lebesgue measure on an infinite-dimensional space. 
Even if one were to regularize this measure in some 
simple way, one would run into the problem that the 
measure would not live on the space of smooth 
connections, and so the integrand would become 
meaningless. 

There are several different approaches to a 
mathematical interpretation of [1]. The approach 
that is often taken in practice is to simply ignore the 
analytical problem and define the value of the 
integral [1] to be what Witten's calculations have 
given. One approach, used, for instance, by Bar- 
Natan (1995) is to expand the integrand in a series 
and relate each individual integral in this expansion 
separately to topological invariants. Discrete 
approximation procedures to the continuum integral 
have also been explored. In the abelian case, infinite- 
dimensional oscillatory integral techniques have 
been used to understand the functional integral. 
Frohlich and King (1999) showed the possibility of 
interpreting parallel transport using ideas from 
stochastic differential equations. Such an approach 
has been used successfully in the case of two- 
dimensional Yang-Mills theory, where the func- 
tional integral actually corresponds to integration 
with respect to a measure. In this article, we focus 
on a method of understanding the normalized 
Chern-Simons functional integral in terms 
of infinite-dimensional distribution theory and 
examining some ideas for understanding Wilson 
loop expectation values in this setting. 


Infinite Dimensional Distributions 


Let $(x^0, x^1, x^2)$ denote the usual coordinates on $\mathbb{R}^3$. Gauge symmetry, an issue which will not be examined here, may be used to simplify the problem of the Chern-Simons integral. In particular, one need only focus on connections which vanish in the $x^2$-direction, that is, connections of the form $A = A_0\,dx^0 + A_1\,dx^1$. For such $A$, the triple wedge-product term in the Chern-Simons action disappears, and we are left with the quadratic expression:

\[ S_{CS}(A) = \int_{\mathbb{R}^3} \mathrm{tr}(A \wedge dA) \qquad [4] \]

This is good, since the functional integral now involves a quadratic exponent and so stands a good chance of rigorous realization, just as Gaussian measure can be given rigorous meaning in infinite dimensions. However, in the Chern-Simons situation, there is no hope of actually getting a measure, not even a complex measure.

The next best thing to a measure is a distribution or "generalized function." A distribution over a space $Y$ is a continuous linear functional on a topological vector space of functions on $Y$. Thus, the objective is to realize the Chern-Simons functional integral as a continuous linear functional on some space of test functions over $\mathcal{A}$ (more precisely, on an extension of $\mathcal{A}$). Before turning to the specific case of the Chern-Simons integral, let us examine some elements of the theory of infinite-dimensional distributions, inasmuch as they are relevant to our needs.

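Let us consider a Hilbert space $\mathcal{E}_0$, and a positive Hilbert-Schmidt operator $T$ on $\mathcal{E}_0$. For each integer $p \geq 0$, let $\mathcal{E}_p = T^p(\mathcal{E}_0)$, which is a Hilbert space with the inner product $\langle x, y\rangle_p = \langle T^{-p}x, T^{-p}y\rangle$. Then we have the chain of inclusions

\[ \mathcal{E} := \bigcap_{p \geq 0} \mathcal{E}_p \subset \cdots \subset \mathcal{E}_2 \subset \mathcal{E}_1 \subset \mathcal{E}_0 \qquad [5] \]

with each inclusion $\mathcal{E}_{p+1} \to \mathcal{E}_p$ being Hilbert-Schmidt. Let $\mathcal{E}_{-p} = \mathcal{E}_p'$ be the topological dual of $\mathcal{E}_p$, the space of continuous linear functionals on $\mathcal{E}_p$, and let $\mathcal{E}'$ be the topological dual of $\mathcal{E}$, where the latter is given the topology generated by all the norms $|\cdot|_p$. Then we have the inclusions

\[ \mathcal{E}_0 \subset \mathcal{E}_{-1} \subset \mathcal{E}_{-2} \subset \cdots \subset \bigcup_{p \geq 0} \mathcal{E}_{-p} = \mathcal{E}' \qquad [6] \]

For each $x \in \mathcal{E}$ there is the evaluation map $\hat{x}: \mathcal{E}' \to \mathbb{R}: \phi \mapsto \phi(x)$. A very special case of a general theorem of Minlos guarantees that on the dual $\mathcal{E}'$ there is a measure $\mu$ on the sigma algebra generated by all the functions $\hat{x}$ such that each $\hat{x}$ is a Gaussian random variable of mean zero and variance $|x|_0^2$, that is,

\[ \int_{\mathcal{E}'} e^{it\hat{x}}\, d\mu = e^{-t^2|x|_0^2/2} \]

for all $x \in \mathcal{E}$ and $t \in \mathbb{R}$. This measure $\mu$ is the standard Gaussian measure on $\mathcal{E}'$ for the infinite-dimensional nuclear space $\mathcal{E}$.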
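
For a fixed $x$, the defining identity of the Gaussian measure is a statement about a single real Gaussian variable with variance $|x|_0^2$, and that one-dimensional shadow can be checked by quadrature. The sketch below is only an illustration of this finite-dimensional case; the function name and the numerical values of $t$ and the variance are hypothetical.

```python
import cmath
import math

def gaussian_characteristic(t, variance, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of E[exp(i t X)] for X ~ N(0, variance)."""
    sigma = math.sqrt(variance)
    a = -half_width * sigma                  # integrate over [-12 sigma, 12 sigma]
    h = 2 * half_width * sigma / steps
    norm = 1.0 / math.sqrt(2 * math.pi * variance)
    total = 0.0 + 0.0j
    for n in range(steps):
        x = a + (n + 0.5) * h
        density = math.exp(-x * x / (2 * variance)) * norm
        total += cmath.exp(1j * t * x) * density * h
    return total

t, variance = 1.3, 0.7          # variance plays the role of |x|_0^2
numeric = gaussian_characteristic(t, variance)
exact = math.exp(-t * t * variance / 2)
```

The infinite-dimensional content of the Minlos theorem, namely that these one-dimensional laws fit together into a single measure on $\mathcal{E}'$, is of course not captured by such a check.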


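The inner products $\langle\cdot,\cdot\rangle_p$ give rise to a nuclear space structure on function spaces over $\mathcal{E}'$. Let $\mathcal{U}$ be the algebra of functions on $\mathcal{E}'$ generated by the exponentials $e^{\lambda\hat{x}}$, with $x$ running over $\mathcal{E}$ and $\lambda$ over $\mathbb{C}$. For each $p \geq 0$, there is an inner product $\langle\langle\cdot,\cdot\rangle\rangle_p$ on $\mathcal{U}$ such that

\[ \big\langle\big\langle e^{\hat{x} - \frac{1}{2}\langle x,x\rangle_0},\ e^{\hat{y} - \frac{1}{2}\langle y,y\rangle_0} \big\rangle\big\rangle_p = e^{\langle x,y\rangle_p} \qquad [7] \]

For $p = 0$ the left-hand side coincides with the $L^2(\mu)$ inner product. Let $[\mathcal{E}]_p$ be the Hilbert space completion of $\mathcal{U}$ in the $\langle\langle\cdot,\cdot\rangle\rangle_p$ inner product. Then

\[ [\mathcal{E}] \subset \cdots \subset [\mathcal{E}]_2 \subset [\mathcal{E}]_1 \subset [\mathcal{E}]_0 = L^2(\mathcal{E}', \mu) \qquad [8] \]

Let $[\mathcal{E}] = \bigcap_{p \geq 0} [\mathcal{E}]_p$, equipped with the topology from all the norms $\|\cdot\|_p$, and $[\mathcal{E}]'$ its topological dual. Elements of $[\mathcal{E}]'$, being continuous linear functionals on the "test function space" $[\mathcal{E}]$, are called distributions over $\mathcal{E}'$, in the language of white-noise analysis.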

A fundamental tool in the study of infinite-dimensional distributions is the S-transform. This generalizes the traditional Segal-Bargmann transform from the $L^2$-setting to the context of distributions. Let $\mathcal{E}_c$ be the complexification of $\mathcal{E}$. The inner product $\langle\cdot,\cdot\rangle_0$ on $\mathcal{E}$ extends to a complex-bilinear pairing $\mathcal{E}_c \times \mathcal{E}_c \to \mathbb{C}: (z,w) \mapsto z \cdot w$. The evaluation pairing $\mathcal{E}' \times \mathcal{E} \to \mathbb{R}$ also extends naturally to the complexifications. For $\Phi$ a distribution belonging to $[\mathcal{E}]'$, define a function $S\Phi$ on $\mathcal{E}_c$ by

\[ S\Phi(z) = \Phi(c_z) \]

for all $z \in \mathcal{E}_c$. Here $c_z$ is the coherent state function on $\mathcal{E}'$ given by $c_z(\phi) = e^{\phi(z) - (1/2) z \cdot z}$. A fundamental and useful result in white-noise analysis, due originally to Potthoff and Streit, specifies the range of the transform $S$ and allows reconstruction of a distribution $\Phi$ from the function $S\Phi$. Briefly, the range of $S$ consists of functions which are holomorphic, in an appropriate sense, and have at most quadratic exponential growth. In particular, this theorem implies that a function of the form $z \mapsto e^{a\, z \cdot z}$, for any constant $a$, is in the range of $S$.


Rigorous Realization of Chern-Simons 
Integrals 


We return to the Chern-Simons context. As mentioned earlier, gauge symmetry may be invoked to reduce the space of connections to the smaller space:

\[ \mathcal{E} = X \oplus X \qquad [9] \]

where $X = \mathcal{S}(\mathbb{R}^3) \otimes LG$ is the space of rapidly decreasing functions with values in the Lie algebra

LG. Let 
= 
d x 
n= Ci) 


as a linear operator on $L^2(\mathbb{R}^3)$, $T_1 = T_0 \otimes I$ the induced operator on $L^2(\mathbb{R}^3) \otimes LG$, and $T = T_1 \oplus T_1$. Then, as described in the preceding section, we have the space $\mathcal{E}$ and its dual $\mathcal{E}'$. There is then the standard Gaussian measure $\mu$ on $\mathcal{E}'$, and the space $[\mathcal{E}]'$ of distributions over $\mathcal{E}'$.

The normalized Chern-Simons integral may be viewed as a linear functional

\[ \Phi_{CS}: F \mapsto \frac{1}{N} \int_{\mathcal{E}} e^{(ik/4\pi) S_{CS}(A)}\, F(A)\, DA \qquad [10] \]

where $N$ is a "normalizing" factor. Rigorous meaning can be given to this by first formally working out what the S-transform of $\Phi_{CS}$ ought to be. Calculation shows that $S\Phi_{CS}$ is indeed a holomorphic function on $\mathcal{E}_c$ of quadratic growth. The Potthoff-Streit theorem then implies that $\Phi_{CS}$ does exist as a distribution in the space $[\mathcal{E}]'$. Let us examine this in some more detail.

As before, we take $A$ to be of the form $A = A_0\,dx^0 + A_1\,dx^1$, with the component $A_2$ equal to 0. Integration by parts shows that

\[ \frac{k}{4\pi} S_{CS}(A) = -\frac{k}{2\pi} \int_{\mathbb{R}^3} \mathrm{tr}(A_0\, \partial_2 A_1)\, d\mathrm{vol} \qquad [11] \]


A formal computation reveals that S(®cs)(j) should 
be given by 


fs | v 34. 
exp (z tr (joð; i) [12] 
where j= (Jo, ]1), and 


| 1 
8; f (x) 3 5 | dtt = 1 x5,00) (S)] f(x x^.) 


The Potthoff-Streit criterion implies the existence of a distribution $\Phi_{CS}$, whose S-transform is given by the above expression.

The distribution $\Phi_{CS}$ is, however, not a sufficiently powerful object to allow determination of the Wilson loop expectations that one would really like to have. For instance, $\Phi_{CS}$ does not live on the space of smooth connections and so the meaning of parallel transport needs to be defined. The state of knowledge, at the rigorous level, at this point is still evolving, with progress reported by A. Hahn. We describe some ideas for the Wilson loop expectations in the following.

The strategy for defining parallel transport along a path is to smear out the path by means of bump functions and essentially replace the path by a path of test functions in $\mathcal{E}$. The description given here is mainly for the case of abelian $G$. Choose first a $C^\infty$ non-negative bump function $\psi$ on $\mathbb{R}^3$, vanishing outside the unit ball and having $L^1$ norm equal to 1. For $\epsilon > 0$, let $\psi^\epsilon$ be the scaled bump function given by $\psi^\epsilon(x) = \epsilon^{-3}\psi(x/\epsilon)$. Next, for a smooth loop $[0,1] \to \mathbb{R}^3: t \mapsto l(t) = (l_0(t), l_1(t), l_2(t))$, let $l^\epsilon(t) = \psi^\epsilon(\cdot - l(t))$, the scaled bump function centered now at the path point $l(t)$. Now consider a generalized connection $A = (A_0, A_1) \in \mathcal{E}'$. Set

\[ B_A^\epsilon(t) = A_0(l^\epsilon(t))\, l_0'(t) + A_1(l^\epsilon(t))\, l_1'(t) \qquad [13] \]

The equation of parallel transport can be reformulated as a differential equation for a matrix-valued path $t \mapsto P_A^\epsilon(t)$ satisfying

\[ \frac{d}{dt} P_A^\epsilon(t) + B_A^\epsilon(t)\, P_A^\epsilon(t) = 0 \qquad [14] \]

and the initial condition $P_A^\epsilon(0) = I$. With this smearing, one can consider functions of the form

\[ W_\epsilon(L; A) = \prod_{j=1}^{n} \mathrm{tr}\big(P_{A,l_j}^\epsilon(1)\big) \qquad [15] \]

for a link $L$ consisting of loops $l_1, \ldots, l_n$, where $P_{A,l_j}^\epsilon$ denotes the solution of [14] for the loop $l_j$, instead of the classical Wilson loop variable.

At this stage, it would be natural to consider taking $\epsilon \downarrow 0$ in $\Phi_{CS}(W_\epsilon(L))$. However, this is still problematic. A further regularization is needed, roughly corresponding to the geometric notion of framing. In the definition of $\Phi_{CS}$, alteration is made to the quadratic form $Q(j,j)$ in the exponent which appears in the expression for $S(\Phi_{CS})$, replacing it with $Q(j, \phi_s j)$, where $\{\phi_s\}_{s \geq 0}$ is a family of suitable diffeomorphisms of $\mathbb{R}^3$, with $\phi_0$ being the identity. In a sense, this splits a single loop $l$ into $l$ and a neighboring loop $\phi_s \circ l$. At the end, one has to take $s \downarrow 0$. The resulting limiting value is the expected link invariant. We shall not go into the case of nonabelian $G$, which is more complex and on which work is still in progress.

Infinite-dimensional distributions can be used to formulate a rigorous theory for normalized Chern-Simons functional integrals. The more specific questions raised by the Wilson-loop integrals in this setting open up new problems for further developments in the distribution theory, connecting geometry, topology, and infinite-dimensional analysis.

Classical groups are Lie groups corresponding to three classical geometries: linear, metric, and symplectic. Let us start with the complex field $\mathbb{C}$. We consider the linear space $\mathbb{C}^n$ and the group $GL(n;\mathbb{C})$ of its automorphisms, the nondegenerate (invertible) linear transformations. The complex linear metric space is the space $\mathbb{C}^n$ endowed with a nondegenerate symmetric bilinear form; the orthogonal group $O(n;\mathbb{C})$ is the subgroup of $GL(n;\mathbb{C})$ of automorphisms of this structure. If, for $n = 2l$, we replace the symmetric form by a nondegenerate skew-symmetric form, we obtain the linear symplectic space and the group $Sp(l;\mathbb{C})$ of its automorphisms, the symplectic group.

A fundamental observation of nineteenth-century
geometry was that the transfer from the complex
field to the real one gives not only three corresponding
groups for R but a much richer collection
of real forms of complex classical groups: unitary,
pseudounitary, pseudoorthogonal, etc. (see below).
Classical geometries correspond to homogeneous
manifolds with classical groups of transformations.
Geometers understood that this produces a very
rich world of non-Euclidean geometries, including
the first example of non-Euclidean geometry,
hyperbolic geometry. Through such an approach, some
classical algebraic theories obtain a geometrical
interpretation (see below the consideration of the
cone of symmetric positive forms). Among classical
manifolds are Minkowski space, Grassmannians,
and multidimensional analogs of the disk and
the half-plane. A substantial part of this theory is a
matrix geometry, which serves as a background for
matrix analysis. A rich geometry on classical
manifolds with many symmetries is a background
for a rich multidimensional analysis with many
explicit formulas. Classical geometries, starting with
Minkowski geometry, have appeared in some
problems of mathematical physics.

A crucial technical fact is the embedding of the 
classical groups in the class of semisimple Lie groups; 
it gives a very strong unified method to work with 
semisimple groups and corresponding geometries — the 
method of roots. Nevertheless, some special realiza- 
tions and constructions for classical groups can also be 
very useful. A very impressive example is the twistors 
of Penrose, where an initial construction is the 
realization of points of four-dimensional Minkowski 
space as lines in three-dimensional complex projective 
space. We mention below some general facts about 
semisimple groups and homogeneous manifolds, but 
the focus will be on special possibilities for the classical 
groups. The class of simple Lie groups contains, 
besides the classical groups, only a finite number of 
exceptional groups which are also very interesting and 
are connected, in particular, with noncommutative 
and nonassociative geometries; they have applications 
to mathematical physics. 


Complex Groups and Homogeneous 
Manifolds 


Complex Classical Groups 


The complete linear group GL(n; C) is the group of
nondegenerate matrices g of order n ($\det g \neq 0$), and the
special linear group SL(n; C) is its subgroup of
matrices with determinant equal to 1 (the unimodular
condition). The unimodular condition kills the
one-dimensional center, leaving only a finite
center. We realize the direct products of several copies
of complete linear groups with different dimensions,
for example GL(k; C) × GL(l; C), as the groups of
block-diagonal nondegenerate matrices. The letter S
always means that we take matrices with determinant
1. So the notation S(GL(k; C) × GL(l; C)) means that we
take block-diagonal matrices with blocks of sizes k, l
and with determinant 1.

Let I be a nondegenerate symmetric matrix of
order n; then the orthogonal group O(n; C) is the
subgroup of GL(n; C) of matrices preserving the
corresponding symmetric form, so that


$$g^{\top} I g = I$$


These matrices can have determinant ±1. The
special orthogonal group SO(n; C) is the subgroup
of orthogonal matrices with determinant 1. Different
matrices I give isomorphic orthogonal groups since they
are all linearly equivalent. If we take as I the unit
matrix $E = E_n$, then we obtain the group of
orthogonal matrices in the classical sense: $g^{\top} g = E$.

If n = 2l and we replace in this definition the
symmetric matrix I by a nondegenerate
skew-symmetric matrix J, we obtain the symplectic
group Sp(l; C). Again, different matrices J give isomorphic
groups. The typical example of J is


$$J = \begin{pmatrix} 0 & E_l \\ -E_l & 0 \end{pmatrix}$$


It is convenient then to represent matrices g as


$$g = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$


where the blocks A, B, C, D are matrices of order l.
Then the symplectic condition $g^{\top} J g = J$ is that
$A^{\top} D - C^{\top} B = E$ and the matrices $A^{\top} C$ and $D^{\top} B$ are symmetric.
If C = 0 then $D = (A^{\top})^{-1}$ and $A^{-1}B$ is a symmetric
matrix. In this way, we have in Sp(l; C) a subgroup
P of block-triangular matrices of a very simple
structure; it is an example of the subgroups which are
called parabolic.
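
The block conditions are easy to test on a small example. The following Python sketch is only an illustration under stated assumptions: J is taken in the standard form $\begin{pmatrix} 0 & E \\ -E & 0 \end{pmatrix}$, the arithmetic is exact (integers), and an element with C = 0 is built from an invertible A and a symmetric S by setting B = AS and $D = (A^{\top})^{-1}$. The helper names (`block`, `standard_J`, `is_symplectic`, `parabolic_element`) are hypothetical and are not taken from the text.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def identity(l):
    return [[1 if i == j else 0 for j in range(l)] for i in range(l)]

def zeros(l):
    return [[0] * l for _ in range(l)]

def neg(X):
    return [[-x for x in row] for row in X]

def block(A, B, C, D):
    """Assemble the 2l x 2l matrix [[A, B], [C, D]] from four l x l blocks."""
    return [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]

def standard_J(l):
    """Assumed standard skew-symmetric matrix J = [[0, E], [-E, 0]]."""
    return block(zeros(l), identity(l), neg(identity(l)), zeros(l))

def is_symplectic(g, l):
    """Check the defining condition g^T J g = J."""
    J = standard_J(l)
    return matmul(matmul(transpose(g), J), g) == J

def parabolic_element(A, A_inv, S):
    """Element with C = 0: B = A S, D = (A^T)^{-1}; symplectic iff S is symmetric."""
    l = len(A)
    return block(A, matmul(A, S), zeros(l), transpose(A_inv))

A = [[1, 1], [0, 1]]
A_inv = [[1, -1], [0, 1]]          # inverse of A, entered by hand
S = [[2, 3], [3, 5]]               # symmetric, so A^{-1} B = S is symmetric
g = parabolic_element(A, A_inv, S)
print(is_symplectic(g, 2))
```

Replacing S by a nonsymmetric matrix makes the check fail, which is the condition that $A^{-1}B$ be symmetric.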

There are two principal classes of homogeneous
spaces with complex semisimple Lie groups: flag
manifolds and Stein manifolds.


Flag Manifolds


These homogeneous spaces F = G/P with semisimple
(in our case classical) groups G have
parabolic subgroups P as the isotropy subgroups.
The group G = GL(n; C) acts transitively on the
flag manifolds $F(n_1, \ldots, n_r)$, $0 < n_1 < \cdots < n_r < n$,
whose elements are $(n_1, \ldots, n_r)$-flags: sequences of
embedded subspaces of $\mathbb{C}^n$ of dimensions
$(n_1, \ldots, n_r)$. The isotropy subgroup $P = P(n_1, \ldots, n_r)$
is the subgroup of block-triangular matrices with
diagonal blocks of sizes $k_1, \ldots, k_{r+1}$, $k_j = n_j - n_{j-1}$,
$n_0 = 0$, $n_{r+1} = n$. The flag manifolds are compact
complex manifolds. The matrices proportional
to the unit matrix $E_n$ act trivially, and we can
consider, instead of the action of G = GL(n; C), the
transitive action of G = SL(n; C).

Let us pay particular attention to two extremal
cases. The first one is the case of the maximal
flag manifold, when we have the sequence of
all integers (1, 2, 3, ..., n − 1), that is, complete flags; the
subgroup P in this case is called a Borel subgroup. The other
case is that of minimal flag manifolds with r = 1 (for them
the unipotent radical of the parabolic subgroup is
commutative). Then in the case of SL(n; C) the
sequence has only one element $n_1 = k < n$ and we
have the Grassmannian manifolds $\mathrm{Gr}_{\mathbb{C}}(k; n) = F(k)$ of
k-dimensional subspaces of $\mathbb{C}^n$. If k = 1 or k = n − 1,
we obtain the dual realizations of the complex
projective space $\mathbb{CP}^{n-1}$. We can interpret points
of $\mathrm{Gr}_{\mathbb{C}}(k; n)$ also as (k − 1)-dimensional planes in
$\mathbb{CP}^{n-1}$.

We can define points of the projective space
$\mathbb{CP}^{n-1}$ by homogeneous coordinates, as the
equivalency classes ($z \sim cz$, $z \in \mathbb{C}^n \setminus \{0\}$, $c \in \mathbb{C} \setminus \{0\}$).
For the Grassmannians we can similarly use matrix
homogeneous coordinates (Stiefel's coordinates):
classes of (k × n)-matrices $Z \in \mathrm{Mat}(k, n)$ of the
maximal rank k relative to the equivalency


$$Z \sim uZ, \quad u \in GL(k; \mathbb{C})$$


The rows of a matrix Z correspond to a basis in the
subspace with the homogeneous coordinate Z;
left multiplication by a matrix u replaces this basis,
but does not change the subspace. The group
GL(n; C) acts by right multiplications:


$$Z \mapsto Zg$$


and this action preserves the equivalency classes.
Suppose $k \le n - k$ and the left k-minor of Z is not
zero. Such matrices give the dense coordinate chart
$\mathbb{C}^{k(n-k)}$: we can pick in the equivalency classes the
representatives $(E_k, z)$, $z \in \mathrm{Mat}(k, n - k)$, and consider
the matrices z as (inhomogeneous) local
coordinates. In the inhomogeneous coordinates the
action of the group has a matrix fractional linear
form: let


$$g = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \quad A \in \mathrm{Mat}(k),\ D \in \mathrm{Mat}(n - k),\ B \in \mathrm{Mat}(k, n - k),\ C \in \mathrm{Mat}(n - k, k)$$


Then we have the transformation in inhomogeneous
coordinates:


$$z \mapsto (A + zC)^{-1}(B + zD)$$


The condition C = 0 defines the parabolic subgroup,
which has an affine action in inhomogeneous
coordinates that is transitive in the coordinate
chart. In such a way the Grassmannian is a
compactification of $\mathbb{C}^{k(n-k)}$ (realized as the space of
k × (n − k) matrices). If n = 2k, we can consider it as
the compactification of the space of square matrices
z of order k with the flat generalized conformal
structure defined by translations of the isotropy cone
$\{\det z = 0\}$.
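
The passage from homogeneous to inhomogeneous coordinates can be checked directly. The sketch below is an illustration only, for the assumed case k = 2, n = 4 with exact rational arithmetic: it compares the fractional linear formula $z \mapsto (A + zC)^{-1}(B + zD)$ with the result of multiplying $Z = (E_2, z)$ by g on the right and normalizing the left block back to $E_2$, and it checks that the formula is compatible with composition. The function names and the sample matrices `g1`, `g2`, `z` are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inv2(M):
    """Exact inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ZeroDivisionError("singular 2 x 2 block: point leaves the chart")
    return [[d / det, -b / det], [-c / det, a / det]]

def blocks(g):
    """Split a 4 x 4 matrix into 2 x 2 blocks A, B, C, D."""
    A = [row[:2] for row in g[:2]]
    B = [row[2:] for row in g[:2]]
    C = [row[:2] for row in g[2:]]
    D = [row[2:] for row in g[2:]]
    return A, B, C, D

def act(z, g):
    """Fractional linear action z -> (A + zC)^{-1} (B + zD)."""
    A, B, C, D = blocks(g)
    return matmul(inv2(add(A, matmul(z, C))), add(B, matmul(z, D)))

def act_via_homogeneous(z, g):
    """Multiply Z = (E, z) by g on the right, then normalize the left block to E."""
    Z = [[1, 0] + list(z[0]), [0, 1] + list(z[1])]
    W = matmul(Z, g)
    left = [row[:2] for row in W]
    right = [row[2:] for row in W]
    return matmul(inv2(left), right)

g1 = [[1, 2, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1], [0, 1, 0, 2]]   # det = 6
g2 = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]]   # C = 0: z -> z + E
z = [[1, 2], [3, 4]]

print(act(z, g1) == act_via_homogeneous(z, g1))
print(act(act(z, g1), g2) == act(z, matmul(g1, g2)))
```

The element `g2` has C = 0 and acts as the translation $z \mapsto z + E$, a simple instance of the affine action of the parabolic subgroup.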

There are similar constructions of flag manifolds
for other classical groups. We will consider only the
minimal flag manifolds. For O(2k; C) we consider
the isotropic Grassmannian $\mathrm{Gr}^{I}_{\mathbb{C}}(k; 2k)$ of isotropic
k-subspaces relative to the symmetric form I. We
take the matrix realization of $\mathrm{Gr}_{\mathbb{C}}(k; 2k)$, using
Stiefel's homogeneous coordinates, and add the
matrix equation


$$Z I Z^{\top} = 0$$


which is well defined in the homogeneous coordinates
(compatible with the equivalency classes) and
defines isotropic subspaces relative to I. This matrix
cone is preserved by the subgroup O(2k; C) ⊂
GL(2k; C) corresponding to the matrix I. If we
take the symmetric matrix


$$I = \begin{pmatrix} 0 & E_k \\ E_k & 0 \end{pmatrix}$$


then in inhomogeneous coordinates (z is a square
k-matrix) this equation is transformed into the
condition that the matrix z is skew-symmetric. So,
in a natural sense, the isotropic Grassmannian is
the compactification of the linear space of
skew-symmetric matrices $\mathrm{Alt}(k) \cong \mathbb{C}^N$, N = k(k − 1)/2.

A similar construction makes sense for the
symplectic group: if we replace the symmetric form
I with the skew-symmetric form J, we obtain the
equation of the matrix cone representing the
Lagrangian Grassmannian $\mathrm{Gr}^{J}_{\mathbb{C}}(k; 2k)$ of Lagrangian
subspaces in the 2k-dimensional linear symplectic space.
If we choose J as above, then in the
(inhomogeneous) coordinate chart we obtain the
condition that the matrix z is symmetric. Thus, we
have the (dense) coordinate chart
$\mathbb{C}^N = \mathrm{Sym}(k)$, N = k(k + 1)/2, on the Lagrangian
Grassmannian: the linear space of symmetric matrices.

There is one more type of minimal flag manifold
for the orthogonal group SO(n; C): the quadric Q
in the projective space,


$$I(z) = z I z^{\top} = 0$$


where rows $z \in \mathbb{C}^n \setminus \{0\}$ represent, in homogeneous
coordinates, points in $\mathbb{CP}^{n-1}$. If I = E, we have the
equation $(z_1)^2 + \cdots + (z_n)^2 = 0$. This quadric is the
complex compact conformally flat manifold
$CC^N$, N = n − 2; it is the compactification of $\mathbb{C}^N$
endowed with the flat conformal structure
corresponding to the quadratic isotropic cone. The
parabolic group is generated by linear conformal
transformations and translations. On the quadric Q
the conformal structure is defined by intersections of
tangent spaces with Q. Evidently, this structure is
invariant relative to the natural action of SO(n; C).


Classical Stein Manifolds 


Such homogeneous complex manifolds X = G/H have
complex reductive isotropy subgroups H. Contrary to
the flag manifolds, which are compact, these manifolds
are Stein, and there are many holomorphic
functions on them. The typical examples for
G = GL(n; C) are the homogeneous spaces
$S(k_1, \ldots, k_{r+1})$, $n = k_1 + \cdots + k_{r+1}$, for which the isotropy
subgroups are block-diagonal matrices with blocks of
sizes $k_1, \ldots, k_{r+1}$. Then points of the manifold can be
realized as generic sets of subspaces $L_j \subset \mathbb{C}^n$,
$\dim L_j = k_j$, $1 \le j \le r + 1$, or, what is equivalent,
generic sets of $(k_j - 1)$-dimensional planes in $\mathbb{CP}^{n-1}$. Since
the isotropy subgroup of such a homogeneous space is a
subgroup of the parabolic subgroup $P(n_1, \ldots, n_r)$,
$k_j = n_j - n_{j-1}$, we have the natural fibering
$S(k_1, \ldots, k_{r+1}) \to F(n_1, \ldots, n_r)$ (it is simple to see this
geometrically: the ith subspace of a flag in the base is the
direct sum of the first i subspaces representing a point in
the fiber). This is a convenient tool to apply
complex analysis on S to the compact manifold F,
where there are no nontrivial holomorphic functions.
Let us emphasize that such a connection exists only
for special classes of classical Stein manifolds.

Let us pay special attention to the subclass of 
symmetric Stein manifolds. For such manifolds X, the 
isotropy subgroup H is fixed relative to a holomorphic 
involutive automorphism of G. Complex semisimple 
Lie groups G (including classical ones) are symmetric 
Stein manifolds relative to the action of their square 
G x G by left and right multiplications. 


The classical Stein manifolds for SL(n; C) considered
above are symmetric if r = 1, and we have the
manifold of pairs of subspaces of complementary
dimensions intersecting only in {0}. The simplest
example is the manifold of pairs of different points
of the projective line $\mathbb{CP}^1$. Let us point out again
that the transition to generic pairs of points
transforms the compact complex manifold without
nonconstant holomorphic functions into a Stein
manifold with a large collection of holomorphic
functions.

Some other examples of symmetric Stein manifolds
are connected with classical geometry and
linear algebra. The affine hyperboloid in $\mathbb{C}^n$,


$$Q(z) = 1$$


is a symmetric space for G = O(n; C), H = O(n − 1; C).
We can compare it with the projective quadric
Q(z) = 0, which is a minimal flag manifold. Let us
remark that there is a duality here: it is possible to
interpret points of the hyperboloid of dimension n
as generic hyperplane sections of the projective
quadric of dimension n − 1.

The space X of complex symmetric matrices of
order n with determinant 1 is symmetric for the
group SL(n; C), which acts by changes of
variables in the corresponding quadratic forms:


$$z \mapsto g^{\top} z g, \quad g \in SL(n; \mathbb{C})$$


The transitive action reflects the possibility of
transforming such a form into a sum of squares.
The isotropy subgroup is SO(n; C).

The Stein symmetric manifold
X = SO(n; C)/S(O(k; C) × O(n − k; C)) is realized as the manifold
of k-dimensional subspaces of $\mathbb{C}^n$ on which the
restriction of the principal symmetric form I is
nondegenerate.


Isomorphisms in Small Dimensions 


Isomorphisms of classical groups in small dimensions
produce isomorphisms of some classical
homogeneous manifolds. Such isomorphisms were
very important in the history of geometry; below are
a few examples. We will consider local isomorphisms
(up to a finite center). We have SL(2; C) ≅
SO(3; C). Let us realize $\mathbb{C}^3$ as the space of symmetric
matrices z of order 2. Then, as we remarked above,
the two-dimensional submanifold X of matrices
with determinant 1 is the symmetric Stein manifold
for the group SL(2; C). On the other hand, we can
take det z as the quadratic symmetric form I in $\mathbb{C}^3$;
then X is the hyperboloid for this form and the
action of SL(2; C) on symmetric matrices gives the
orthogonal transformations relative to this form I.
Similarly, we can interpret the local isomorphism
SO(4; C) ≅ SL(2; C) × SL(2; C). We realize $\mathbb{C}^4$ as the
space of square matrices z of order 2 with the
symmetric quadratic form I(z, z) = det(z). Then left
and right multiplications of z by unimodular
matrices ($z \mapsto uzv$, u, v ∈ SL(2; C)) induce orthogonal
transformations for the form I, and any orthogonal
transformation can be represented in such a form (one
can see it by a calculation of dimensions).

The local isomorphism SL(4; C) ≅ SO(6; C) has a
slightly more complicated nature. Let us consider the
Grassmannian $\mathrm{Gr}_{\mathbb{C}}(2; 4)$ of lines in the projective
space $\mathbb{CP}^3$ with 2 × 4 matrices Z as matrix
homogeneous coordinates. Let $p_{ij}$, i < j, be the minors of Z
with the ith and jth columns. They are called Plücker
coordinates on $\mathrm{Gr}_{\mathbb{C}}(2; 4)$: the equivalency class of
Z is defined by the sequence of six numbers
$p = (p_{ij},\ 1 \le i < j \le 4) \neq (0, \ldots, 0)$ up to a constant
factor. Thus, we have an embedding of $\mathrm{Gr}_{\mathbb{C}}(2; 4)$ in the
projective space $\mathbb{CP}^5$. The image is the quadric


$$p_{12}p_{34} - p_{13}p_{24} + p_{14}p_{23} = 0$$


Thus, we have the isomorphism of two flag manifolds,
and the action of SL(4; C) on the Grassmannian
turns into orthogonal transformations of the
four-dimensional quadric in $\mathbb{CP}^5$. The Plücker coordinates
can be defined for any Grassmannian, but in other cases
they do not produce isomorphisms with other
flag manifolds; nevertheless, they realize the Grassmannians as
intersections of quadrics in projective spaces.
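
The quadratic relation between the Plücker coordinates can be verified on any 2 × 4 matrix. The following Python sketch is an illustration with hypothetical helper names (`plucker`, `plucker_quadric`) and an arbitrarily chosen integer matrix Z; it also shows that replacing Z by uZ multiplies all six coordinates by det u, so that the point of $\mathbb{CP}^5$ does not change.

```python
from itertools import combinations

def plucker(Z):
    """Six 2 x 2 minors p_ij (columns i < j, numbered from 1) of a 2 x 4 matrix Z."""
    return {(i + 1, j + 1): Z[0][i] * Z[1][j] - Z[0][j] * Z[1][i]
            for i, j in combinations(range(4), 2)}

def plucker_quadric(p):
    """Left-hand side of p12 p34 - p13 p24 + p14 p23 = 0."""
    return p[1, 2] * p[3, 4] - p[1, 3] * p[2, 4] + p[1, 4] * p[2, 3]

def left_multiply(u, Z):
    """Change of basis in the subspace: Z -> uZ with u a 2 x 2 matrix."""
    return [[sum(u[i][k] * Z[k][j] for k in range(2)) for j in range(4)]
            for i in range(2)]

Z = [[1, 2, 3, 4], [0, 1, 5, 7]]      # sample homogeneous coordinates of a line
p = plucker(Z)
print(p)
print(plucker_quadric(p))

u = [[2, 1], [1, 2]]                  # det u = 3
q = plucker(left_multiply(u, Z))
print(all(q[key] == 3 * p[key] for key in p))
```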


Compact Classical 
Homogeneous Manifolds 


The compact classical groups U(n), SU(n), O(n), SO(n),
Sp(l) are maximal compact subgroups in the corresponding
classical complex groups GL(n; C), SL(n; C),
O(n; C), SO(n; C), Sp(l; C). This condition defines
them up to an isomorphism. They are fixed subgroups
of some antiholomorphic involutive automorphisms.
The unitary groups U(n) and SU(n) are the groups
of unitary matrices ($g^{*} g = E_n$) and, correspondingly, of
unitary matrices with determinant 1. As the compact
orthogonal group we can take the intersection U(n) ∩
O(n; C). For the standard form I, it will be the group of
real orthogonal matrices: $g^{\top} g = E$ (so the involution in
O(n; C) is the conjugation $g \mapsto \bar{g}$). Similarly, we can
take Sp(l) = SU(2l) ∩ Sp(l; C) (then the involution is
$g \mapsto -J \bar{g} J$).

Compact classical groups act on compact homogeneous
Riemann manifolds. There are two mechanisms
connecting compact and complex
homogeneous manifolds. We observe the first
possibility in the case of flag manifolds, which are
compact. We considered them so far relative to the
action of complex (noncompact) groups. It turns out
that on the flag manifold F = G/P the maximal
compact subgroup U ⊂ G continues to be transitive,
so we can consider flag manifolds also as being
homogeneous with compact groups. Then F = U/C,
where C is the centralizer of a torus in U. There is a
Kähler metric on F, invariant relative to U. Thus, G
is the group of all automorphisms of F as a
complex manifold, but U is the group of its
automorphisms as a Kähler manifold. This defines
two sides of the geometry of flag manifolds: complex
and Kähler. Flag manifolds are the only compact
homogeneous Kähler manifolds with semisimple Lie
groups (the class of all compact Kähler manifolds
also contains locally flat compact manifolds,
namely tori). In the example considered above we have
$F(n_1, \ldots, n_r) = SU(n)/S(U(k_1) \times \cdots \times U(k_{r+1}))$. In the
language of Stiefel (homogeneous) coordinates, we fix a
positive Hermitian form in $\mathbb{C}^n$ and characterize
subspaces by orthonormal bases. For r = 1 we have
the Grassmannians $\mathrm{Gr}_{\mathbb{C}}(k; n)$, in particular the projective
space $\mathbb{CP}^{n-1}$, which we consider relative to the
action of the unitary groups. Relative to this action
they are Hermitian symmetric spaces. In the case of
minimal flag manifolds for other groups, the action
of maximal compact subgroups also defines on them
the structure of compact Hermitian symmetric
spaces. Let us emphasize that relative to the noncompact
groups of biholomorphic automorphisms G,
the minimal flag manifolds (including the
Grassmannians) are not symmetric.

In the case of homogeneous Stein manifolds
X = G/H, the picture is different: the maximal
compact subgroups have no open orbits. There are
totally real orbits which are the compact forms of
X: $X_{\mathbb{R}} = G_{\mathbb{R}}/H_{\mathbb{R}}$, where $G_{\mathbb{R}}$ and $H_{\mathbb{R}}$ are compact
forms of G and H, respectively. It is the canonical
embedding of compact homogeneous manifolds
in their complexifications. The important special
case is the embedding of compact symmetric
manifolds in the Stein symmetric manifolds, their
complexifications.

For compact symmetric manifolds X = U/K the
groups U, K are compact Lie groups and the elements
of K are fixed by an involutive automorphism σ
such that K contains the connected component of
the subgroup of all fixed elements of σ. This
possibility of connecting several symmetric manifolds
with one involution is illustrated by the next
example. The sphere $S^{n-1} \subset \mathbb{R}^n$ is the symmetric
space SO(n)/SO(n − 1); the real projective space
$\mathbb{RP}^{n-1}$ is SO(n)/O(n − 1). Here SO(n − 1) is the
connected component of O(n − 1) and $S^{n-1}$ is a
double covering of $\mathbb{RP}^{n-1}$. A few more examples: the
real Grassmannian $\mathrm{Gr}_{\mathbb{R}}(k; n)$ of k-subspaces of $\mathbb{R}^n$
can be defined as SO(n)/S(O(k) × O(n − k)). This
representation corresponds to the characterization
of subspaces by orthonormal bases. The consideration
of arbitrary bases defines the action of the
larger group GL(n; R) on $\mathrm{Gr}_{\mathbb{R}}(k; n)$. Relative to this
action, the real Grassmannian is not symmetric, since
the isotropy subgroup is parabolic and is not
involutive. Such a possibility of extending the group is
typical for a class of compact symmetric manifolds
called symmetric R-spaces. They are real forms of
Hermitian compact symmetric manifolds (minimal
flag manifolds). Let us also mention the compact
symmetric space SU(n)/SO(n), which is the compact
form of the space of unimodular symmetric matrices
and can be presented by the submanifold of unitary
matrices in it. Also, all compact Lie groups G are
symmetric spaces relative to the action of G × G.


Noncompact Riemannian 
Symmetric Manifolds 


This class of symmetric manifolds has the strongest 
connections with classical mathematics. Let us 
consider noncompact real semisimple Lie groups - 
real forms of complex semisimple Lie groups. They 
correspond to antiholomorphic involutions in com- 
plex groups. 

Among the real forms of SL(n; C) are the real and
quaternionic unimodular groups SL(n; R), SL(n; H)
and the pseudounitary groups SU(p, q) of complex
matrices preserving a Hermitian form H of
signature (p, q). The complex orthogonal group has
as real forms, in particular, the pseudoorthogonal
groups SO(p, q) of real matrices preserving a
quadratic form of signature (p, q).

Let G be a real simple Lie group and K be its
maximal compact subgroup. Then X = G/K is a
Riemann symmetric manifold of noncompact type;
K is defined by an involutive automorphism of G.
Therefore, in the irreducible situation there is a
correspondence between noncompact Riemann symmetric
manifolds and real simple noncompact Lie
groups. K-orbits on X are parametrized by points of
the orbit on X of a maximal abelian subgroup A,
the Cartan subgroup of the symmetric space X. Its
dimension l is an important invariant of X, its
rank. The algebraic basis for the geometry of X is the
Iwasawa decomposition


$$G = KAN$$


where N is a maximal unipotent subgroup (in a
natural sense compatible with A). Then the parabolic
subgroup P = AN is transitive on X.


Symmetric Cones 


Let us start with X = GL(n; R)/O(n). This manifold
corresponds to the classical theory of quadratic
forms: X can be realized as the manifold $\mathrm{Sym}_+(n)$ of
symmetric positive matrices $x \gg 0$ of order n
(corresponding to positive quadratic forms). Then
the transitivity of GL(n; R) on X corresponds to the
possibility of transforming positive forms to a sum of
squares. The sufficiency of triangular matrices for such
transformations corresponds to the transitivity on
$X = \mathrm{Sym}_+(n)$ of the parabolic subgroup P of (upper)
triangular matrices with positive diagonal elements. So
A is the group of diagonal matrices with positive
elements, and the submanifold of diagonal matrices
in X parametrizes K-orbits. The general fact about
A-parametrization is, in this example, the classical
fact about the reduction of quadratic forms to
diagonal form by orthogonal transformations.
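
The transitivity of the triangular subgroup is, in matrix terms, the Cholesky factorization. The following Python sketch is an illustration under the assumption that the group acts by $x \mapsto g^{\top} x g$: for a symmetric positive matrix x it computes an upper triangular g with positive diagonal such that $g^{\top} g = x$, that is, g carries the unit matrix to x. The function names `cholesky_upper` and `gram` and the sample matrix are hypothetical; floating-point arithmetic is used, so equalities hold only up to rounding in general.

```python
import math

def cholesky_upper(x):
    """Upper triangular g with positive diagonal and g^T g = x (x symmetric positive)."""
    n = len(x)
    low = [[0.0] * n for _ in range(n)]          # lower triangular factor, x = low low^T
    for i in range(n):
        for j in range(i + 1):
            s = x[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return [[low[j][i] for j in range(n)] for i in range(n)]   # g = low^T

def gram(g):
    """Image of the unit matrix under g: the matrix g^T E g = g^T g."""
    n = len(g)
    return [[sum(g[k][i] * g[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

x = [[4, 2, 2], [2, 5, 3], [2, 3, 6]]
g = cholesky_upper(x)
print(g)
print(gram(g))
```

For an indefinite matrix such as [[1, 2], [2, 1]] the routine raises an error, in agreement with the fact that only positive forms lie in the orbit of the unit matrix.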

There are complex and quaternionic versions
of this picture. The symmetric manifold
X = GL(n; C)/U(n) is realized as the manifold
$\mathrm{Herm}_+(n)$ of positive complex Hermitian matrices
(forms), and similarly classical facts of linear algebra
on Hermitian quadratic forms are transformed into
geometrical statements on symmetric spaces. Let us
emphasize that we consider here the group GL(n; C)
as a real group. The same situation exists with the
manifold $\mathrm{Herm}_+(\mathbb{H}, n)$ of positive quaternionic
Hermitian matrices, which is the symmetric manifold
for the real group GL(n; H).

These three manifolds can be included in an
impressive geometrical structure. They are all convex
homogeneous cones V in linear spaces $\mathbb{R}^N$ which
are self-dual ($V = V^{*}$) relative to a bilinear form
$\langle \cdot, \cdot \rangle$. Let us recall that


$$V^{*} = \{x;\ \langle x, y \rangle > 0,\ y \in \bar{V} \setminus 0\}$$


Here $\bar{V}$ is the closure of V. So these three symmetric
manifolds are linear homogeneous self-dual cones.

There is only one more type of classical homogeneous
self-dual cones, the quadratic (Lorentzian)
cones


$$L_n = \{x \in \mathbb{R}^n;\ x_1^2 - x_2^2 - \cdots - x_n^2 > 0,\ x_1 > 0\}$$


Such a cone is also called the future light cone (the
condition $x_1 < 0$ defines the past light cone). The
group of linear automorphisms of this cone is
SO(1, n − 1) × $\mathbb{R}_+$; the first factor is the Lorentz group.

There is also one exceptional 27-dimensional 
cone; it is possible to interpret this cone as the 
cone of positive Hermitian matrices of third order 
over Cayley numbers. There is a very nice structural 
theory of homogeneous self-dual cones; it is con- 
venient to develop this theory in the language of 


Classical Groups and Homogeneous Spaces 505 


Jordan algebras (Faraut and Koranyi 1994). Such 
cones participate as elements of explicit construc- 
tions of other classes of symmetric spaces (see 
below). 

Following Siegel, it is possible to connect with
homogeneous self-dual cones multidimensional versions
of the Euler integrals (Γ- and B-functions) (Faraut
and Koranyi 1994). They have many applications,
including those to integral formulas for complex
symmetric domains.


Riemann Symmetric Manifolds of Rank 1 


The first example of non-Euclidean geometry is
connected with the Riemann symmetric manifolds of
rank 1, the hyperbolic spaces; X = SO(1, n)/O(n) is the
hyperbolic space of dimension n. It can be realized
as the upper sheet of the two-sheeted hyperboloid:


$$x_0^2 - x_1^2 - \cdots - x_n^2 = 1, \quad x_0 > 0$$


Pseudoorthogonal linear transformations from
SO(1, n) preserve this surface; they play the role of
hyperbolic motions. The equivalent realization is in
the real ball $x_1^2 + \cdots + x_n^2 < 1$ relative to the
projective transformations preserving this ball.
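
The two realizations can be compared on a numerical example. The sketch below is an illustration under two assumptions that are not spelled out in the text: the passage from the hyperboloid to the ball is taken to be the central projection $y_i = x_i / x_0$, and the hyperbolic motion used is a boost in the $(x_0, x_1)$-plane with rational entries $c = (1 + t^2)/(1 - t^2)$, $s = 2t/(1 - t^2)$, so that $c^2 - s^2 = 1$ exactly. All function names are hypothetical.

```python
from fractions import Fraction

def minkowski(x):
    """Quadratic form x_0^2 - x_1^2 - ... - x_n^2."""
    return x[0] ** 2 - sum(c ** 2 for c in x[1:])

def boost(t, n):
    """Pseudoorthogonal matrix of order n + 1 acting in the (x_0, x_1)-plane, |t| < 1."""
    t = Fraction(t)
    c = (1 + t * t) / (1 - t * t)
    s = 2 * t / (1 - t * t)
    g = [[Fraction(int(i == j)) for j in range(n + 1)] for i in range(n + 1)]
    g[0][0], g[0][1], g[1][0], g[1][1] = c, s, s, c
    return g

def apply(g, x):
    return [sum(g[i][j] * x[j] for j in range(len(x))) for i in range(len(g))]

def to_ball(x):
    """Central projection of a point of the upper sheet: y_i = x_i / x_0."""
    return [c / x[0] for c in x[1:]]

x = [Fraction(3), Fraction(2), Fraction(2)]      # 9 - 4 - 4 = 1, x_0 > 0
y = apply(boost(Fraction(1, 3), 2), x)           # a hyperbolic motion applied to x
print(minkowski(x), minkowski(y), y[0] > 0)
print(sum(c * c for c in to_ball(y)) < 1)
```

Since $x_0^2 - |x'|^2 = 1$ gives $|x'/x_0|^2 = 1 - 1/x_0^2 < 1$, every point of the upper sheet lands inside the unit ball under this projection.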

Another example of a Riemann symmetric manifold
of rank 1 is the complex hyperbolic symmetric
space X = SU(1, n)/U(n). It can similarly be realized
either as the hyperboloid


$$|z_0|^2 - |z_1|^2 - \cdots - |z_n|^2 = 1$$


in $\mathbb{C}^{n+1}$ relative to pseudounitary linear transformations
or as the complex ball $|z_1|^2 + \cdots + |z_n|^2 < 1$
relative to the complex projective transformations
preserving it. There are also quaternionic hyperbolic
spaces, which are realized as the quaternionic balls in
the quaternionic projective spaces. These three series
exhaust all classical Riemann symmetric manifolds
of rank 1 of noncompact type. There is only one
exceptional symmetric manifold of rank 1: it has
dimension 16 and can be interpreted as a
two-dimensional ball for Cayley numbers.


Classical Symmetric Domains in $\mathbb{C}^n$
(Cartan Domains)


Riemann symmetric manifolds of noncompact type
which admit an invariant complex structure also
have an invariant Hermitian form corresponding to
the Riemann metric. For this reason, we will call
them noncompact Hermitian symmetric manifolds
(we considered above the compact Hermitian symmetric
manifolds). They are Stein manifolds, but as
opposed to the symmetric Stein manifolds which we
considered above, they are homogeneous relative to
real groups. The condition for a Riemann symmetric
manifold X = G/K to be Hermitian is that K has a
one-dimensional center. All Hermitian symmetric
manifolds of noncompact type can be realized as
bounded domains in $\mathbb{C}^n$ (but, of course, not all their
holomorphic automorphisms extend to $\mathbb{C}^n$). In the
case of classical manifolds, these domains are called
Cartan's domains: Cartan gave their explicit matrix
realizations.

The nature of the groups of holomorphic automorphisms
of symmetric domains $X = G/K \subset \mathbb{C}^N$ is
explained by Cartan's duality. Each such domain
(Hermitian symmetric manifold of noncompact
type) admits an embedding in a Hermitian symmetric
manifold of compact type $X_C$ such that the
complexification $G_C$ of G is the group of holomorphic
automorphisms of $X_C$ (correspondingly,
the domain is an open G-orbit on $X_C$). Moreover, X lies
inside a (Zariski open) coordinate chart $\mathbb{C}^N$, which
is an orbit of a parabolic subgroup.

The simplest example is the complex ball $\mathbb{CB}^n$
(complex hyperbolic space) embedded in the complex
projective space $\mathbb{CP}^n$. The affine chart $\mathbb{C}^n$ is the
orbit of the parabolic subgroup of affine transformations.
Let us consider more complicated
examples.

Let $X_C$ be the Grassmannian $\mathrm{Gr}_{\mathbb{C}}(k; n)$, $q = n - k \ge k$;
we will use matrix homogeneous coordinates
Z, k × n matrices, for the description of the
symmetric domain. Then $G_C$ = SL(n; C). Let us take
its real form G = SU(k, q), k + q = n. We fix a
Hermitian form H of signature (k, q) and realize
G as the group of matrices preserving H:


$$g H g^{*} = H$$


Then $X = X_{k,q} = SU(k, q)/S(U(k) \times U(q))$ can be realized
as the domain in the Grassmannian


$$Z H Z^{*} \gg 0$$


so that this Hermitian matrix of order k must be
positive. It is essential that this condition is invariant
relative to multiplications of Z by nondegenerate
matrices u on the left and, therefore, it is a
well-defined condition in homogeneous coordinates.

Let us specify the choice of H:


$$H_1 = \begin{pmatrix} E_k & 0 \\ 0 & -E_q \end{pmatrix}$$


Then the corresponding domain $X_1$ is defined in
inhomogeneous coordinates $Z = (E_k, z)$, $z \in \mathrm{Mat}(k, q)$,
by the condition


$$E_k - z z^{*} \gg 0$$


This matrix ball lies completely in the coordinate
chart $\mathbb{C}^{kq}$. Its rank is equal to min(k, q). Thus, we
have the realization of this Hermitian symmetric
space as a bounded domain in $\mathbb{C}^N$, N = kq. In the
case k = 1, we have the usual (scalar) complex ball.
Let us remark that the edge of the boundary
(Shilov's boundary) is the compact symmetric space


$$z z^{*} = E_k$$


with the group of automorphisms S(U(k) × U(q))
(the isotropy subgroup of X). For k = q the edge
coincides with the set of unitary matrices U(k).
Different forms H of signature (k, q) are
linearly equivalent and they correspond to different
(biholomorphically equivalent) realizations of this
Hermitian symmetric space. Let us, to begin with,
set k = q; the inhomogeneous matrix coordinates are
square matrices of order k. Let us take the form


$$H_2 = \begin{pmatrix} 0 & iE_k \\ -iE_k & 0 \end{pmatrix}$$


Then, in inhomogeneous matrix coordinates, we
have the domain $X_2$:


$$\frac{1}{i}(z - z^{*}) \gg 0$$


(complex matrices with positive skew-Hermitian
parts). This domain (but not its boundary) lies in
the chart. It has the structure of the tube domain
$T = \mathbb{R}^n + iV$, $n = k^2$, corresponding to the symmetric
cone V of positive Hermitian matrices (we take the
space of Hermitian matrices as a real form of $\mathbb{C}^n$). The
group of affine transformations of the tube domain,


$$z \mapsto u z u^{*} + a, \quad u \in GL(k; \mathbb{C}),\ a \in \mathrm{Herm}(k)$$


is transitive on $X_2$; it is the parabolic subgroup in
SU(k, q).

The biholomorphic equivalency of the realizations
of X corresponding to different H is induced by the
equivalency of these forms. We have

$$H_2 = A H_1 A^{*}, \quad A = \frac{1}{\sqrt{2}} \begin{pmatrix} E_k & -iE_k \\ -iE_k & E_k \end{pmatrix}$$

Then the transform $Z \mapsto ZA$ transforms $X_2$ into $X_1$. In
inhomogeneous coordinates it is the fractional linear
matrix transform


$$z \mapsto i(z + iE_k)^{-1}(z - iE_k)$$


It is the matrix version of the classical Cayley transform.
Similarly, we can write down the inverse transform.
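
In the scalar case k = 1 these statements can be checked by hand or by a few lines of code. The Python sketch below is an illustration under stated assumptions: the diagonal form is taken as diag(1, −1), so that the bounded realization is the unit disc |z| < 1; the off-diagonal form is $\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$, so that the unbounded realization is the upper half-plane Im z > 0; and the matrix B below is $\sqrt{2}$ times the assumed matrix A with rows (1, −i), (−i, 1), which avoids square roots. The names `cayley`, `dagger`, `B` are hypothetical.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def dagger(X):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[X[j][i].conjugate() for j in range(len(X))] for i in range(len(X[0]))]

B = [[1, -1j], [-1j, 1]]        # sqrt(2) * A for k = 1 (assumed form of A)
H1 = [[1, 0], [0, -1]]          # diagonal form: disc |z| < 1
H2 = [[0, 1j], [-1j, 0]]        # off-diagonal form: half-plane Im z > 0

lhs = matmul(matmul(B, H1), dagger(B))   # should equal 2 * H2
print(lhs)

def cayley(z):
    """Scalar case of z -> i (z + i)^{-1} (z - i)."""
    return 1j * (z - 1j) / (z + 1j)

samples = [0.3 + 0.1j, -2 + 5j, 1j, 4 + 0.01j]    # points with Im z > 0
print([abs(cayley(z)) < 1 for z in samples])
print(abs(cayley(2.0)))                            # a real point goes to the unit circle
```

In this normalization the point z = i of the half-plane goes to the center 0 of the disc, and the real axis goes to the unit circle.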

If q ≠ k, then there is also an analog of the tube
realization. Let r = q − k > 0 and


$$H_3 = \begin{pmatrix} 0 & iE_k & 0 \\ -iE_k & 0 & 0 \\ 0 & 0 & -E_r \end{pmatrix}$$


Let us represent the inhomogeneous coordinates
as $Z = (E_k, w, u)$, $w \in \mathrm{Mat}(k)$, $u \in \mathrm{Mat}(k, r)$. Then the
domain $X_3$ is defined by the condition


$$\frac{1}{i}(w - w^{*}) - u u^{*} \gg 0$$


This is an example of Siegel domains of the second
kind (Pyatetskii-Shapiro 1969). This domain has a
transitive group of affine transformations:


$$(w, u) \mapsto \left(w + a + i\,u b^{*} + \tfrac{i}{2}\, b b^{*},\ u + b\right), \quad a \in \mathrm{Herm}(k),\ b \in \mathrm{Mat}(k, r)$$
$$(w, u) \mapsto (c w c^{*},\ c u), \quad c \in GL(k; \mathbb{C})$$

This class of symmetric domains in Grassman- 
nians is called Cartan's domains of the first class. 
There are similar constructions for minimal flag 
domains (compact Hermitian symmetric spaces) 
with other groups. Let us consider the Lagrangian 
Grassmannian Grr(k;2k) corresponding to the 
form J above. Here Gc —Sp(R, C). Its real form 
G=Sp(k;R) can be realized as the subgroup 
of complex symplectic matrices preserving a 
Hermitian form H of the signature (k,k). In other 
words, we intersect the domains from the last 
example with the Lagrangian Grassmannians. We 
consider the coordinate chart with inhomogeneous 
coordinates, the symmetric matrices $z \in \mathrm{Sym}(k)$. For
$H_1$ we have the domain of symmetric matrices $z$
with the condition


\[ E_k - z\bar{z} \gg 0, \quad z = z^T \]


This bounded realization is called Siegel's disk. For
$H_2$ the real form is the group of real symplectic
matrices and $X_2$ is the domain


\[ \Im z = \frac{1}{2i}(z - \bar{z}) \gg 0, \quad z = z^T \]


of complex symmetric matrices with positive imaginary
parts; it is called Siegel's half-plane. This is
the third class of Cartan's domains. There are Siegel 
domains of second kind connecting with the cones 
of positive symmetric matrices; some of them are 
homogeneous, but they are never symmetric. 

There are two more series of classical minimal flag 
manifolds: the isotropic Grassmannians and quadrics. 
They both contain the dual bounded symmetric 
domains (Cartan's domains of second and fourth 
classes correspondingly). Some of these domains in 
the isotropic Grassmannians admit the realizations as 
tubes with the cone of positive Hermitian quaternionic 
matrices and others as Siegel domains of the second 
kind corresponding to the same cones. 

Symmetric domains in quadrics can be realized as 
tube domains with the Lorentzian (light) cones. 
The corresponding tubes are called the future (past)
tube, depending on which light cone was taken. 
Let us consider this construction. The group of 
holomorphic automorphisms of these domains is 
$G = SO(2; n)$, the conformal extension of the
Lorentz group. To realize this group, let us fix a
real symmetric matrix $Q$ of signature $(2, n)$; the
group is the group of linear transformations preserving
simultaneously the quadratic symmetric and
Hermitian forms with this matrix $Q$:


\[ g^T Q g = Q, \qquad g^* Q g = Q \]


The standard realization corresponds to the diagonal
matrix $Q$ with the diagonal $(1, 1, -1, \ldots, -1)$.
Cartan's domains of the fourth class are connected 
components of the manifold 


\[ ZQZ^T = 0, \qquad ZQZ^* > 0 \]


where rows $Z$ are homogeneous coordinates in the
projective space $\mathbb{CP}^{n+1}$. In other words, we consider
a domain on the quadric in the projective space
(which is the complex flat conformal space $CC^n$).
For the standard $Q$ the domain will lie in the
coordinate chart; thus it is the bounded realization.
For the tube realization, we take 


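\[ Q = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & \mathrm{diag}(1, -1, \ldots, -1) \end{pmatrix} \]


Let $Z = (z_0, z_1, w_1, \ldots, w_n)$, $w = u + iv$, $q(s,t) = s_1t_1 - s_2t_2 - \cdots - s_nt_n$,
and we consider the affine
chart $\mathbb{C}^{n+1} = \{z_0 = 1\}$. We have


\[ ZQZ^T = 2z_1 + q(w,w) = 0 \]
\[ ZQZ^* = 2\Re z_1 + q(w,\bar{w}) > 0 \]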


The first condition gives $2\Re z_1 = q(v,v) - q(u,u)$ and
then the second condition gives the final description
of the considered set in $\mathbb{C}^n$:


\[ q(v,v) = v_1^2 - v_2^2 - \cdots - v_n^2 > 0, \]


as the union of the future and the past tubes
($T_\pm = \{\pm v_1 > 0\}$). The edge $\mathbb{R}^n$ of these tubes ($v = 0$)
has the structure of the Minkowski space correspond- 
ing to the form q. The parabolic subgroup is the affine 
conformal group of the Minkowski space. It includes 
the Poincaré group and is transitive on tubes. The 
complete group of holomorphic automorphisms of 
tubes $G = SO(2, n)$ is the group of all (not only affine)
conformal transformations of the Minkowski space.
The complete edge of these symmetric domains in the
quadric $CC^n$ is the conformal compactification of the
Minkowski space (a compact symmetric R-space with 
the compact group S(O(2) x O(n)) on which the 
noncompact group SO(2, n) also acts). 
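
The passage from the two conditions on $Z$ to the inequality $q(v,v) > 0$ can be spelled out as follows. This is a short worked computation, assuming only that $q$ is the symmetric bilinear form defined above and that $w = u + iv$ with real $u, v$:

```latex
\begin{align*}
q(w,w) &= q(u,u) - q(v,v) + 2i\,q(u,v), \\
q(w,\bar w) &= q(u,u) + q(v,v), \\
2z_1 + q(w,w) = 0 \;&\Longrightarrow\; 2\,\Re z_1 = q(v,v) - q(u,u), \\
2\,\Re z_1 + q(w,\bar w) &= 2\,q(v,v) > 0 .
\end{align*}
```

The coordinate $z_1$ is thus determined by $w$, so the set is parametrized by $w \in \mathbb{C}^n$ alone, with no condition on $u$.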


w=u+iv 
In addition to four Cartan's classes of classical
domains there are two exceptional symmetric 
domains in the dimensions 27 and 16 (dual to two 
exceptional compact Hermitian symmetric spaces of 
these dimensions). The first of them can be realized 
as the tube domain corresponding to the exceptional 
cone of positive Hermitian matrices with Cayley 
numbers of order 3 (the dimension 27) and another 
can be realized as a Siegel domain of the second 
kind associated with the eight-dimensional future 
tube. It is possible, using the $\Gamma$-function of self-dual
homogeneous cones, to write explicit Bergman and
Cauchy-Szegő integral formulas.


Noncompact Symmetric R-Spaces 


There are several other interesting noncompact 
symmetric manifolds. Let us mention the noncom- 
pact symmetric R-spaces which are real forms of 
complex symmetric domains. The typical example is 
the domain of real square matrices $x \in \mathrm{Mat}(k)$:


\[ E_k - xx^T \gg 0 \]


The condition is that this symmetric matrix is
positive. It is the Riemann symmetric space with
the group $G = SO(k, k)$. It can be embedded in the
real Grassmannian $\mathrm{Gr}_{\mathbb{R}}(k;2k)$ with the matrix
homogeneous coordinates $X \in \mathrm{Mat}_{\mathbb{R}}(k,2k)$ and the
group $SL(2k; \mathbb{R})$ acting on $X$ by right multiplications.
Let


\[ I_1 = \begin{pmatrix} E_k & 0 \\ 0 & -E_k \end{pmatrix} \]


and $SO(k,k)$ be the subgroup of matrices preserving
the quadratic form $I_1$: $g I_1 g^T = I_1$. This group will
preserve the domain $X I_1 X^T \gg 0$ and, in the inhomogeneous
coordinates $X = (E_k, x)$, $x \in \mathrm{Mat}_{\mathbb{R}}(k)$, it
will be exactly the same as the domain above. The
group $SO(k,k)$ acts by matrix fractional linear
transformations. This domain is the real form of
Siegel's ball. If we replace the form by


\[ I_2 = \begin{pmatrix} 0 & E_k \\ E_k & 0 \end{pmatrix} \]


then we realize our symmetric manifold as the
domain


\[ x + x^T \gg 0 \]


So, the symmetric part of the matrix x must be 
positive. This realization is homogeneous relative
to the linear automorphisms: $x \mapsto axa^T + b$, $a \in GL(k; \mathbb{R})$,
$b = -b^T$. A similar construction exists
for rectangular matrices. 
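
The homogeneity claim rests on the fact that under $x \mapsto axa^T + b$ with skew-symmetric $b$ the symmetric part $s$ of $x$ goes to $asa^T$, so its positivity is preserved. A minimal numerical sketch for $k = 2$ follows; the helper names and the sample matrices are illustrative assumptions, not taken from the text.

```python
# 2x2 helpers on nested lists (no external libraries).
def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def sym_part(x):
    return [[(x[i][j] + x[j][i]) / 2 for j in range(2)] for i in range(2)]

def is_positive(s):
    # Sylvester criterion for a symmetric 2x2 matrix.
    return s[0][0] > 0 and s[0][0] * s[1][1] - s[0][1] * s[1][0] > 0

x = [[2.0, 5.0], [-4.0, 1.0]]    # symmetric part [[2, 0.5], [0.5, 1]]
a = [[1.0, 3.0], [0.0, -2.0]]    # invertible, det = -2
b = [[0.0, 7.0], [-7.0, 0.0]]    # skew-symmetric

y = add(mul(mul(a, x), transpose(a)), b)
expected = mul(mul(a, sym_part(x)), transpose(a))
print(sym_part(y), expected, is_positive(sym_part(y)))
```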


Geometry of Isomorphisms in Small Dimensions 


We connected above several local isomorphisms of 
complex classical groups with some geometrical 
facts. Let us mention now several similar examples 
for real groups. We start from isomorphisms of 
symmetric cones. The cone $\mathrm{Sym}_+(2)$ of symmetric
positive matrices of second order is (linearly)
isomorphic to the future light cone $L(2)$. The
comparison of the groups of automorphisms gives
the local isomorphism


\[ SL(2; \mathbb{R}) \cong SO(1, 2) \]


This isomorphism corresponds also to the isomorphism
of two classical realizations of the hyperbolic plane,
those of Poincaré and Klein. Let us also mention that the
isomorphism $SL(2, \mathbb{R}) \cong SU(1, 1)$ corresponds to the
holomorphic equivalency of the disk and the upper
half-plane. The isomorphism $\mathrm{Herm}_+(2) \cong L(3)$ corresponds
to the presentation of any Hermitian matrix of
the order 2 in Pauli's coordinates,


\[ z = \begin{pmatrix} t + x_1 & x_2 + ix_3 \\ x_2 - ix_3 & t - x_1 \end{pmatrix} \]


Then, $\det z = t^2 - x_1^2 - x_2^2 - x_3^2$. Comparing the
groups of automorphisms, we obtain


\[ SL(2, \mathbb{C}) \cong SO(1, 3) \]


Similarly, in the quaternionic case, the isomorphism
of the cones $\mathrm{Herm}_+(2, \mathbb{H}) \cong L(5)$ gives the isomorphism


\[ SL(2, \mathbb{H}) \cong SO(1, 5) \]


The linear isomorphism of cones produces the
holomorphic isomorphism of corresponding tubes
and their groups of holomorphic automorphisms. So
each of these three isomorphisms gives automatically
one more isomorphism. Let us give it for the
first two cones:


\[ Sp(2, \mathbb{R}) \cong SO(2, 3), \qquad SU(2, 2) \cong SO(2, 4) \]


We just compared the descriptions of automorph- 
isms of classical tubes from above. 
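
The correspondence between $SL(2, \mathbb{C})$ and the Lorentz group can be checked numerically: the action $z \mapsto gzg^*$ on Hermitian $2 \times 2$ matrices preserves $\det z$, which in Pauli's coordinates is the Minkowski form $t^2 - x_1^2 - x_2^2 - x_3^2$. The sketch below is only an illustration; the helper names, the sample element `g` and the sample point `p` are assumptions made for the example.

```python
def pauli(t, x1, x2, x3):
    """Hermitian 2x2 matrix in Pauli coordinates."""
    return [[t + x1, x2 + 1j * x3], [x2 - 1j * x3, t - x1]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    # Conjugate transpose.
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def coords(z):
    """Read (t, x1, x2, x3) back from a Hermitian matrix."""
    t = (z[0][0] + z[1][1]).real / 2
    x1 = (z[0][0] - z[1][1]).real / 2
    return t, x1, z[0][1].real, z[0][1].imag

def minkowski(t, x1, x2, x3):
    return t * t - x1 * x1 - x2 * x2 - x3 * x3

g = [[1 + 1j, 2 + 0j], [0.5j, 1 + 0j]]   # sample element with det g = 1
p = (3.0, 1.0, -0.5, 2.0)                # sample point (t, x1, x2, x3)

z = pauli(*p)
z_new = mul2(mul2(g, z), dagger(g))      # the action z -> g z g*
p_new = coords(z_new)
print(minkowski(*p), minkowski(*p_new))
```

The map $g \mapsto (p \mapsto p_{\text{new}})$ is the two-to-one homomorphism behind the local isomorphism, since $g$ and $-g$ act identically.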

Considering $\det(x)$ as the quadratic form of
signature $(2, 2)$ on $\mathrm{Mat}(2) = \mathbb{R}^4$, we obtain


\[ SO(2, 2) \cong SL(2, \mathbb{R}) \times SL(2, \mathbb{R}) \]


Each of local isomorphisms in the complex case 
has different real forms which admit some geome- 
trical interpretations. We mentioned above two real 
forms of the isomorphism $SL(4; \mathbb{C}) \cong SO(6; \mathbb{C})$. The
isomorphism for SO(2, 2) admits another interpreta- 
tion in the language of Plücker's coordinates: points 
of the quadric in RP? of the signature (2,3) can be 
interpreted as (complex) lines in CP? which lie on a 
Hermitian quadric of the signature (2,2) (Gindikin 


1983). The isomorphism above for the group
$SL(2, \mathbb{H})$ also corresponds to Hopf's fibering of
$\mathbb{CP}^3$ on complex lines over the sphere $S^4$ or the
isomorphism of $S^4$ and the quaternionic projective line
$\mathbb{HP}^1$. In all these cases, isomorphisms of
homogeneous manifolds intertwine the actions of locally
isomorphic groups. 


Pseudo-Riemann Symmetric Manifolds 


We obtain the next broad class of homogeneous 
manifolds if we preserve conditions that the group G 
is a real semisimple one, the isotropy subgroup H is 
involutive, but we remove the restriction that H 
must be (maximal) compact. Such symmetric mani- 
folds are often called semisimple pseudo-Riemann 
symmetric manifolds (since there are also pseudo- 
Riemann symmetric manifolds whose groups are not 
semisimple). This class of spaces contains symmetric 
Stein manifolds $X_C = G_C/H_C$. Each semisimple
symmetric manifold $X = G/H$ admits complexification
as a symmetric Stein manifold. Each real
semisimple Lie group G is symmetric relative to 
the group G x G. 

The simplest family of semisimple symmetric 
manifolds is the family of all hyperboloids of all 
signatures 


\[ H_{p,q} = \{x_1^2 + \cdots + x_p^2 - x_{p+1}^2 - \cdots - x_{p+q}^2 = 1\} \]


with the groups $SO(p,q)$. Their complexifications
are complex hyperboloids. There are two types 
of Riemann manifolds in these families: compact 
ones — spheres and noncompact ones — two-sheeted 
hyperboloids; all others are pseudo-Riemann. 

The Cartan duality holds for pseudo-Hermitian 
symmetric manifolds: they are domains in compact 
Hermitian symmetric manifolds (minimal flag manifolds)
$Z = G_C/P_C$. They are open orbits of real
forms G of the groups of holomorphic automorph- 
isms Gc. We construct examples of such manifolds 
if we consider one of the above-described realiza- 
tions of noncompact Hermitian symmetric mani- 
folds (through matrix homogeneous coordinates) 
and replace the condition of positivity with the 
condition that the symmetric (Hermitian) matrix in 
the definition has a fixed nondegenerate signature 
$(i, k - i)$. We can call such pseudo-Hermitian symmetric
manifolds satellites of Hermitian ones.
Correspondingly, we can consider nonconvex
tubes, for example, the set $T$ of such symmetric
matrices whose imaginary parts have the signature
$(i, n - i)$. This domain is linear homogeneous, but it
is not symmetric; to receive the symmetric manifold 
we need to extend the nonconvex tube by a 
manifold of smaller dimension (which plays a role
of infinity). 

There are pseudo-Hermitian symmetric manifolds 
which are not satellites of Hermitian ones. Let us 
give an interesting example. The group $SL(2p, \mathbb{R})$
has two open orbits on the Grassmannian
$\mathrm{Gr}_{\mathbb{C}}(p;2p)$ which are both pseudo-Hermitian symmetric
spaces. Let us consider as above the Stiefel
coordinates $Z \in \mathrm{Mat}_{\mathbb{C}}(p,2p)$ and let $Z = X + iY$.
Then the orbits are defined by the conditions


\[ \det \begin{pmatrix} X \\ Y \end{pmatrix} \gtrless 0 \]


In the intersection with the coordinate chart
$Z = (E, z)$, $z \in \mathrm{Mat}_{\mathbb{C}}(p)$, $z = x + iy$, we have the
conditions


\[ \det y \gtrless 0 \]


Therefore, we obtain (nonconvex) tube domains in
$\mathbb{C}^N = \mathrm{Mat}_{\mathbb{C}}(p)$, $N = p^2$, corresponding to nonconvex
homogeneous cones $V_\pm$ of real matrices with
positive (negative) determinants. These tubes do 
not coincide with the symmetric manifolds which 
include also some sets of small dimensions outside of 
the coordinate chart (on “infinity”). There are other 
homogeneous nonconvex cones such that corre- 
sponding tube domains are Zariski open parts of 
pseudo-Hermitian symmetric spaces (D'Atri and 
Gindikin 1993). Between these cones are cones of 
nondegenerate skew-symmetric matrices, of skew- 
Hermitian quaternionic matrices. We again observe 
strong connections with classical mathematics. Not 
all pseudo-Hermitian symmetric manifolds admit 
such tube realizations of dense parts. Analysis in 
pseudo-Hermitian symmetric manifolds is very 
interesting: instead of holomorphic functions, we consider
there $\bar\partial$-cohomology of some degree.

Geometric relations between different symmetric 
manifolds are usually important for analytic applica- 
tions since they can produce some nontrivial integral 
transformations. In a broad sense, such transforms are 
considered in integral geometry (Gelfand et al. 2003). 
An important example is duality between some 
compact Hermitian symmetric manifolds (when points 
in one of them are interpreted as submanifolds in 
another one). The simplest example is the projective 
duality between dual copies of projective spaces or, 
more generally, the realization of points of Grass- 
mannians as projective planes. Such a duality can 
induce a duality between orbits of real forms of groups. 
In a special case, it can be a duality between Hermitian 
and pseudo-Hermitian symmetric manifolds. 

Here is one important example. Let us consider in 
the projective space $\mathbb{CP}^{2k-1}$ the domain $D$ which in
homogeneous coordinates, rows $Z = (z_0, z_1, \ldots, z_{2k-1})$,
is defined by the inequality $ZHZ^* > 0$, where $H$
is a Hermitian form of the signature $(k,k)$, for
example,


\[ |z_0|^2 + \cdots + |z_{k-1}|^2 - |z_k|^2 - \cdots - |z_{2k-1}|^2 > 0 \]


This domain is (k — 1)-pseudoconcave and it con- 
tains (k — 1)-dimensional complex compact cycles, 
namely (k — 1)-dimensional planes. The manifold of 
these planes is exactly the domain X in the Grass- 
mannian Grc(k;2k) (of projective (k — 1)-planes) 
which is the noncompact Hermitian symmetric 
space — the orbit of the group SU(k,k) (see above). 
This picture is the geometrical basis for a deep 
analytic construction. In the domain D the spaces 
of $(k - 1)$-dimensional $\bar\partial$-cohomology are infinite
dimensional for some coefficients. Their integration 
on (k—1)-planes (the Penrose transform) gives 
sections of corresponding vector bundles on X. The 
images are described by differential equations — 
generalized massless equations. The basic twistor 
theory corresponds to $k = 2$ when $X$ is isomorphic
to four-dimensional future tube (see above). 

Similar dual realizations of Hermitian symmetric 
manifolds exist only in special cases. The twistor 
realization of four-dimensional future tube was 
possible since the Grassmannian $\mathrm{Gr}_{\mathbb{C}}(2;4)$ is isomorphic
to the quadric in $\mathbb{CP}^5$. This does not work
for the future tubes of bigger dimensions but there is
another possibility (Gindikin 1998). Let the
quadric $Q_{n-1} \subset \mathbb{CP}^n$ be defined in the homogeneous
coordinates by the equation


\[ \Box(z) = (z_0)^2 - (z_1)^2 - \cdots - (z_n)^2 = 0 \]


and $z \cdot \zeta$ is the bilinear form. As already mentioned,
the set of (nondegenerate) hyperplane sections


\[ \{\zeta \cdot z = 0\}, \qquad \Box(\zeta) = 1 \]


of $Q_{n-1}$ is the corresponding hyperboloid $H_n$. Thus,
we have the duality between a flag manifold (the
quadric $Q_{n-1}$) and a symmetric Stein manifold (the
hyperboloid $H_n$) with the same group $SO(n + 1, \mathbb{C})$;
they have different dimensions.

The group $SO(1,n)$ has two orbits on $Q_{n-1}$:
the real quadric $Q_{\mathbb{R}} = \{z \in Q_{n-1};\ \Im(z) = 0\}$ and its
complement $X = Q_{n-1} \setminus Q_{\mathbb{R}}$. Hyperplane sections
which do not intersect $Q_{\mathbb{R}}$ (lie in $X$) correspond
to such $\zeta \in H_n$ that


\[ \{\zeta \cdot z = 0\}: \qquad \Box(\Re(\zeta)) > 0 \]


This set has two connected components $D_\pm$ which
are biholomorphically equivalent to the future and
past tubes $T_\pm$ of the dimension $n$. Let us emphasize
that their group of automorphisms is $SO(2,n)$ in


spite of the fact that this group acts neither on X 
nor on $H_n$. Such an extension of the symmetry
group is a very interesting phenomenon. It happens 
for several other symmetric manifolds, but is not a 
general fact. This geometrical construction gives a 
possibility to construct a multidimensional version 
of the Penrose transform from $(n - 2)$-dimensional
$\bar\partial$-cohomology with different coefficients into solutions
of massless equations on the future (past)
tubes. 

The last duality is connected with some general 
geometrical construction. We mentioned that each of 
the Riemann symmetric manifolds $X = G/K$ admits a
canonical embedding in the symmetric Stein manifold
$X_C = G_C/K_C$. It turns out that $X$ has in $X_C$ a canonical
Stein neighborhood — the complex crown Q(X) such 
that many analytic objects on X can be holomorphi- 
cally extended on the crown (Gindikin 2002). For 
example, all solutions of all invariant differential 
equations on X (which are elliptic) admit such 
holomorphic extension. In the last example, $D_+$ is
the crown of the Riemann symmetric space which is
defined, in $H_n$, by the condition $\Im(\zeta) = 0$, $\Re(\zeta_0) > 0$.

Symmetric manifolds are distinguished from most 
other homogeneous manifolds by a very rich 
geometry which is a background for deep analytic 
considerations. There are several important nonsym- 
metric homogeneous manifolds. We already men- 
tioned flag manifolds and Stein homogeneous 
manifolds with complex semisimple Lie groups 
which can be nonsymmetric. Pseudo-Riemann sym- 
metric manifolds are open orbits of real groups on 
compact Hermitian symmetric spaces. It turns out 
that open orbits on other flag manifolds also 
produce interesting homogeneous manifolds. Let 
$F = G_C/P_C$ be a flag manifold. Flag domains are
open orbits of a real form $G$ on $F$. Of course,
pseudo-Hermitian symmetric manifolds are a special
case of this construction. Let us consider a simple
example with $G_C = SL(3; \mathbb{C})$ and $P$ the triangle
group. Then points of $F$ are pairs {a point $z$ and a
line $l$ passing through it}. Let $G = SU(2; 1)$; it has
two open orbits on $\mathbb{CP}^2$: the complex ball $D$ and its
complement $D^c$. On $F$, the group $G$ has three
open orbits (flag domains): in the first $z \in D$, $l$ is
arbitrary; in the second $l \subset D^c$; in the third $z \in D^c$, $l$
intersects $D$. They are all 1-pseudoconcave. In one-dimensional
$\bar\partial$-cohomology of these flag domains
with coefficients in line bundles, all
three discrete series of unitary representations of
$SU(2, 1)$ are realized. For arbitrary semisimple Lie groups, all
discrete series of representations can also be realized
in $\bar\partial$-cohomology of flag domains. Crowns of
Riemann symmetric spaces which we just mentioned 
parametrize cycles (complex compact submanifolds) 
in flag domains. Some general version of the Penrose
transform connects through the integration along 
cycles cohomology in flag domains with holo- 
morphic solutions of some differential equations in 
crowns (generalized massless equations). 


See also: Combinatorics: Overview; Compact Groups 
and their Representations; Lie Groups: General Theory; 
Pseudo-Riemannian Nilpotent Lie Groups; Several 
Complex Variables: Compact Manifolds; Stability of 
Minkowski Space; Symmetry Classes in Random Matrix 
Theory; Twistor Theory: Some Applications; Twistors. 
Introduction


The notion of “classical r-matrices” has emerged as 
a by-product of the quantum inverse scattering 
method (which was developed mainly by L D 
Faddeev and his team in their work at the Steklov 
Mathematical Institute in Leningrad); it has given a 
new insight into the study of Hamiltonian structures 
associated with classical integrable systems solvable 
by the classical inverse scattering method and its 
generalizations. Important classification results for 
classical r-matrices are due to Belavin and Drinfeld. 
Based on the initial results of Sklyanin, Drinfeld 
introduced the important concepts of “Poisson Lie
groups” and “Lie bialgebras” which arise as a
semiclassical approximation in the study of quan- 
tum groups. 

A Poisson group is a Lie group $G$ equipped
with a Poisson bracket such that the multiplication
$m: G \times G \to G$ is a Poisson mapping. A
Poisson bracket on $G$ with this property is called
multiplicative. More explicitly, let $\lambda_x, \rho_x$ be the
left and right translation operators in $C^\infty(G)$ by
an element $x \in G$, $\lambda_x\varphi(y) = \varphi(xy)$, $\rho_x\varphi(y) = \varphi(yx)$.
Multiplication in $G$ is a Poisson mapping if for
any $\varphi, \psi \in C^\infty(G)$ we have


\[ \{\varphi, \psi\}(xy) = \{\lambda_x\varphi, \lambda_x\psi\}(y) + \{\rho_y\varphi, \rho_y\psi\}(x) \qquad [1] \]


Note that in general, multiplicative brackets are 
neither left nor right invariant; in other words, for 
fixed $x$ translation operators $\lambda_x, \rho_x$ do not preserve
Poisson brackets. 

Multiplicative Poisson brackets naturally arise in the 
study of integrable systems which admit the so-called 
“zero-curvature representation.” The study of zero-curvature
equations, and in particular, of the Poisson
properties of the associated monodromy map, was the 
main source of nontrivial examples (associated with 
classical r-matrices, classical Yang-Baxter equations, 
and factorizable Lie bialgebras). The special class of 
multiplicative Poisson brackets encountered in this 
context is closely related to factorization problems in 
Lie groups (in particular, the matrix Riemann pro- 
blem); these problems represent the key tools in 
constructing solutions of zero-curvature equations. 

The equivalent definition of Poisson Lie groups
uses the dual language of “Hopf algebras.” Let
$A = F(G)$ be the commutative algebra of (smooth)
functions on a Lie group $G$ equipped with the
standard coproduct $\Delta: A \to A \otimes A$,


\[ \Delta\varphi(x, y) = \varphi(xy), \quad \varphi \in F(G),\ x, y \in G; \]


as usual, we identify the (topological) tensor product
$F(G) \otimes F(G)$ with $F(G \times G)$. The multiplicative


512 Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups 


Poisson bracket on G equips F(G) with the structure 
of a Poisson-Hopf algebra, that is 


\[ \Delta\{\varphi, \psi\} = \{\Delta\varphi, \Delta\psi\} \qquad [2] \]


Equation [2] is the starting point for the study of 
relations between Poisson groups and quantum 
groups. Following the general philosophy of defor- 
mation quantization, we can look for a deformation 
$A_h$ of the commutative Hopf algebra $A$ with the
deformation germ determined by the Poisson struc- 
ture on A satisfying eqn [2]. The fundamental 
theorem (conjectured by Drinfeld and proved by 
Etingof and Kazhdan) asserts that any Poisson 
algebra associated with a Poisson group admits a 
formal quantization (in the category of Hopf 
algebras). 


Poisson Groups and Lie Bialgebras 


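Let $G$ be a Lie group with Lie algebra $\mathfrak{g}$ equipped
with a multiplicative Poisson bracket. Any Poisson
bracket is bilinear in differentials of functions; it is
convenient to express it by means of right- or left-invariant
differentials. For $\varphi \in F(G)$ set


\[ \langle \nabla\varphi(x), X \rangle = (d/dt)_{t=0}\, \varphi(e^{tX}x), \]
\[ \langle \nabla'\varphi(x), X \rangle = (d/dt)_{t=0}\, \varphi(xe^{tX}), \]
\[ X \in \mathfrak{g}, \quad \nabla\varphi(x), \nabla'\varphi(x) \in \mathfrak{g}^* \]


Let us define the Poisson operator $\eta: G \to \mathrm{Hom}(\mathfrak{g}^*, \mathfrak{g})$
by setting

\[ \{\varphi, \psi\}(x) = \langle \eta(x)\nabla\varphi(x), \nabla\psi(x) \rangle \qquad [3] \]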


For a finite-dimensional Lie algebra, we can identify 
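$\mathrm{Hom}(\mathfrak{g}^*, \mathfrak{g})$ with $\mathfrak{g} \otimes \mathfrak{g}$; the skew symmetry of
the Poisson bracket implies that $\eta \in \mathfrak{g} \wedge \mathfrak{g}$. By an abuse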
of language, the same identification is traditionally 
used for infinite-dimensional algebras (e.g., for loop 
algebras) as well. Of course, in the latter case, the 
corresponding Poisson tensors are represented by 
singular kernels which do not lie in the algebraic 
tensor product and should be regarded as 
distributions.

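Multiplicativity of the Poisson bracket on $G$ implies a
functional equation for $\eta$,


\[ \eta(xy) = (\mathrm{Ad}\, x \otimes \mathrm{Ad}\, x) \cdot \eta(y) + \eta(x) \qquad [4] \]


which means that $\eta$ is a 1-cocycle on $G$ (with values
in $\mathfrak{g} \wedge \mathfrak{g}$). By setting


\[ \delta(X) = \left(\frac{d}{dt}\right)_{t=0} \eta(e^{tX}), \quad X \in \mathfrak{g} \]


we conclude from eqn [4] that $\delta: \mathfrak{g} \to \mathfrak{g} \wedge \mathfrak{g}$ is a
1-cocycle on $\mathfrak{g}$, that is,


\[ \delta([X, Y]) = [X \otimes I + I \otimes X, \delta(Y)] - [Y \otimes I + I \otimes Y, \delta(X)] \]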


Equation [4] implies that $\eta(e) = 0$, that is, a multiplicative
Poisson structure is identically zero at the
unit element. Its linearization at this point induces
the structure of a Lie algebra on the cotangent space
$T_e^*G \simeq \mathfrak{g}^*$; namely, for any $\xi, \xi' \in \mathfrak{g}^*$, choose
$\varphi, \varphi' \in F(G)$ in such a way that $\nabla_e\varphi = \xi$, $\nabla_e\varphi' = \xi'$, and set


\[ [\xi, \xi']_* = \nabla_e\{\varphi, \varphi'\} \qquad [5] \]


It is easy to see that $\langle [\xi, \xi']_*, X \rangle = \langle \xi \wedge \xi', \delta(X) \rangle$,
which proves that the bracket is well defined,
while eqn [5] implies the Jacobi identity.
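
The vanishing of the Poisson tensor at the unit element is a one-line consequence of the cocycle equation [4]. The computation below is a worked restatement, writing $\eta$ for the Poisson operator and $e$ for the unit of $G$; the second line is a further standard consequence obtained by the same substitution trick:

```latex
\begin{align*}
x = y = e:\quad & \eta(e) = (\mathrm{Ad}\,e \otimes \mathrm{Ad}\,e)\cdot\eta(e) + \eta(e)
   = 2\,\eta(e) \;\Longrightarrow\; \eta(e) = 0, \\
y = x^{-1}:\quad & 0 = \eta(e) = (\mathrm{Ad}\,x \otimes \mathrm{Ad}\,x)\cdot\eta(x^{-1}) + \eta(x)
   \;\Longrightarrow\; \eta(x^{-1}) = -(\mathrm{Ad}\,x^{-1} \otimes \mathrm{Ad}\,x^{-1})\cdot\eta(x).
\end{align*}
```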


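Definition 1 Let $\mathfrak{g}, \mathfrak{g}^*$ be a pair of linear spaces set
in duality; $(\mathfrak{g}, \mathfrak{g}^*)$ is called a Lie bialgebra if both $\mathfrak{g}$
and $\mathfrak{g}^*$ are Lie algebras and the mapping $\delta: \mathfrak{g} \to \mathfrak{g} \wedge \mathfrak{g}$
which is dual to the commutator map $[\,,]_*: \mathfrak{g}^* \wedge \mathfrak{g}^* \to \mathfrak{g}^*$
is a 1-cocycle on $\mathfrak{g}$.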


Thus if $G$ is a Poisson-Lie group, the pair $(\mathfrak{g}, \mathfrak{g}^*)$ is
a Lie bialgebra (called the “tangent Lie bialgebra” of
$G$). Poisson-Lie groups form a category in which the
morphisms are Lie group homomorphisms which
are also Poisson mappings. A morphism
$(\mathfrak{g}, \mathfrak{g}^*) \to (\mathfrak{h}, \mathfrak{h}^*)$ in the category of Lie bialgebras is
a Lie algebra homomorphism $\mathfrak{g} \to \mathfrak{h}$ such that the
dual map $\mathfrak{h}^* \to \mathfrak{g}^*$ is again a Lie algebra homomorphism.
It is easy to see that morphisms of
Poisson groups induce morphisms of their tangent
bialgebras. The converse is also true.


Theorem 1 


(i) Let $(\mathfrak{g}, \mathfrak{g}^*)$ be a Lie bialgebra, $G$ a connected,
simply connected Lie group with Lie algebra $\mathfrak{g}$.
There is a unique multiplicative Poisson bracket
on $G$ such that $(\mathfrak{g}, \mathfrak{g}^*)$ is its tangent Lie bialgebra.

(ii) Morphisms of Lie bialgebras induce Poisson
mappings of the corresponding Poisson groups.


Basically, the theorem asserts that a Poisson 
tensor is uniquely restored from the infinitesimal 
cocycle on the corresponding Lie algebra; moreover, 
the obstruction for the Jacobi identity vanishes 
globally if this is true for its infinitesimal part at 
the unit element of the group. 

It is important to observe that Lie bialgebras 
possess a remarkable symmetry: if $(\mathfrak{g}, \mathfrak{g}^*)$ is a Lie
bialgebra, the same is true for $(\mathfrak{g}^*, \mathfrak{g})$. Hence, the
dual group $G^*$ (which corresponds to $\mathfrak{g}^*$) also carries
a multiplicative Poisson bracket. The duality theory 
for Lie bialgebras, based on the key notion of the 
Drinfeld double, is discussed in the next section. 
Classical r-Matrices and Special
Classes of Lie Bialgebras 


The general classification problem for Lie bialgebras 
is unfeasible (e.g., classification of abelian Lie 
bialgebras includes classification of all Lie algebras). 
In applications, one mainly deals with important 
special classes of Lie bialgebras, of which factoriz- 
able Lie bialgebras are probably the most important. 
In a sense, this class may be regarded as exhaustive, 
since (as explained below) any Lie bialgebra is 
canonically embedded into a factorizable one. 
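Various other special classes discussed in literature
are “coboundary bialgebras,” “triangular bialgebras,”
and “quasitriangular bialgebras.”

The Lie bialgebra $(\mathfrak{g}, \mathfrak{g}^*, \delta)$ is called a coboundary
bialgebra if the cobracket $\delta$ is a trivial 1-cocycle on $\mathfrak{g}$,
that is,


\[ \delta(X) = [X \otimes I + I \otimes X, r] \quad \text{for all } X \in \mathfrak{g} \qquad [6] \]


the constant element $r \in \mathfrak{g} \wedge \mathfrak{g}$ is called the “classical
r-matrix.” If $\mathfrak{g}$ is semisimple, $H^1(\mathfrak{g}, V) = 0$ for any
$\mathfrak{g}$-module $V$ by the classical Whitehead theorem, and
hence all Lie bialgebra structures on $\mathfrak{g}$ are of
coboundary type. The associated Lie bracket on $\mathfrak{g}^*$
is given by the formula


\[ [\xi, \xi']_* = \mathrm{ad}^*_{\mathfrak{g}}\, r\xi \cdot \xi' - \mathrm{ad}^*_{\mathfrak{g}}\, r\xi' \cdot \xi \qquad [7] \]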


where we identified $r \in \mathfrak{g} \wedge \mathfrak{g}$ with a skew-symmetric
linear operator $r: \mathfrak{g}^* \to \mathfrak{g}$. The restrictions imposed
on $r$ by the Jacobi identity are formulated in terms
of the so-called “Yang-Baxter tensor” $[[r,r]] \in \mathfrak{g} \wedge \mathfrak{g} \wedge \mathfrak{g}$,
which is a quadratic expression in $r$. To define
it, let us mark different factors in tensor products,
for example, $\mathfrak{g} \otimes \mathfrak{g} \otimes \mathfrak{g}$, by fixed numbers 1, 2, 3, ...
which indicate their place; for simplicity, we assume
that $\mathfrak{g}$ is embedded in an associative algebra $A$ with a
unit. The embeddings are defined as


\[ i_{12}, i_{23}, i_{13}: \mathfrak{g} \otimes \mathfrak{g} \to A \otimes A \otimes A \]


by setting $i_{12}(X \otimes Y) = X \otimes Y \otimes I$, and similarly
in other cases. For $a \in \mathfrak{g} \otimes \mathfrak{g}$, we put $i_{12}(a) = a_{12}$,
etc. Set


\[ [[r, r]] = [r_{12}, r_{13}] + [r_{12}, r_{23}] + [r_{13}, r_{23}] \qquad [8] \]


The commutators in the RHS are computed in the
associative algebra $A \otimes A \otimes A$; it is easy to check
that the result does not depend on the choice of the
embedding $\mathfrak{g} \to A$.
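
The Yang-Baxter tensor [8] can be evaluated mechanically once $\mathfrak{g}$ is embedded in a matrix algebra. The sketch below does this for $\mathfrak{sl}(2)$ inside the $2 \times 2$ matrices, with the usual basis $e, f, h$ and exact rational arithmetic. The choice of example, the helper names, and the two sample tensors (the element $e \otimes f + \frac{1}{4} h \otimes h$ and its skew part $\frac{1}{2}(e \otimes f - f \otimes e)$) are assumptions made for illustration; they are not taken from the text.

```python
from fractions import Fraction as Fr

def mat(rows):
    return [[Fr(x) for x in r] for r in rows]

def kron(a, b):
    # Kronecker product of two square matrices.
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b, s=1):
    # Entrywise a + s * b.
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def comm(a, b):
    return add(mul(a, b), mul(b, a), -1)

I2 = mat([[1, 0], [0, 1]])
E = mat([[0, 1], [0, 0]])
F = mat([[0, 0], [1, 0]])
H = mat([[1, 0], [0, -1]])

def embed(terms, slots):
    """Place a tensor sum of c * X (x) Y into two of the three factors."""
    total = [[Fr(0)] * 8 for _ in range(8)]
    for c, x, y in terms:
        factors = [I2, I2, I2]
        factors[slots[0]], factors[slots[1]] = x, y
        term = kron(kron(factors[0], factors[1]), factors[2])
        total = add(total, term, c)
    return total

def yang_baxter(terms):
    # [[r, r]] = [r12, r13] + [r12, r23] + [r13, r23]
    r12, r13, r23 = (embed(terms, s) for s in ((0, 1), (0, 2), (1, 2)))
    return add(add(comm(r12, r13), comm(r12, r23)), comm(r13, r23))

def is_zero(m):
    return all(x == 0 for row in m for x in row)

r_plus = [(Fr(1), E, F), (Fr(1, 4), H, H)]       # e(x)f + (1/4) h(x)h
r_skew = [(Fr(1, 2), E, F), (Fr(-1, 2), F, E)]   # (1/2)(e(x)f - f(x)e)

print(is_zero(yang_baxter(r_plus)), is_zero(yang_baxter(r_skew)))
```

In this example the non-skew element satisfies the classical Yang-Baxter equation exactly, while its skew part has a nonzero Yang-Baxter tensor, which is the situation described by the modified equation for factorizable bialgebras.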


Proposition 1 The Jacobi identity for $[\,,]_*$ is valid if
and only if $[[r, r]]$ is ad $\mathfrak{g}$-invariant, that is, if


\[ [X \otimes I \otimes I + I \otimes X \otimes I + I \otimes I \otimes X, [[r, r]]] = 0 \quad \text{for all } X \in \mathfrak{g} \]


A coboundary Lie bialgebra with $[[r, r]] \in (\wedge^3\mathfrak{g})^{\mathfrak{g}}$
is called “quasitriangular”; it is called “triangular”
if $r$ satisfies the classical Yang-Baxter equation
$[[r, r]] = 0$. (Both terms come from another name of
the classical Yang-Baxter equation, the “classical
triangle equation.”)

When a Lie algebra $\mathfrak{g}$ admits a nondegenerate
invariant inner product, the class of quasitriangular
Lie bialgebra structures on $\mathfrak{g}$ admits an important
specialization. Let $\mathfrak{g} \otimes \mathfrak{g}^* \simeq \mathfrak{g} \otimes \mathfrak{g}$ be the natural
isomorphism induced by the inner product. Let
$I \in \mathfrak{g} \otimes \mathfrak{g}^*$ be the canonical element; its image $t \in \mathfrak{g} \otimes \mathfrak{g}$
under this isomorphism is called the “tensor
Casimir element.” Clearly, $t \in (S^2\mathfrak{g})^{\mathfrak{g}}$ and, moreover,
$[t_{12}, t_{23}] \in (\wedge^3\mathfrak{g})^{\mathfrak{g}}$. When $\mathfrak{g}$ is semisimple, the
mapping $(S^2\mathfrak{g})^{\mathfrak{g}} \to (\wedge^3\mathfrak{g})^{\mathfrak{g}}: s \mapsto [s_{12}, s_{23}]$ is an isomorphism;
in particular, if $\mathfrak{g}$ is simple, both spaces
are one dimensional and generated by a tensor
Casimir (which is unique up to a scalar multiple). A
Lie bialgebra $(\mathfrak{g}, r)$ is called factorizable if $r \in \mathfrak{g} \wedge \mathfrak{g}$
satisfies the modified classical Yang-Baxter
equation


\[ [[r, r]] = c\,[t_{12}, t_{23}], \quad c = \text{const} \neq 0 \qquad [9] \]


The convenient normalization is $c = -1/4$ (it can be
achieved by an appropriate normalization of $r$).
Instead of dealing with the modified Yang-Baxter
equation, we may relax the antisymmetry condition
imposed on $r$. Set $r_\pm = r \pm (1/2)t \in \mathfrak{g} \otimes \mathfrak{g}$. Since $t$
is ad $\mathfrak{g}$-invariant, the symmetric part of $r_\pm$ drops
out from the cobracket; on the other hand, one
has $[[r_\pm, r_\pm]] = 0$. Regarding $r_\pm$ as a linear operator,
$r_\pm \in \mathrm{Hom}(\mathfrak{g}^*, \mathfrak{g})$, we get the following important
result:


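Proposition 2 Let $(\mathfrak{g}, \mathfrak{g}^*)$ be a factorizable Lie
bialgebra.


(i) The mappings $r_\pm \in \mathrm{Hom}(\mathfrak{g}^*, \mathfrak{g})$ are Lie algebra
homomorphisms; moreover, $r_+^* = -r_-$.
(ii) The combined mapping


\[ i_r: \mathfrak{g}^* \to \mathfrak{g} \oplus \mathfrak{g}: X \mapsto (r_+X, r_-X) \]


is a Lie algebra embedding.
(iii) Any $X \in \mathfrak{g}$ admits a unique decomposition
$X = X_+ - X_-$ with $(X_+, X_-) \in \mathrm{Im}\, i_r$.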


The additive decomposition in a factorizable Lie 
bialgebra gives rise to a multiplicative factorization 
problem in the associated Lie group. Namely, $i_r$ may
be extended to a Lie group embedding $i_r: G^* \to G \times G$
and any $x \in G$, which is sufficiently close to the
unit element, admits a decomposition $x = x_+x_-^{-1}$
with $(x_+, x_-) \in \mathrm{Im}\, i_r$.

Any Lie bialgebra $(\mathfrak{g}, \mathfrak{g}^*)$ admits a canonical
embedding into a larger Lie bialgebra (called its
“double”) which is already factorizable. Namely, set
$\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}^*$ as a linear space and equip it with the
natural inner product,


\[ \langle\langle (X, F), (X', F') \rangle\rangle = \langle F, X' \rangle + \langle F', X \rangle \qquad [10] \]


Theorem 2 


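(i) There exists a unique structure of the Lie algebra
on $\mathfrak{d}$ such that: (a) $\mathfrak{g}, \mathfrak{g}^* \subset \mathfrak{d}$ are Lie subalgebras.
(b) The inner product [10] is invariant.

(ii) Let $P_{\mathfrak{g}}, P_{\mathfrak{g}^*}$ be the projection operators onto
$\mathfrak{g}, \mathfrak{g}^* \subset \mathfrak{d}$ parallel to the complementary subalgebra.
Set $r_{\mathfrak{d}} = P_{\mathfrak{g}} - P_{\mathfrak{g}^*}$; then $(\mathfrak{d}, r_{\mathfrak{d}})$ is a
factorizable Lie bialgebra.

(iii) The inclusion map $(\mathfrak{g}, \mathfrak{g}^*) \to (\mathfrak{d}, \mathfrak{d}^*)$ is a homomorphism
of Lie bialgebras and the dual inclusion
map $(\mathfrak{g}^*, \mathfrak{g}) \to (\mathfrak{d}, \mathfrak{d}^*)$ is an antihomomorphism.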


Conversely, let $\mathfrak{a}$ be a Lie algebra equipped with a
nondegenerate invariant inner product, $\mathfrak{a}_\pm \subset \mathfrak{a}$ its Lie
subalgebras such that (i) $\mathfrak{a}_\pm$ are isotropic with respect
to the inner product, (ii) $\mathfrak{a} = \mathfrak{a}_+ + \mathfrak{a}_-$ as a linear space.
The triple $(\mathfrak{a}, \mathfrak{a}_+, \mathfrak{a}_-)$ is called a “Manin triple.” Let
$P_\pm$ be the projection operators onto $\mathfrak{a}_\pm$ in this
decomposition. Set $r_\pm = \pm P_\pm$. Then $(\mathfrak{a}, r_\pm)$ is a
factorizable Lie bialgebra; moreover, $\mathfrak{a}_+$ and $\mathfrak{a}_-$ are
set into duality by the inner product in $\mathfrak{a}$ and inherit
the structure of a Lie bialgebra, and $\mathfrak{a}$ is their double.

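If $(\mathfrak{g}, \mathfrak{g}^*)$ is itself a factorizable Lie bialgebra, its
double admits a simple explicit description. Set
$\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}$ (direct sum of Lie algebras); let us equip
$\mathfrak{d}$ with the inner product


\[ \langle\langle (X, X'), (Y, Y') \rangle\rangle = \langle X, Y \rangle - \langle X', Y' \rangle \]


Let $\mathfrak{g}^\delta \subset \mathfrak{d}$ be the diagonal subalgebra; we identify
$\mathfrak{g}^*$ with the embedded subalgebra $i_r(\mathfrak{g}^*) \subset \mathfrak{d}$.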


Proposition 3 


(i) $(\mathfrak{d}, \mathfrak{g}^\delta, i_r(\mathfrak{g}^*))$ is a Manin triple.
(ii) As a Lie algebra, $\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}$ is isomorphic to the
double of $\mathfrak{g}$.


Key examples of factorizable Lie bialgebras are 
associated with semisimple Lie algebras and their 
loop algebras. 


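1. Let $\mathfrak{k}$ be a compact semisimple Lie algebra, $\mathfrak{g} = \mathfrak{k}_C$
its complexification regarded as a real Lie algebra,
$\sigma \in \mathrm{Aut}\, \mathfrak{g}$ the Cartan involution which fixes $\mathfrak{k}$, and
$\mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p}$ the associated Cartan decomposition.
Fix a real split Cartan subalgebra $\mathfrak{a} \subset \mathfrak{p}$ and the
associated Iwasawa decomposition $\mathfrak{g} = \mathfrak{k} + \mathfrak{a} + \mathfrak{n}$;
put $\mathfrak{s} = \mathfrak{a} + \mathfrak{n}$. Let $B$ be the complex Killing form
on $\mathfrak{g}$; let us equip $\mathfrak{g}$ with the real inner product
$\langle X, Y \rangle = \mathrm{Im}\, B(X, Y)$; then $(\mathfrak{g}, \mathfrak{k}, \mathfrak{s})$ is a Manin
triple. Hence, any compact semisimple Lie group
$K$ carries a natural Poisson structure; its double
$G = D(K)$ is the complex group $G = K_C$ (regarded
as a real Lie group). The associated factorization
problem in $G$ is the Iwasawa decomposition
$G = KAN$, which exists globally.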


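2. Let $\mathfrak{g}$ be a real split semisimple Lie algebra, $\mathfrak{h}$ its
Cartan subalgebra, and $\Delta_+$ a system of positive
roots. Fix an invariant inner product on $\mathfrak{g}$ which
is positive on $\mathfrak{h}$, and let $\{e_\alpha;\ \alpha \in \pm\Delta_+\}$ be the root
vectors normalized in such a way that
$\langle e_\alpha, e_{-\alpha}\rangle = 1$. Let

$$\mathfrak{n}_\pm = \bigoplus_{\alpha \in \Delta_+} \mathbb{R}\cdot e_{\pm\alpha}$$

Fix an orthonormal basis $\{H_i\}$ in $\mathfrak{h}$; let $P_\pm$, $P_0$
be the projection operators onto $\mathfrak{n}_\pm$, $\mathfrak{h}$ in the
Bruhat decomposition $\mathfrak{g} = \mathfrak{n}_+ \dotplus \mathfrak{h} \dotplus \mathfrak{n}_-$. The
standard Lie bialgebra structure on $\mathfrak{g}$ is given
by the r-matrices $r_\pm = \pm(P_\pm + \tfrac{1}{2}P_0)$. In tensor
notation,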


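$$r_\pm = \pm\Bigl(\sum_{\alpha \in \Delta_+} e_{\pm\alpha} \otimes e_{\mp\alpha} + \frac{1}{2}\sum_i H_i \otimes H_i\Bigr) \qquad [11]$$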


Let $\mathfrak{b}_\pm = \mathfrak{h} \dotplus \mathfrak{n}_\pm$ be the opposite Borel subalgebras;
the inner product in $\mathfrak{g}$ sets them into
duality, and $(\mathfrak{b}_+, \mathfrak{b}_-)$ is a Lie sub-bialgebra
in $(\mathfrak{g}, \mathfrak{g}^*)$. Let $G$ be the connected, simply
connected Lie group associated with $\mathfrak{g}$, $B_\pm = HN_\pm$
its opposite Borel subgroups which correspond
to $\mathfrak{b}_\pm$. Let $p: B_\pm \to B_\pm/N_\pm \simeq H$ be the
canonical projection. The associated factorization
problem in $G$, $g = b_+ b_-^{-1}$, $(b_+, b_-) \in B_+ \times B_-$,
$p(b_+) = p(b_-)^{-1}$, is closely related to the
Bruhat decomposition; it is solvable for all $g$ in
the open Bruhat cell $B_+ N_- \subset G$.


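3. Let $L\mathfrak{g} = \mathfrak{g} \otimes \mathbb{C}((z))$ be the loop algebra of a
finite-dimensional semisimple Lie algebra $\mathfrak{g}$; as usual, we
denote the ring of formal Laurent series by $\mathbb{C}((z))$.
Put $L\mathfrak{g}_+ = \mathfrak{g} \otimes \mathbb{C}[[z]]$, $L\mathfrak{g}_- = \mathfrak{g} \otimes z^{-1}\mathbb{C}[z^{-1}]$. Fix an
invariant inner product on $\mathfrak{g}$ and equip $L\mathfrak{g}$ with
the inner product

$$\langle\langle X, Y \rangle\rangle = \mathrm{Res}_{z=0}\, \langle X(z), Y(z)\rangle\, dz$$

Then $(L\mathfrak{g}, L\mathfrak{g}_+, L\mathfrak{g}_-)$ is a Manin triple. The associated
classical r-matrix is called "rational r-matrix"; in
tensor notation, it is represented by a singular kernel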


$$r(z, z') = \frac{t}{z - z'}$$

where $t \in \mathfrak{g} \otimes \mathfrak{g}$ is the tensor Casimir, which is
essentially the Cauchy kernel.


4. Let us assume that $\mathfrak{g} = \mathfrak{sl}(n)$; in this case, the loop
algebra $L\mathfrak{g}$ admits a nontrivial decomposition
associated with the so-called "elliptic r-matrix."
Set

$$I_1 = \mathrm{diag}(1, \varepsilon, \ldots, \varepsilon^{n-1}), \qquad
I_2 = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ 1 & & & 0 \end{pmatrix}, \qquad
\varepsilon = e^{2\pi i/n}$$

Put $\mathbb{Z}_n^2 = \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z}$; for $a = (a_1, a_2) \in \mathbb{Z}_n^2$,
set $I_a = I_1^{a_1} I_2^{a_2}$; matrices $I_a$ define an irreducible
projective representation of $\mathbb{Z}_n^2$ (they form the
so-called "finite Heisenberg group"). Let us denote
the elliptic curve of modulus $\tau$ by $E = \mathbb{C}/(\mathbb{Z} + \tau\mathbb{Z})$
and let $P \to E$ be the $n$-dimensional holomorphic
vector bundle with flat connection and with
monodromies given by

$$z \mapsto z + 1:\ h \mapsto \mathrm{Ad}\, I_1 \cdot h, \qquad z \mapsto z + \tau:\ h \mapsto \mathrm{Ad}\, I_2 \cdot h$$

Let $\mathfrak{g}_E \subset L\mathfrak{g}$ be the subspace of Laurent expansions
at zero of the global meromorphic sections of $P$
with a unique pole at $0 \in E$. Then $(L\mathfrak{g}, L\mathfrak{g}_+, \mathfrak{g}_E)$ is
again a Manin triple. The associated classical
r-matrix is the kernel of a singular integral operator
which associates a meromorphic section of $P$ to its
principal part at 0. Explicitly, it is given by


$$r(z - z') = \frac{1}{n}\sum_{a,b=0}^{n-1} \zeta\Bigl(\frac{z - z' - a - b\tau}{n}\Bigr)\,(\mathrm{Ad}\, I_{a,b} \otimes I)\cdot t \qquad [13]$$

where $\zeta$ is the Weierstrass zeta function.

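5. Let $\mathfrak{g}$ be an arbitrary semisimple Lie algebra
again. Let us equip the loop algebra $L\mathfrak{g}$ with the
inner product

$$\langle\langle X, Y \rangle\rangle_0 = \mathrm{Res}_{z=0}\, \langle X(z), Y(z)\rangle\, z^{-1}\, dz$$

Set $\mathfrak{N}_+ = \mathfrak{n}_+ \dotplus \mathfrak{g} \otimes z\mathbb{C}[[z]]$, $\mathfrak{N}_- = \mathfrak{n}_- \dotplus \mathfrak{g} \otimes z^{-1}\mathbb{C}[z^{-1}]$.
We have $L\mathfrak{g} = \mathfrak{N}_+ \dotplus \mathfrak{h} \dotplus \mathfrak{N}_-$, where
we identify $\mathfrak{h}, \mathfrak{n}_\pm \subset \mathfrak{g}$ with the corresponding
subalgebras of constant loops in $L\mathfrak{g}$. Let $P_\pm$, $P_0$
be the projection operators onto $\mathfrak{N}_\pm$, $\mathfrak{h}$ in this
decomposition and $r_\pm = \pm(P_\pm + (1/2)P_0)$. The
classical r-matrices $r_\pm$ define on $L\mathfrak{g}$ the structure
of a factorizable Lie bialgebra. The associated
tensor kernels are called the trigonometric classical
r-matrices.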
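
The standard r-matrices of example 2 can be checked numerically in the simplest matrix model. The sketch below is an illustration only: it assumes $\mathfrak{g} = \mathfrak{sl}(4, \mathbb{R})$ with $\mathfrak{n}_\pm$ the strictly upper/lower triangular matrices and $\mathfrak{h}$ the diagonal ones, and it assumes the normalization $[rX, rY] - r([rX, Y] + [X, rY]) = -[X, Y]$ of the modified classical Yang-Baxter equation for $r = r_+ + r_-$ (the definition itself lies outside this section). The helper names are hypothetical.

```python
import numpy as np

def split(X):
    """Strictly upper, diagonal and strictly lower parts of X."""
    return np.triu(X, 1), np.diag(np.diag(X)), np.tril(X, -1)

def r_plus(X):
    up, diag, _ = split(X)
    return up + 0.5 * diag            # r_+ = P_+ + (1/2) P_0

def r_minus(X):
    _, diag, low = split(X)
    return -(low + 0.5 * diag)        # r_- = -(P_- + (1/2) P_0)

def r(X):
    return r_plus(X) + r_minus(X)     # r = P_+ - P_-

def bracket(X, Y):
    return X @ Y - Y @ X

def mcybe_defect(X, Y):
    """[rX, rY] - r([rX, Y] + [X, rY]) + [X, Y]; vanishes if r solves the
    modified classical Yang-Baxter equation in the assumed normalization."""
    return (bracket(r(X), r(Y))
            - r(bracket(r(X), Y) + bracket(X, r(Y)))
            + bracket(X, Y))

def random_traceless(n, rng):
    X = rng.standard_normal((n, n))
    return X - np.trace(X) / n * np.eye(n)

rng = np.random.default_rng(0)
X, Y = random_traceless(4, rng), random_traceless(4, rng)

factorizable = np.allclose(r_plus(X) - r_minus(X), X)   # r_+ - r_- = id
defect = np.abs(mcybe_defect(X, Y)).max()
```

Here `factorizable` tests the identity $r_+ - r_- = \mathrm{id}$, and `defect` should vanish up to rounding error.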


Classical r-matrices described above are associated 
with factorization problems in the infinite-dimensional 
loop groups: matrix Riemann problems or matrix 
Cousin problems (in the elliptic case). Belavin and 


Drinfeld have given a complete classification of 
factorizable Lie bialgebra structures for semisimple 
Lie algebras; in the loop algebra case, the problem they 
solved consists of classification of all meromorphic 
solutions of the classical Yang-Baxter equation. In 
other words, we assume that the distribution kernel 
associated with the classical r-matrix is represented by 
a meromorphic function (of two complex variables). 
Up to an equivalence, any such solution depends 
only on one variable and belongs to the rational, 
trigonometric, or elliptic type (in the latter case, the 
underlying Lie algebra is necessarily $\mathfrak{sl}(n)$). Classification
of solutions in the elliptic case is completely
rigid; in the trigonometric case, the moduli space is 
finite dimensional and admits an explicit descrip- 
tion. In the rational case, the classification is 
somewhat less explicit (it has been completed by A.
Stolin under some nondegeneracy condition). Contrary
to popular belief, there are many other
structures of a factorizable Lie bialgebra on loop 
algebras, for which the associated r-matrices are 
given by more singular distribution kernels. 


Poisson Lie Groups 


If the tangent Lie bialgebra of a Poisson Lie group is
of coboundary type, the cocycle $\eta$ is also trivial,
$\eta(g) = r - \mathrm{Ad}\, g \otimes \mathrm{Ad}\, g \cdot r$. Hence, the Poisson
bracket on $G$ is given by

$$\{\varphi, \psi\} = \langle r, \nabla\varphi \wedge \nabla\psi\rangle - \langle r, \nabla'\varphi \wedge \nabla'\psi\rangle, \qquad r \in \mathfrak{g} \wedge \mathfrak{g}$$

where $\nabla\varphi, \nabla'\varphi \in \mathfrak{g}^*$ are left and right differentials of
$\varphi \in C^\infty(G)$. This is the so-called "Sklyanin bracket".
Let us assume that $G$ is a matrix group; its affine
ring is generated by evaluation functions $\phi_{ij}$ which
assign to $L \in G$ its matrix coefficients, $\phi_{ij}(L) = L_{ij}$.
The Poisson bracket on $G$ is completely determined
by its values on $\phi_{ij}$. Explicitly, we get

$$\{\phi_{ij}, \phi_{km}\}(L) = [r, L \otimes L]_{ikjm} \qquad [14]$$

the commutator in the RHS is in $\mathrm{Mat}(n^2)$. By a
variation of language, evaluating functions and their
values on a generic element $L \in G$ are denoted by
the same letter; using tensor notation to suppress
matrix indices, we get

$$\{L_1, L_2\} = [r, L_1 L_2], \qquad L_1 = L \otimes I, \quad L_2 = I \otimes L \qquad [15]$$

In the case of loop algebras, these Poisson bracket
relations take the form

$$\{L_1(\lambda), L_2(\mu)\} = [r(\lambda, \mu), L_1(\lambda) L_2(\mu)]$$
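
As an illustration of the tensor formula $\{L_1, L_2\} = [r, L_1 L_2]$, the following sketch evaluates the bracket of matrix coefficients for a $2 \times 2$ matrix. It assumes the skew r-matrix $r = e \otimes f - f \otimes e$ of $\mathfrak{sl}(2)$ (the normalization is a choice made here, not fixed by the text) and the index convention $(A \otimes B)_{ik,jm} = A_{ij} B_{km}$; the function name `sklyanin` is hypothetical.

```python
import numpy as np

e = np.array([[0.0, 1.0], [0.0, 0.0]])
f = np.array([[0.0, 0.0], [1.0, 0.0]])
# Assumed skew r-matrix r = e (x) f - f (x) e, acting on C^2 (x) C^2.
r = np.kron(e, f) - np.kron(f, e)

def sklyanin(L):
    """Return br with br[i, j, k, m] = {L_ij, L_km} = [r, L (x) L]_{ik,jm}."""
    n = L.shape[0]
    LL = np.kron(L, L)                 # LL[i*n + k, j*n + m] = L[i, j] * L[k, m]
    comm = r @ LL - LL @ r
    # reshape gives axes (i, k, j, m); reorder them to (i, j, k, m)
    return comm.reshape(n, n, n, n).transpose(0, 2, 1, 3)

# L = [[a, b], [c, d]] with a = 2, b = 3, c = 1, d = 2 (determinant 1)
L = np.array([[2.0, 3.0], [1.0, 2.0]])
br = sklyanin(L)

bracket_ab = br[0, 0, 0, 1]   # {a, b}
bracket_ad = br[0, 0, 1, 1]   # {a, d}
bracket_bc = br[0, 1, 1, 0]   # {b, c}
```

With this choice of $r$ the brackets are quadratic in the entries: $\{a, b\} = ab$, $\{a, c\} = ac$, $\{a, d\} = 2bc$, $\{b, c\} = 0$, $\{b, d\} = bd$, $\{c, d\} = cd$.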


Let us assume that $G$ is factorizable and the
associated factorization problem is globally solvable.
The Poisson bracket on the dual group $G^* \simeq i(G^*) \subset G \times G$
may be characterized in terms of the
matrix coefficients of $(h_+, h_-) = i(h)$, or of their
quotient $h = h_+ h_-^{-1}$. Explicitly, we get


$$\{h_1^\pm, h_2^\pm\} = [r, h_1^\pm h_2^\pm], \qquad \{h_1^+, h_2^-\} = [r_+, h_1^+ h_2^-] \qquad [16]$$
$$\{h_1, h_2\} = r h_1 h_2 + h_1 h_2 r - h_2 r_+ h_1 - h_1 r_- h_2, \qquad r = \tfrac{1}{2}(r_+ + r_-) \qquad [17]$$

The key question in the geometry of Poisson
groups is the description of symplectic leaves in
$G$, $G^*$. This question is already nontrivial when $G^*$ is
abelian (and hence may be identified with the dual of
the Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$). The Poisson bracket on
$\mathfrak{g}^*$ is linear; this is the well-known Lie-Poisson (alias,
Berezin-Kirillov-Kostant) bracket. Its symplectic
leaves coincide with the orbits of the coadjoint
representation of $G$ in $\mathfrak{g}^*$. The natural way to prove
this fundamental result (which goes back to Lie) is to
consider first the natural action of $G$ on the
cotangent bundle $T^*G \simeq G \times \mathfrak{g}^*$; this action is
Hamiltonian, and the coadjoint orbits arise as a 
result of Hamiltonian reduction associated with this 
action. The generalization of the theory of coadjoint 
orbits to the case of arbitrary Poisson groups starts 
with the notion of symplectic double, which is the 
nonlinear analog of the cotangent bundle. 

Let $D$ be the double of $(G, G^*)$; assume for
simplicity that $D = G \cdot G^*$ globally and hence the
associated factorization problem is always solvable.
Let $r_{\mathfrak{d}} = (1/2)(P_{\mathfrak{g}} - P_{\mathfrak{g}^*})$. Set

$$\{\varphi, \psi\}_\pm = \langle r_{\mathfrak{d}} \nabla\varphi, \nabla\psi\rangle \pm \langle r_{\mathfrak{d}} \nabla'\varphi, \nabla'\psi\rangle \qquad [18]$$

The bracket $\{\,,\}_-$ is the usual Sklyanin bracket which
defines the structure of a Poisson group on $D$, while
$\{\,,\}_+$ is nondegenerate and defines a symplectic
structure on $D$. Let us denote the copies of $D$ equipped
with the bracket $\{\,,\}_\pm$ by $D_\pm$. The bracket on $D_+$ is not
multiplicative, but it is covariant with respect to the
action of $D_-$ by left and right translations; in other
words, the natural mappings $D_- \times D_+ \to D_+$ and
$D_+ \times D_- \to D_+$, associated with multiplication in $D$,
preserve Poisson brackets. Since $G, G^* \subset D_-$ are
Poisson subgroups, natural actions $G \times D_+ \to D_+$
and $G^* \times D_+ \to D_+$ by left and right translations are
Poisson mappings. Consider the natural projections

$$G^* \simeq D_+/G \xleftarrow{\ \pi\ } D_+ \xrightarrow{\ \pi'\ } G\backslash D_+ \simeq G^*, \qquad
G \simeq D_+/G^* \xleftarrow{\ p\ } D_+ \xrightarrow{\ p'\ } G^*\backslash D_+ \simeq G$$



onto the space of left and right coset classes. It is easy
to see that functions on $D_+$ which are constant on each
projection fiber are closed with respect to the Poisson
bracket. This means that the quotient spaces inherit
the Poisson structure. Moreover, the maps $\pi, \pi'$ and
$p, p'$ form the so-called "dual pairs", that is, the
algebras of functions which are constant on the fibers
of $\pi$ and $\pi'$ (or of $p$ and $p'$) are mutual centralizers of
one another in the big Poisson algebra $F(D_+)$.
Since $D = G \cdot G^* = G^* \cdot G$, we have $G^*\backslash D \simeq G$,
$G\backslash D \simeq G^*$; it is easy to check that the quotient
Poisson structure induced on $G$, $G^*$ coincides with
the original one. Applying the fundamental theorem
on dual pairs of Poisson mappings (going back to S.
Lie), we conclude that symplectic leaves in $G$ and $G^*$,
respectively, coincide with the orbits of $G^*$ (respectively,
$G$) in these quotient spaces. The actions
$G \times G^* \to G^*$, $G^* \times G \to G$ are called "dressing
transformations". Unit elements in $G$ and $G^*$ are fixed points
of dressing transformations; their linearizations at the
tangent spaces $T_e G^* \simeq \mathfrak{g}^*$, $T_e G \simeq \mathfrak{g}$ coincide with the
coadjoint actions of $G$ and $G^*$, respectively.

When $D \neq G \cdot G^*$ (i.e., the factorization problem in
D is not always solvable), dressing actions are still well 
defined as global transformations of the quotient 
spaces; in this case G, G* may be identified with open 
cells in D/G*, D/G, respectively, which means that 
dressing action on G, G* is, in general, incomplete. 

If the group G is factorizable, symplectic leaves in the 
dual group G* admit a nice uniform description: since 
in this case $D = G \times G$ and $G \subset D$ is the diagonal
subgroup, the quotient D/G may be modeled on G 
itself. The quotient Poisson bracket in this realization 
coincides with [17], while the dressing action coin- 
cides with conjugation in G (and is independent of 
r). Hence, symplectic leaves in D/G coincide with 
conjugacy classes in G; the equivalence of this model 
with G* (equipped with the bracket [16]) is provided 
by the factorization map. The description of sym- 
plectic leaves in G is more subtle (and already 
crucially depends on the choice of r!); for semisimple 
Lie groups with the standard Poisson structure, it is 
related to the geometry of double Bruhat cells. 

For loop groups with rational, trigonometric, or 
elliptic r-matrices, dressing action is associated with 
auxiliary factorization problems in the loop group. 
Roughly speaking, symplectic leaves correspond to 
rational loops with prescribed singularities. Many 
important examples have been described in connection 
with integrable lattice systems, although a complete 
classification theorem is still not available. For 
$\mathfrak{g} = \mathfrak{sl}(2)$, the elliptic Manin triple described earlier
leads to the Poisson structure on the group of “elliptic 
loops" with values in SL(2); its simplest symplectic 
leaves (corresponding to loops with simple poles) are 
associated with a remarkable Poisson algebra, the 
Sklyanin algebra (with four generators and two 
Casimir functions), which admits an interesting 
explicit quantization. 
Dressing action is a nontrivial example of a
Poisson group action. In general, such actions are
not Hamiltonian in the usual sense; the appropriate
generalization is provided by the notion of the
nonabelian moment map. Let $G \times M \to M$ be an
action of a Poisson group $G$ on a Poisson manifold
$M$, $\mathfrak{g} \to \mathrm{Vect}\, M$ the associated homomorphism of
Lie algebras. A mapping $\mu: M \to G^*$ is called the
nonabelian moment map associated with this action,
if for any $X \in \mathfrak{g}$ and $\varphi \in F(M)$, we have

$$X \cdot \varphi = \langle \mu^{-1}\{\mu, \varphi\}, X\rangle$$

In this case, $G \times M \to M$ is a fortiori a Poisson
map. Both dressing actions $G^* \times G \to G$ and
$G \times G^* \to G^*$ admit nonabelian moment maps, which are
just the identity maps $\mu = \mathrm{id}_G$ and $\mu^* = \mathrm{id}_{G^*}$. For
compact Poisson groups, the nonabelian moment
map has good convexity properties, which generalize
the convexity properties of the ordinary moment
map for Hamiltonian group actions.

The general theory of homogeneous Poisson spaces 
has some peculiarities. Typically, the G-covariant 
Poisson structure on a given homogeneous space is 
not unique (when it exists); this is true already for 
principal homogeneous spaces (a simple example is 
provided by the symplectic double $D_+$). Let $G$ be a
Poisson Lie group, $(\mathfrak{g}, \mathfrak{g}^*)$ its tangent Lie bialgebra, $\mathfrak{d}$
its double, $U$ its Lie subgroup, $\mathfrak{u} = \mathrm{Lie}\, U$. A subalgebra
$\mathfrak{l} \subset \mathfrak{d}$ is called Lagrangian if it is isotropic with respect
to the canonical inner product in $\mathfrak{d}$. The general
classification result, according to Drinfeld, asserts that
there is a bijection between $G$-covariant Poisson
structures on $G/U$ and the set of all Lagrangian
subalgebras $\mathfrak{l} \subset \mathfrak{d}$ such that $\mathfrak{l} \cap \mathfrak{g} = \mathfrak{u}$. Various
nontrivial examples arise, notably in the study of
integrable systems. For instance, the geometric proof of the
factorization theorem for lattice zero-curvature
equations, which is stated in the following section, uses a
different Poisson structure on the double (the so-called
"twisted symplectic double").


Applications to Integrable Systems 


The definition of Poisson Lie groups was motivated
by key examples which arise in the theory of
integrable systems. In applications, one often deals
with nonlinear differential equations which may be
written in the form of the so-called "lattice
zero-curvature equations"

$$\frac{dL_m}{dt} = L_m M_m - M_{m+1} L_m, \qquad m \in \mathbb{Z}, \qquad [19]$$

where $L_m$, $M_m$ are matrices, possibly depending on
an additional parameter (or, more generally, abstract
linear operators). Equations [19] give the compatibility
conditions for the auxiliary linear system

$$\psi_{m+1} = L_m \psi_m, \qquad \frac{d\psi_m}{dt} = -M_m \psi_m, \qquad m \in \mathbb{Z} \qquad [20]$$

The use of finite-difference operators associated with
a one-dimensional lattice, as in [20], is particularly
well suited for the study of "multiparticle" lattice
models. Let us assume that the "potential" $L_m$ in [20]
is periodic, $L_{m+N} = L_m$; the period $N$ may be
interpreted as the number of copies of an "elementary"
system. It is natural to presume that "Lax
matrices" $L_m$ in [19] are elements of a matrix Lie
group $G$ (or of a loop group, if they depend on an
extra parameter). The auxiliary linear problem [20]
leads to a family of dynamical systems on $G^N$ which
remain integrable for any $N$. Let $T: G^N \to G$ be the
"monodromy map" which assigns to the set
$L_1, \ldots, L_N$ of local Lax matrices their ordered
product $T_L = L_N L_{N-1} \cdots L_1$. Let us assume that $G$ is
equipped with the Sklyanin bracket associated with a
factorizable r-matrix $r$. Then $T$ is a Poisson map. Let
$I(G)$ be the algebra of central functions on $G$; for
$\varphi \in I(G)$, set $H_\varphi = \varphi \circ T$. All functions $H_\varphi$, $\varphi \in I(G)$, are
in involution with respect to the product Poisson
bracket on $G^N$ and give rise to lattice zero-curvature
equations of the same form as [19]; for a given $\varphi$, we
may choose the M-matrix in either of the two forms:


$$M_m^\pm = r_\pm\bigl(\psi_m \nabla\varphi(T_L)\, \psi_m^{-1}\bigr), \qquad \psi_m = \prod_{1 \le k < m} L_k$$


Let $L_m(t)$, $m = 1, \ldots, N$, be the integral curve of
this equation which starts at $L^0$. The construction of
this curve reduces to the factorization problem associated
with the chosen r-matrix. Explicitly, we get

$$L_m(t) = g_{m+1}(t)_\pm^{-1}\, L_m^0\, g_m(t)_\pm,$$

where $(g_m(t)_+, g_m(t)_-)$ is the curve in $G^*$ which
solves the factorization problem

$$g_m(t)_+\, g_m(t)_-^{-1} = \psi_m^0 \exp\bigl(t \nabla\varphi(T(L^0))\bigr) (\psi_m^0)^{-1}, \qquad \psi_m^0 = \psi_m(L^0)$$
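
The following short computation sketches why a factorized curve of this kind solves the lattice zero-curvature equation [19]. It is a consistency check rather than the article's proof, and it rests on assumptions stated here: the factorization is written as $g = g_+ g_-^{-1}$, the r-matrices satisfy $r_+ - r_- = \mathrm{id}$ with the pairs $(r_+ X, r_- X)$ forming the subalgebra corresponding to $G^*$, $\nabla\varphi$ is Ad-equivariant for central $\varphi$, and $\psi_m$ denotes the ordered product $L_{m-1} \cdots L_1$.

```latex
% Notation (assumed): X_m^0 = \psi_m^0 \nabla\varphi(T(L^0)) (\psi_m^0)^{-1},
% E_m(t) = \exp(t X_m^0) = g_{m+}\, g_{m-}^{-1}, with g_{m\pm} = g_m(t)_\pm.
\begin{align*}
X_m^0 = \dot E_m E_m^{-1}
  &= \dot g_{m+}\, g_{m+}^{-1} - g_{m+}\bigl(g_{m-}^{-1}\dot g_{m-}\bigr) g_{m+}^{-1}, \\
g_{m+}^{-1} X_m^0\, g_{m+} = g_{m-}^{-1} X_m^0\, g_{m-}
  &= g_{m+}^{-1}\dot g_{m+} - g_{m-}^{-1}\dot g_{m-}
  && \text{(since } [X_m^0, E_m] = 0\text{)}, \\
g_{m\pm}^{-1}\dot g_{m\pm}
  &= r_\pm\bigl(g_{m\pm}^{-1} X_m^0\, g_{m\pm}\bigr)
  && \text{(uniqueness of } Y = r_+ Y - r_- Y\text{)}, \\
\psi_m(t)\,\nabla\varphi(T(t))\,\psi_m(t)^{-1}
  &= g_{m\pm}^{-1} X_m^0\, g_{m\pm}
  && \text{(as } \psi_m(t) = g_{m\pm}^{-1}\psi_m^0\, g_{1\pm}\text{)}, \\
\frac{d}{dt}\Bigl(g_{m+1,\pm}^{-1}\, L_m^0\, g_{m\pm}\Bigr)
  &= L_m M_m^\pm - M_{m+1}^\pm L_m,
  && M_m^\pm = g_{m\pm}^{-1}\dot g_{m\pm}.
\end{align*}
```

Both sign choices describe the same curve, because $L_m^0 X_m^0 = X_{m+1}^0 L_m^0$ implies $L_m^0 E_m(t) = E_{m+1}(t) L_m^0$.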


This result exhibits the double role of the r-matrix.
On the one hand, it serves to define the Poisson
structure on $G^N$ which is adapted to the study of
lattice zero-curvature equations; in particular, the
dynamical flow associated with these equations is
automatically confined to symplectic leaves in $G^N$.
(In applications, $G$ is usually a loop group equipped
with a factorizable r-matrix; despite the fact that
$\dim G = \infty$, it admits plenty of finite-dimensional
symplectic leaves.) In its second incarnation, the r-matrix
serves to define the factorization problem which
solves these zero-curvature equations. In the loop
group case, this is a matrix Riemann problem; its
explicit solution is based on the study of the spectral
curve associated with the "monodromy matrix" $T_L$
and uses the technique of algebraic geometry.

The monodromy map $T: G^N \to G$ may be regarded
as a nonabelian moment map associated with an
action of the dual Lie algebra $\mathfrak{g}^*$ on the phase space.
This action actually extends to an action of the (local) 
Lie group G* which transforms solutions into solu- 
tions again. This is the prototype “dressing” action 
(originally defined by Zakharov and Shabat in their 
study of zero-curvature equations related to Riemann- 
Hilbert problems). Dressing provides an effective tool 
to produce new solutions of zero-curvature equations 
from the *trivial" ones; it was also the first nontrivial 
example of a Poisson group action. 


See also: Affine Quantum Groups; Bicrossproduct 

Hopf Algebras and Noncommutative Spacetime; 
Bi-Hamiltonian Methods in Soliton Theory; Deformations 
of the Poisson Bracket on a Symplectic Manifold; 
Functional Equations and Integrable Systems; 
Hamiltonian Fluid Dynamics; Hopf Algebras and 
q-Deformation Quantum Groups; Integrable Systems 
and Recursion Operators on Symplectic and Jacobi 
Introduction 
Introductory and Historical Remarks 


Clifford (1878) introduced his “geometric algebras” 
as a generalization of Grassmann algebras, complex 
numbers, and quaternions. Lipschitz (1886) was the 
first to define groups constructed from “Clifford 
numbers” and use them to represent rotations in a 


Euclidean space. Cartan discovered representations of
the Lie algebras $\mathfrak{so}_n(\mathbb{C})$ and $\mathfrak{so}_n(\mathbb{R})$, $n > 2$, that do
not lift to representations of the orthogonal groups. 
In physics, Clifford algebras and spinors appear for 
the first time in Pauli’s nonrelativistic theory of the 
“magnetic electron.” Dirac (1928), in his work on the 
relativistic wave equation of the electron, introduced 
matrices that provide a representation of the Clifford 
algebra of Minkowski space. Brauer and Weyl (1935) 
connected the Clifford and Dirac ideas with Cartan’s 
spinorial representations of Lie algebras; they found, 
in any number of dimensions, the spinorial, projective 
representations of the orthogonal groups. 


Clifford algebras and spinors are implicit in
Euclid's solution of the Pythagorean equation
$x^2 - y^2 + z^2 = 0$, which is equivalent to

$$\begin{pmatrix} y - x & z \\ z & y + x \end{pmatrix} = 2 \begin{pmatrix} p \\ q \end{pmatrix} \begin{pmatrix} p & q \end{pmatrix} \qquad [1]$$

and gives $x = q^2 - p^2$, $y = p^2 + q^2$, $z = 2pq$. If the
numbers appearing in [1] are real, then this equation
can be interpreted as providing a representation of a
vector $(x, y, z) \in \mathbb{R}^3$, null with respect to a quadratic
form of signature (1,2), as the "square" of a spinor
$(p, q) \in \mathbb{R}^2$. The pure spinors of Cartan (1938)
provide a generalization of this observation to
higher dimensions.
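
The parametrization in [1] is easy to test on integers. The sketch below (function names are hypothetical, chosen only for this illustration) builds null vectors from integer spinors $(p, q)$ and compares the matrix on the left of [1] with twice the outer square of the spinor.

```python
def null_vector_from_spinor(p, q):
    """Map a spinor (p, q) to (x, y, z) = (q^2 - p^2, p^2 + q^2, 2pq)."""
    return q * q - p * p, p * p + q * q, 2 * p * q

def matrix_of(x, y, z):
    """The symmetric matrix [[y - x, z], [z, y + x]] on the left of [1]."""
    return [[y - x, z], [z, y + x]]

def outer_square(p, q):
    """Twice the 'square' of the spinor: 2 * (p, q)^T (p, q)."""
    return [[2 * p * p, 2 * p * q], [2 * q * p, 2 * q * q]]

spinors = [(p, q) for q in range(2, 5) for p in range(1, q)]
triples = [null_vector_from_spinor(p, q) for p, q in spinors]
```

For integer spinors the resulting vectors are Pythagorean triples; for example, $(p, q) = (1, 2)$ gives $(x, y, z) = (3, 5, 4)$.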

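Multiplying the square matrix in [1] on the left by
a real, 2 x 2 unimodular matrix, on the right by its
transpose, and taking the determinant, one arrives at
the exact sequence of group homomorphisms:

$$1 \to \mathbb{Z}_2 \to \mathrm{SL}_2(\mathbb{R}) = \mathrm{Spin}^0_{1,2} \to \mathrm{SO}^0_{1,2} \to 1$$

Multiplying the same matrix by

$$\varepsilon = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

on the left and computing the square of the product,
one obtains

$$\begin{pmatrix} -z & -x - y \\ y - x & z \end{pmatrix}^2 = (x^2 - y^2 + z^2) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$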


This equation is an illustration of the idea of
representing a quadratic form as the square of a
linear form in a Clifford algebra. Replacing $y$ by $iy$,
one arrives at complex spinors, the Pauli matrices

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = i\varepsilon, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

$\mathrm{Spin}_3 = \mathrm{SU}_2$, etc.

This article reviews Clifford algebras, the associated
groups, and their representations, for quadratic
spaces over complex or real numbers. These
notions have been generalized by Chevalley (1954)
to quadratic spaces over arbitrary number fields.
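
The representation of a quadratic form as the square of a linear form, described above, can be verified numerically. This sketch assumes the sign convention $\varepsilon = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ (the squared identity holds for either sign of $\varepsilon$) and uses a hypothetical helper `linear_form`.

```python
import numpy as np

eps = np.array([[0, -1], [1, 0]])          # assumed sign convention for epsilon
sigma_x = np.array([[0, 1], [1, 0]])
sigma_z = np.array([[1, 0], [0, -1]])
sigma_y = 1j * eps                          # obtained by "replacing y by iy"

def linear_form(x, y, z):
    """epsilon times the matrix [[y - x, z], [z, y + x]] of equation [1]."""
    return eps @ np.array([[y - x, z], [z, y + x]])

x, y, z = 2.0, -3.0, 5.0
square = linear_form(x, y, z) @ linear_form(x, y, z)
quadratic = x**2 - y**2 + z**2              # value of the quadratic form

# Anticommutator of two distinct Pauli matrices
anticommutator_xy = sigma_x @ sigma_y + sigma_y @ sigma_x
```

The linear form equals $-x\sigma_x + y\varepsilon - z\sigma_z$; since $\sigma_x^2 = \sigma_z^2 = I$, $\varepsilon^2 = -I$ and the three matrices anticommute, its square is $(x^2 - y^2 + z^2) I$.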


Notation 


If $S$ is a vector space over $K = \mathbb{R}$ or $\mathbb{C}$, then $S^*$
denotes its dual, that is, the vector space over $K$
of all $K$-linear maps from $S$ to $K$. The value of
$\omega \in S^*$ on $s \in S$ is sometimes written as $\langle s, \omega\rangle$.
The transpose of a linear map $f: S_1 \to S_2$ is the
map $f^*: S_2^* \to S_1^*$ defined by $\langle s, f^*(\omega)\rangle = \langle f(s), \omega\rangle$ for
every $s \in S_1$ and $\omega \in S_2^*$. If $S_1$ and $S_2$ are complex
vector spaces, then a map $f: S_1 \to S_2$ is said to be
semilinear if it is $\mathbb{R}$-linear and $f(is) = -if(s)$. The
complex conjugate of a finite-dimensional complex
vector space $S$ is the complex vector space $\bar S$ of all
semilinear maps from $S^*$ to $\mathbb{C}$. There is a natural
semilinear isomorphism (complex conjugation) $S \to \bar S$,
$s \mapsto \bar s$, such that $\langle \omega, \bar s\rangle = \overline{\langle s, \omega\rangle}$ for every $\omega \in S^*$.
The space $\bar{\bar S}$ can be identified with $S$ and then $\bar{\bar s} = s$.
The spaces $(\bar S)^*$ and $\overline{S^*}$ are identified. If $f: S_1 \to S_2$
is a complex-linear map, then there is the
complex-conjugate map $\bar f: \bar S_1 \to \bar S_2$ given by $\bar f(\bar s) = \overline{f(s)}$ and
the Hermitian conjugate map $f^\dagger = \bar f^*: \bar S_2^* \to \bar S_1^*$.
A linear map $A: S \to \bar S^*$ such that $A^\dagger = A$ is said to
be Hermitian. $K(N)$ denotes, for $K = \mathbb{R}$, $\mathbb{C}$ or $\mathbb{H}$, the
set of all $N$ by $N$ matrices with elements in $K$.


Real, Complex, and Quaternionic Structures 


A real structure on a complex vector space $S$ is a
complex-linear map $C: S \to \bar S$ such that $\bar C C = \mathrm{id}_S$.
A vector $s \in S$ is said to be real if $\bar s = C(s)$. The set of
all real vectors is a real vector space; its real
dimension is the same as the complex dimension of $S$.

A complex-linear map $C: S \to \bar S$ such that
$\bar C C = -\mathrm{id}_S$ defines on $S$ a quaternionic structure; a
necessary condition for such a structure to exist is
that the complex dimension $m$ of $S$ be even, $m = 2n$,
$n \in \mathbb{N}$. The space $S$ with a quaternionic structure
can be made into a right vector space over the field
$\mathbb{H}$ of quaternions. In the context of quaternions, it is
convenient to represent the imaginary unit of $\mathbb{C}$ as
$\sqrt{-1}$. Multiplication on the right by the quaternion
unit $i$ is realized as the multiplication (on the left) by
$\sqrt{-1}$. If $j$ and $k = ij$ are the other two quaternion
units and $s \in S$, then one puts $sj = \overline{C(s)}$ and $sk = (si)j$.
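
In coordinates on $\mathbb{C}^m$, a real or quaternionic structure can be modeled by the antilinear map $s \mapsto M\bar s$ with a complex matrix $M$; this coordinate description is an assumption made for the sketch, and the function names are hypothetical. The structure is real when $M\bar M = I$ and quaternionic when $M\bar M = -I$.

```python
import numpy as np

def antilinear(M):
    """Return the antilinear map s -> M @ conj(s) on C^m."""
    return lambda s: M @ np.conj(s)

def structure_sign(M):
    """+1 for a real structure (J^2 = id), -1 for a quaternionic one, else 0."""
    m = M.shape[0]
    JJ = M @ np.conj(M)               # J(J(s)) = M conj(M) s
    if np.allclose(JJ, np.eye(m)):
        return 1
    if np.allclose(JJ, -np.eye(m)):
        return -1
    return 0

eps = np.array([[0, -1], [1, 0]], dtype=complex)
J = antilinear(eps)                   # quaternionic structure on C^2
s = np.array([1 + 2j, 3 - 1j])

twice = J(J(s))                       # should equal -s
semilinear_defect = J(1j * s) + 1j * J(s)   # J(is) = -i J(s)
```

Since $\det(M\bar M) = |\det M|^2 \geq 0$ while $\det(-I_m) = (-1)^m$, a quaternionic structure can exist only for even $m$, in agreement with the necessary condition stated above.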

A real vector space $S$ can be complexified by
forming the tensor product $\mathbb{C} \otimes_{\mathbb{R}} S = S \oplus iS$.

The realification of a complex vector space $S$ is the
real vector space having $S$ as its set of vectors so that
$\dim_{\mathbb{R}} S = 2 \dim_{\mathbb{C}} S$. The complexification of a
realification of $S$ is the "double" $S \oplus \bar S$ of the original space.


Inner-Product Spaces and Their Groups 


Definitions: quadratic and symplectic spaces  A
bilinear map $B: S \times S \to K$ on a vector space $S$ over
$K$ is said to make $S$ into an inner-product space. To
save on notation, one also writes $B: S \to S^*$ so that
$\langle s, B(t)\rangle = B(s, t)$ for all $s, t \in S$. The group of
automorphisms of an inner-product space,

$$\mathrm{Aut}(S, B) = \{R \in \mathrm{GL}(S) \mid R^* \circ B \circ R = B\}$$

is a Lie subgroup of the general linear group $\mathrm{GL}(S)$.
An inner-product space $(S, B)$ is said here to be
quadratic (resp., symplectic) if $B$ is symmetric (resp.,
antisymmetric and nonsingular). A quadratic space is
characterized by its quadratic form $s \mapsto B(s, s)$. For
$K = \mathbb{C}$, a Hermitian map $A: S \to \bar S^*$ defines a
Hermitian scalar product $A(s, t) = \langle \bar s, A(t)\rangle$.

An orthogonal space is defined here as a quadratic
space $(S, B)$ such that $B: S \to S^*$ is an isomorphism.
The group of automorphisms of an orthogonal space
is the orthogonal group $\mathrm{O}(S, B)$. The group of
automorphisms of a symplectic space is the
symplectic group $\mathrm{Sp}(S, B)$. The dimension of a
symplectic space is even. If $S = K^{2n}$ is a symplectic space
over $K = \mathbb{R}$ or $\mathbb{C}$, then its symplectic group is
denoted by $\mathrm{Sp}_{2n}(K)$. Two quaternionic symplectic
groups appear in the list of spin groups of
low-dimensional spaces:

$$\mathrm{Sp}_2(\mathbb{H}) = \{a \in \mathbb{H}(2) \mid a^\dagger a = I\}$$

and

$$\mathrm{Sp}_{1,1}(\mathbb{H}) = \{a \in \mathbb{H}(2) \mid a^\dagger \sigma_z a = \sigma_z\}$$

Here $a^\dagger$ denotes the matrix obtained from $a$ by
transposition and quaternionic conjugation.


Contractions, frames, and orthogonality  From now
on, unless otherwise specified, $(V, g)$ is a quadratic
space of dimension $m$. Let $\wedge V = \bigoplus_{p=0}^{m} \wedge^p V$ be its
exterior (Grassmann) algebra. For every $v \in V$ and
$w \in \wedge V$ there is the contraction $g(v) \lrcorner\, w$ characterized
as follows. The map $V \times \wedge V \to \wedge V$, $(v, w) \mapsto g(v) \lrcorner\, w$,
is bilinear; if $x \in \wedge^p V$, then
$g(v) \lrcorner (x \wedge w) = (g(v) \lrcorner\, x) \wedge w + (-1)^p x \wedge (g(v) \lrcorner\, w)$
and $g(v) \lrcorner\, v = g(v, v)$.

A frame $(e_\mu)$ in a quadratic space $(V, g)$ is said to
be a quadratic frame if $\mu \neq \nu$ implies $g(e_\mu, e_\nu) = 0$.

For every subset $W$ of $V$ there is the orthogonal
subspace $W^\perp$ containing all vectors that are
orthogonal to every element of $W$.

If $(V, g)$ is a real orthogonal space, then there is an
orthonormal frame $(e_\mu)$, $\mu = 1, \ldots, m$, in $V$ such that
$k$ frame vectors have squares equal to $-1$, $l$ frame
vectors have squares equal to 1 and $k + l = m$. The
pair $(k, l)$ is the signature of $g$. The quadratic form $g$
is said to be neutral if the orthogonal space $(V, g)$
admits two maximal totally null subspaces $W$ and
$W'$ such that $V = W \oplus W'$. Such a space $V$ is
$2n$-dimensional, either complex or real with $g$ of
signature $(n, n)$. A Lorentzian space has maximal
totally null subspaces of dimension 1 and a
Euclidean space, characterized by a definite
quadratic form, has no null subspaces. The Minkowski
space is a Lorentzian space of dimension 4.

If $(V, g)$ is a complex orthogonal space, then an
orthonormal frame $(e_\mu)$, $\mu = 1, \ldots, m$, can be
chosen in $V$ so that, defining $g_{\mu\nu} = g(e_\mu, e_\nu)$, one
has $g_{\mu\mu} = (-1)^{\mu+1}$ and, if $\mu \neq \nu$, then $g_{\mu\nu} = 0$.

If $A: S \to \bar S^*$ is a Hermitian isomorphism, then
there is a (pseudo)unitary frame $(e_\alpha)$ in $S$ such that
the matrix $A_{\bar\alpha\beta} = A(e_\alpha, e_\beta)$ is diagonal, has $p$ 1's
and $q$ $-1$'s on the diagonal, $p + q = \dim S$. If $p = q$,
then $A$ is said to be neutral. $A$ is definite if either $p$
or $q = 0$.


Algebras 


Definitions  An algebra over $K$ is a vector space $\mathcal{A}$
over $K$ with a bilinear map $\mathcal{A} \times \mathcal{A} \to \mathcal{A}$, $(a, b) \mapsto ab$,
which is distributive with respect to addition.
The algebra is associative if $(ab)c = a(bc)$ holds for
all $a, b, c \in \mathcal{A}$. It is commutative if $ab = ba$ for all
$a, b \in \mathcal{A}$. An element $1_{\mathcal{A}}$ is the unit of $\mathcal{A}$ if
$1_{\mathcal{A}} a = a 1_{\mathcal{A}} = a$ holds for every $a \in \mathcal{A}$.

From now on, unless otherwise specified, the bare
word algebra denotes a finite-dimensional,
associative algebra over $K = \mathbb{R}$ or $\mathbb{C}$, with a unit element.
If $S$ is an $N$-dimensional vector space over $K$, then the
set $\mathrm{End}\, S$ of all endomorphisms of $S$ is an
$N^2$-dimensional algebra over $K$, the product being
defined by composition; if $f, g \in \mathrm{End}\, S$, then one
writes $fg$ instead of $f \circ g$; the unit of $\mathrm{End}\, S$ is
the identity map $I$. By definition, homomorphisms
of algebras map units into units. The map $K \to \mathcal{A}$,
$a \mapsto a 1_{\mathcal{A}}$, is injective and one identifies $K$ with its
image in $\mathcal{A}$ by this map so that the unit can be
represented by $1 \in K \subset \mathcal{A}$. A set $\mathcal{B} \subset \mathcal{A}$ is said to
generate $\mathcal{A}$ if every element of $\mathcal{A}$ can be represented
as a linear combination of products of elements of $\mathcal{B}$.
For example, if $V$ is a vector space over $K$, then its
tensor algebra

$$T(V) = \bigoplus_{p=0}^{\infty} \otimes^p V$$

is an (infinite-dimensional) algebra over $K$ generated
by $K \oplus V$. The algebra of all $N \times N$ matrices
with entries in an algebra $\mathcal{A}$ is denoted by $\mathcal{A}(N)$.
Its unit element is the unit matrix $I$. In particular,
$\mathbb{R}(N)$, $\mathbb{C}(N)$, and $\mathbb{H}(N)$ are algebras over $\mathbb{R}$. The
algebra $\mathbb{R}(2)$ is generated by the set $\{\sigma_x, \sigma_z\}$. As a
vector space, the algebra $\mathbb{R}(2)$ is spanned by the set
$\{I, \sigma_x, \varepsilon, \sigma_z\}$.
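
The last two statements can be checked directly. The sketch below assumes the sign convention $\varepsilon = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, for which $\varepsilon = \sigma_x \sigma_z$, and verifies that $I$, $\sigma_x$, $\varepsilon$, $\sigma_z$ are linearly independent, so that products of $\sigma_x$ and $\sigma_z$ span the four-dimensional space $\mathbb{R}(2)$.

```python
import numpy as np

I2 = np.eye(2)
sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])
sigma_z = np.array([[1.0, 0.0], [0.0, -1.0]])

# Products of the two generators give the unit and epsilon.
unit = sigma_x @ sigma_x
eps = sigma_x @ sigma_z          # equals [[0, -1], [1, 0]] in the assumed convention

basis = [I2, sigma_x, eps, sigma_z]
rank = np.linalg.matrix_rank(np.array([b.ravel() for b in basis]))
```

A rank of 4 means the four matrices form a basis of $\mathbb{R}(2)$.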

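The direct sum $\mathcal{A} \oplus \mathcal{B}$ of the algebras $\mathcal{A}$ and $\mathcal{B}$
over $K$ is an algebra over $K$ such that its underlying
vector space is $\mathcal{A} \times \mathcal{B}$ and the product is defined by
$(a, b) \cdot (a', b') = (aa', bb')$ for every $a, a' \in \mathcal{A}$ and
$b, b' \in \mathcal{B}$. Similarly, the product in the tensor
product algebra $\mathcal{A} \otimes \mathcal{B}$ is defined by

$$(a \otimes b) \cdot (a' \otimes b') = aa' \otimes bb' \qquad [3]$$

For example, if $\mathcal{A}$ is an algebra over $\mathbb{R}$, then the
tensor product algebra $\mathbb{R}(N) \otimes_{\mathbb{R}} \mathcal{A}$ is isomorphic to
$\mathcal{A}(N)$ and

$$K(N) \otimes_K K(N') = K(NN') \qquad [4]$$

for $K = \mathbb{R}$ or $\mathbb{C}$ and $N, N' \in \mathbb{N}$. There are
isomorphisms of algebras over $\mathbb{R}$:

$$\mathbb{C} \otimes_{\mathbb{R}} \mathbb{C} = \mathbb{C} \oplus \mathbb{C}, \qquad
\mathbb{C} \otimes_{\mathbb{R}} \mathbb{H} = \mathbb{C}(2), \qquad
\mathbb{H} \otimes_{\mathbb{R}} \mathbb{H} = \mathbb{R}(4) \qquad [5]$$

An algebra over $\mathbb{R}$ can be complexified by
complexifying its underlying vector space; it follows from [5]
that $\mathbb{C}(2)$ is the complex algebra obtained by
complexification of the real algebra $\mathbb{H}$.

The center of an algebra $\mathcal{A}$ is the set

$$Z(\mathcal{A}) = \{a \in \mathcal{A} \mid ab = ba \ \ \forall\, b \in \mathcal{A}\}$$

The center is a commutative subalgebra containing
$K$. An algebra over $K$ is said to be central if its center
coincides with $K$. The algebras $\mathbb{R}(N)$ and $\mathbb{H}(N)$ are
central over $\mathbb{R}$. The algebra $\mathbb{C}(N)$ is central over $\mathbb{C}$,
but not over $\mathbb{R}$.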


Simplicity and representations  Let $\mathcal{B}_1$ and $\mathcal{B}_2$
be subsets of the algebra $\mathcal{A}$. Define
$\mathcal{B}_1 \mathcal{B}_2 = \{b_1 b_2 \mid b_1 \in \mathcal{B}_1, b_2 \in \mathcal{B}_2\}$. A vector subspace $\mathcal{B}$ of $\mathcal{A}$ is said
to be a left (resp., right) ideal of $\mathcal{A}$ if $\mathcal{A}\mathcal{B} \subset \mathcal{B}$ (resp.,
$\mathcal{B}\mathcal{A} \subset \mathcal{B}$). A two-sided ideal (or simply an ideal) is
a left and right ideal. An algebra $\mathcal{A} \neq \{0\}$ is said to
be simple if its only two-sided ideals are $\{0\}$ and $\mathcal{A}$.

For example, the algebras $\mathbb{R}(N)$ and $\mathbb{H}(N)$ are
simple over $\mathbb{R}$; the algebra $\mathbb{C}(N)$ is simple when
considered as an algebra over both $\mathbb{R}$ and $\mathbb{C}$; every
associative, finite-dimensional simple algebra over $\mathbb{R}$
or $\mathbb{C}$ is isomorphic to one of them.

A representation of an algebra $\mathcal{A}$ over $K$ in a vector
space $S$ over $K$ is a homomorphism of algebras
$\rho: \mathcal{A} \to \mathrm{End}\, S$. If $\rho$ is injective, then the representation is said to
be faithful. For example, the regular representation
$\rho: \mathcal{A} \to \mathrm{End}\, \mathcal{A}$ of an algebra $\mathcal{A}$, defined by $\rho(a)b = ab$
for all $a, b \in \mathcal{A}$, is faithful. A vector subspace $T$ of
the vector space $S$ carrying a representation $\rho$ of $\mathcal{A}$
is said to be invariant for $\rho$ if $\rho(a)T \subset T$ for every
$a \in \mathcal{A}$; it is proper if distinct from both $\{0\}$ and $S$.
For example, a left ideal of $\mathcal{A}$ is invariant for the
regular representation. Given an invariant subspace
$T$ of $\rho$ one can reduce $\rho$ to $T$ by forming the
representation $\rho_T: \mathcal{A} \to \mathrm{End}\, T$, where $\rho_T(a)s = \rho(a)s$
for every $a \in \mathcal{A}$ and $s \in T$. A representation is
irreducible if it has no proper invariant subspaces.

A linear map $F: S_1 \to S_2$ is said to intertwine the
representations $\rho_1: \mathcal{A} \to \mathrm{End}\, S_1$ and $\rho_2: \mathcal{A} \to \mathrm{End}\, S_2$ if
$F\rho_1(a) = \rho_2(a)F$ holds for every $a \in \mathcal{A}$. If $F$ is an
isomorphism, then the representations $\rho_1$ and $\rho_2$ are
said to be equivalent, $\rho_1 \sim \rho_2$. The following two
propositions are classical:


Proposition (A)


(i) An algebra over $K$ is simple if and only if it
admits a faithful irreducible representation in a
vector space over $K$. Such a representation is
unique, up to equivalence.

(ii) The complexification of a central simple algebra
over $\mathbb{R}$ is a central simple algebra over $\mathbb{C}$.


For real algebras, one often considers complex
representations, that is, representations in complex
vector spaces. Two such representations
$\rho_1: \mathcal{A} \to \mathrm{End}\, S_1$ and $\rho_2: \mathcal{A} \to \mathrm{End}\, S_2$ are said to be complex
equivalent if there is a complex isomorphism $F: S_1 \to S_2$
intertwining the representations; they are real
equivalent if there is an isomorphism among the
realifications of $S_1$ and $S_2$, intertwining the
representations. For example, $\mathbb{C}$, considered as an
algebra over $\mathbb{R}$, has two complex-inequivalent
representations in $\mathbb{C}$: the identity representation
and its complex conjugate. The realifications of
these representations, given by $i \mapsto \varepsilon$ and $i \mapsto -\varepsilon$,
respectively, are real equivalent: they are intertwined
by $\sigma_z$. The real algebra $\mathbb{H}$, being central simple, has
only one, up to complex equivalence, representation
in $\mathbb{C}^2$: every such representation is equivalent to the
one given by

$$i \mapsto \sigma_x/\sqrt{-1}, \qquad j \mapsto \sigma_y/\sqrt{-1}, \qquad k \mapsto \sigma_z/\sqrt{-1}$$

This representation extends to an injective
homomorphism of algebras $j: \mathbb{H}(N) \to \mathbb{C}(2N)$ which is used
to define the quaternionic determinant of a matrix
$a \in \mathbb{H}(N)$ as $\det_{\mathbb{H}}(a) = \det_{\mathbb{C}} j(a)$, so that $\det_{\mathbb{H}}(a) \geq 0$ and
$\det_{\mathbb{H}}(ab) = \det_{\mathbb{H}}(a) \det_{\mathbb{H}}(b)$ for every $a, b \in \mathbb{H}(N)$. In
particular, if $q \in \mathbb{H}$ and $\lambda, \mu \in \mathbb{R}$, then $\det_{\mathbb{H}}(q) = \bar q q$ and

$$\det\nolimits_{\mathbb{H}} \begin{pmatrix} \lambda & q \\ -\bar q & \mu \end{pmatrix} = (\lambda\mu + \bar q q)^2 \qquad [6]$$

There are quaternionic unimodular groups
$\mathrm{SL}_N(\mathbb{H}) = \{a \in \mathbb{H}(N) \mid \det_{\mathbb{H}}(a) = 1\}$. For example,
the group $\mathrm{SL}_1(\mathbb{H})$ is isomorphic to $\mathrm{SU}_2$ and $\mathrm{SL}_2(\mathbb{H})$
is a noncompact, 15-dimensional Lie group, one of
the spin groups in six dimensions.


Antiautomorphisms and inner products An automorphism of an algebra $\mathcal{A}$ is a linear isomorphism $\alpha : \mathcal{A} \to \mathcal{A}$ such that $\alpha(ab) = \alpha(a)\alpha(b)$. An invertible element $c \in \mathcal{A}$ defines an inner automorphism $\operatorname{Ad}(c) \in GL(\mathcal{A})$, $\operatorname{Ad}(c)a = cac^{-1}$. Complex conjugation in $\mathbb{C}$, considered as an algebra over $\mathbb{R}$, is an automorphism that is not inner. An antiautomorphism of an algebra $\mathcal{A}$ is a linear isomorphism $\beta : \mathcal{A} \to \mathcal{A}$ such that $\beta(ab) = \beta(b)\beta(a)$ for all $a, b \in \mathcal{A}$. An (anti)automorphism $\beta$ is involutive if $\beta^2 = \mathrm{id}$. For example, conjugation of quaternions defines an involutive antiautomorphism of $\mathbb{H}$.

Let $\rho : \mathcal{A} \to \operatorname{End} S$ be a representation of an algebra with an involutive antiautomorphism $\beta$. There is then the contragredient representation $\check{\rho} : \mathcal{A} \to \operatorname{End} S^*$ given by $\check{\rho}(a) = (\rho(\beta(a)))^*$. If, moreover, $\mathcal{A}$ is central simple and $\rho$ is faithful irreducible, then there is an isomorphism $B : S \to S^*$ intertwining $\rho$ and $\check{\rho}$ which is either symmetric, $B^* = B$, or antisymmetric, $B^* = -B$. It defines on $S$ the structure of an inner-product space. This structure extends to $\operatorname{End} S$: there is a symmetric isomorphism $B \otimes B^{-1} : \operatorname{End} S \to (\operatorname{End} S)^* = \operatorname{End} S^*$ given, for every $f \in \operatorname{End} S$, by $(B \otimes B^{-1})(f) = BfB^{-1}$.

Let $K^\times = K \setminus \{0\}$ be the multiplicative group of the field $K$. Given a simple algebra $\mathcal{A}$ with an involutive antiautomorphism $\beta$, one defines $N(a) = \beta(a)a$ and the group

$$G(\beta) = \{a \in \mathcal{A} \mid N(a) \in K^\times\}$$


Let $\rho : \mathcal{A} \to \operatorname{End} S$ be the faithful irreducible representation as above; then, for $a \in G(\beta)$ and $s, t \in S$, one has

$$B(\rho(a)s, \rho(a)t) = N(a)B(s, t)$$

If $a \in G(\beta)$ and $\lambda \in K^\times$, then $\lambda a \in G(\beta)$ and the norm $N$ satisfies $N(\lambda a) = \lambda^2 N(a)$. The inner product $B$ is invariant with respect to the action of the group

$$G_1(\beta) = \{a \in G(\beta) \mid N(a) = 1\}$$


Proposition (B) Let $\mathcal{A}$ be a central simple algebra over $K$ with an involutive antiautomorphism $\beta$ and a faithful irreducible representation $\rho$ so that

$$\check{\rho}(a) = B\rho(a)B^{-1}$$

The map $h : \mathcal{A} \times \mathcal{A} \to K$ defined by

$$h(a, b) = \operatorname{tr} \rho(\beta(a)b)$$

is bilinear, symmetric, and nondegenerate. The map $\rho$ is an isometry of the quadratic space $(\mathcal{A}, h)$ on its image in the quadratic space $(\operatorname{End} S, B \otimes B^{-1})$.


Graded Algebras 


Definitions An algebra $\mathcal{A}$ is said to be $\mathbb{Z}$-graded (resp., $\mathbb{Z}_2$-graded) if there is a decomposition of the underlying vector space $\mathcal{A} = \bigoplus_{p \in \mathbb{Z}} \mathcal{A}^p$ (resp., $\mathcal{A} = \mathcal{A}^0 \oplus \mathcal{A}^1$) such that $\mathcal{A}^p \mathcal{A}^q \subset \mathcal{A}^{p+q}$. In a $\mathbb{Z}_2$-graded algebra, it is understood that $p + q$ is reduced mod 2. If $a \in \mathcal{A}^p$, then $a$ is said to be homogeneous of degree $p$. The exterior algebra $\wedge V$ of a vector space $V$ is $\mathbb{Z}$-graded. Every $\mathbb{Z}$-graded algebra becomes $\mathbb{Z}_2$-graded when one reduces the degree of every element mod 2. A graded isomorphism of graded algebras is an isomorphism that preserves the grading.

A $\mathbb{Z}_2$-grading of $\mathcal{A}$ is characterized by the involutive automorphism $\alpha$ such that, if $a \in \mathcal{A}^p$, then $\alpha(a) = (-1)^p a$. From now on, grading means $\mathbb{Z}_2$-grading unless otherwise specified. The elements of $\mathcal{A}^0$ (resp., $\mathcal{A}^1$) are said to be even (resp., odd). It is often convenient to denote the graded algebra as

$$\mathcal{A}^0 \to \mathcal{A} \qquad [7]$$


Given such an algebra over $K$ and $N \in \mathbb{N}$, one constructs the graded algebra $\mathcal{A}^0(N) \to \mathcal{A}(N)$. Two graded algebras over $K$, $\mathcal{A}^0 \to \mathcal{A}$ and $\mathcal{A}'^0 \to \mathcal{A}'$, are said to be of the same type if there are integers $N$ and $N'$ such that the algebras $\mathcal{A}^0(N) \to \mathcal{A}(N)$ and $\mathcal{A}'^0(N') \to \mathcal{A}'(N')$ are graded isomorphic. The property of being of the same type is an equivalence relation in the set of all graded algebras over $K$.

Given an algebra $\mathcal{A}$, one constructs two "canonical" graded algebras as follows:

1. the double algebra $\mathcal{A} \to \mathcal{A} \oplus \mathcal{A}$, graded by the "swap" automorphism, $\alpha(a_1, a_2) = (a_2, a_1)$ for $a_1, a_2 \in \mathcal{A}$;
2. the algebra $\mathcal{A} \oplus \mathcal{A} \to \mathcal{A}(2)$, defined by declaring the diagonal (resp., antidiagonal) elements of $\mathcal{A}(2)$ to be even (resp., odd).


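The real algebra $\mathbb{R}(2)$ has also another grading, given by the involutive automorphism $\alpha$ such that $\alpha(a) = \varepsilon a \varepsilon^{-1}$, where $a \in \mathbb{R}(2)$ and $\varepsilon$ is as in [2]. In this case, [7] reads

$$\mathbb{C} \to \mathbb{R}(2)$$

There are also graded algebras over $\mathbb{R}$:

$$\mathbb{R} \to \mathbb{C}, \qquad \mathbb{C} \to \mathbb{H}, \qquad \text{and} \qquad \mathbb{H} \to \mathbb{C}(2)$$

The grading of the last algebra can be defined by declaring the Pauli matrices and $iI$ to be odd.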


Super Lie algebras A super Lie algebra is a graded algebra $\mathcal{A}$ such that the product $(a, b) \mapsto [a, b]$ is super anticommutative, $[a, b] = -(-1)^{pq}[b, a]$, and satisfies the super Jacobi identity,

$$[a, [b, c]] = [[a, b], c] + (-1)^{pq}[b, [a, c]]$$

for every $a \in \mathcal{A}^p$, $b \in \mathcal{A}^q$ and $c \in \mathcal{A}$. To every graded associative algebra $\mathcal{A}$ there corresponds a super Lie algebra $GL\mathcal{A}$: its underlying vector space and grading are as in $\mathcal{A}$ and the product, for $a \in \mathcal{A}^p$ and $b \in \mathcal{A}^q$, is given as the supercommutator $[a, b] = ab - (-1)^{pq}ba$.


Supercentrality and graded simplicity A graded algebra $\mathcal{A}$ over $K$ is supercentral if $Z(\mathcal{A}) \cap \mathcal{A}^0 = K$. The algebra $\mathbb{R} \to \mathbb{C}$ is supercentral, but the real ungraded algebra $\mathbb{C}$ is not central.

A subalgebra $\mathcal{B}$ of a graded algebra $\mathcal{A}$ is said to be a graded subalgebra if $\mathcal{B} = (\mathcal{B} \cap \mathcal{A}^0) \oplus (\mathcal{B} \cap \mathcal{A}^1)$. A graded ideal of $\mathcal{A}$ is an ideal that is a graded subalgebra. A graded algebra $\mathcal{A} \neq \{0\}$ is said to be graded simple if it has no graded ideals other than $\{0\}$ and $\mathcal{A}$. The double algebra of a simple algebra is graded simple, but not simple.


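The graded tensor product Let $\mathcal{A}$ and $\mathcal{B}$ be graded algebras; the tensor product of their underlying vector spaces admits a natural grading, $(\mathcal{A} \otimes \mathcal{B})^p = \bigoplus_q \mathcal{A}^q \otimes \mathcal{B}^{p-q}$. The product defined in [3] makes $\mathcal{A} \otimes \mathcal{B}$ into a graded algebra. There is another "super" product in the same graded vector space given by

$$(a \otimes b) \cdot (a' \otimes b') = (-1)^{pq}\, aa' \otimes bb'$$

for $a' \in \mathcal{A}^p$ and $b \in \mathcal{B}^q$. The resulting graded algebra is referred to as the graded tensor product and denoted by $\mathcal{A} \mathbin{\hat{\otimes}} \mathcal{B}$. For example, if $V$ and $W$ are vector spaces, then the Grassmann algebra $\wedge(V \oplus W)$ is isomorphic to $\wedge V \mathbin{\hat{\otimes}} \wedge W$.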


Clifford Algebras 
Definitions: The Universal Property and Grading 


The Clifford algebra associated with a quadratic space $(V, g)$ is the quotient algebra

$$\mathrm{C}\ell(V, g) = \mathcal{T}(V)/\mathcal{J}(V, g) \qquad [8]$$

where $\mathcal{J}(V, g)$ is the ideal in the tensor algebra $\mathcal{T}(V)$ generated by all elements of the form $v \otimes v - g(v, v)1_{\mathcal{T}(V)}$, $v \in V$.

The Clifford algebra is associative with a unit element denoted by 1. One denotes by $\kappa$ the canonical map of $\mathcal{T}(V)$ onto $\mathrm{C}\ell(V, g)$ and by $ab$ the product of two elements $a, b \in \mathrm{C}\ell(V, g)$ so that $\kappa(P \otimes Q) = \kappa(P)\kappa(Q)$ for $P, Q \in \mathcal{T}(V)$. The map $\kappa$ is injective on $K \oplus V$, and one identifies this subspace of $\mathcal{T}(V)$ with its image under $\kappa$. With this identification, for all $u, v \in V$, one has

$$uv + vu = 2g(u, v)$$


Clifford algebras are characterized by their universal 
property described in the following proposition. 


Proposition (C) Let $\mathcal{A}$ be an algebra with a unit $1_{\mathcal{A}}$ and let $f : V \to \mathcal{A}$ be a Clifford map, that is, a linear map such that $f(v)^2 = g(v, v)1_{\mathcal{A}}$ for every $v \in V$. There then exists a homomorphism $\hat{f} : \mathrm{C}\ell(V, g) \to \mathcal{A}$ of algebras with units, an extension of $f$, so that $\hat{f}(v) = f(v)$ for every $v \in V$.


As a corollary, one obtains 


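Proposition (D) If $f$ is an isometry of $(V, g)$ into $(W, h)$, then there is a homomorphism of algebras $\mathrm{C}\ell(f) : \mathrm{C}\ell(V, g) \to \mathrm{C}\ell(W, h)$ extending $f$ so that there is the commutative diagram

$$\begin{array}{ccc} \mathrm{C}\ell(V, g) & \xrightarrow{\;\mathrm{C}\ell(f)\;} & \mathrm{C}\ell(W, h) \\ \uparrow & & \uparrow \\ V & \xrightarrow{\;\;f\;\;} & W \end{array}$$

For example, the isometry $v \mapsto -v$ extends to the involutive main automorphism $\alpha$ of $\mathrm{C}\ell(V, g)$, defining its $\mathbb{Z}_2$-grading:

$$\mathrm{C}\ell(V, g) = \mathrm{C}\ell^0(V, g) \oplus \mathrm{C}\ell^1(V, g)$$

The algebra $\mathrm{C}\ell(V, g)$ admits also an involutive canonical antiautomorphism $\beta$ characterized by $\beta(1) = 1$ and $\beta(v) = v$ for every $v \in V$.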


The Vector Space Structure of Clifford Algebras 


Referring to the universal property, proposition (C), let $\mathcal{A} = \operatorname{End}(\wedge V)$ and, for every $v \in V$ and $w \in \wedge V$, put $f(v)w = v \wedge w + g(v) \lrcorner\, w$; then $f : V \to \operatorname{End}(\wedge V)$ is a Clifford map and the map

$$i : \mathrm{C}\ell(V, g) \to \wedge V \qquad [9]$$

given by $i(a) = \hat{f}(a)1_{\wedge V}$ is an isomorphism of vector spaces. This proves

Proposition (E) As a vector space, the algebra $\mathrm{C}\ell(V, g)$ is isomorphic to the exterior algebra $\wedge V$.

If $V$ is $m$-dimensional, then $\mathrm{C}\ell(V, g)$ is $2^m$-dimensional. The linear isomorphism [9] defines a $\mathbb{Z}$-grading of the vector space underlying the Clifford algebra: if $i(a_k) \in \wedge^k V$, then $a_k$ is said to be of Grassmann degree $k$. Every element $a \in \mathrm{C}\ell(V, g)$ decomposes into its Grassmann components, $a = \sum_{k \in \mathbb{Z}} a_k$. The Clifford product of two elements of Grassmann degrees $k$ and $l$ decomposes as follows: $a_k b_l = \sum_{p \in \mathbb{Z}} (a_k b_l)_p$, and $(a_k b_l)_p = 0$ if $p < |k - l|$ or $p \equiv k - l + 1 \bmod 2$ or $p > m - |m - k - l|$.

One often uses [9] to identify the vector spaces $\wedge V$ and $\mathrm{C}\ell(V, g)$; this having been done, one can write, for every $v \in V$ and $a \in \mathrm{C}\ell(V, g)$,

$$va = v \wedge a + g(v) \lrcorner\, a \qquad [10]$$

so that $[v, a] = 2g(v) \lrcorner\, a$, where $[\,,]$ is the supercommutator. It defines a super Lie algebra structure in the vector space $K \oplus V$. The quadratic form defined by $g$ need not be nondegenerate; for example, if it is the 0-form, then [10] shows that the Clifford and exterior multiplications coincide and $\mathrm{C}\ell(V, 0)$ is isomorphic, as an algebra, to the Grassmann algebra.


Complexification of Real Clifford Algebras 


Proposition (F) If $(V, g)$ is a real quadratic space, then the algebras $\mathbb{C} \otimes \mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(\mathbb{C} \otimes V, \mathbb{C} \otimes g)$ are isomorphic, as graded algebras over $\mathbb{C}$.

From now on, through the end of the article, one assumes that $(V, g)$ is an orthogonal space over $K = \mathbb{R}$ or $\mathbb{C}$.

The Clifford algebra associated with the orthogonal space $\mathbb{C}^m$ is denoted by $\mathrm{C}\ell_m$. The Clifford algebra associated with the orthogonal space $(\mathbb{R}^{k+l}, g)$, where $g$ is of signature $(k, l)$, is denoted by $\mathrm{C}\ell_{k,l}$, so that $\mathbb{C} \otimes \mathrm{C}\ell_{k,l} = \mathrm{C}\ell_{k+l}$.


Relations between Clifford Algebras in Spaces of 
Adjacent Dimensions 


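Consider an orthogonal space $(V, g)$ over $K$ and the one-dimensional orthogonal space $(K, h_1)$, having a unit vector $w \in K$, $h_1(w, w) = \varepsilon$, where $\varepsilon = 1$ or $-1$. The map $V \ni v \mapsto vw \in \mathrm{C}\ell^0(V \oplus K, g \oplus h_1)$ satisfies $(vw)^2 = -\varepsilon g(v, v)$ and extends to the isomorphism of algebras $\mathrm{C}\ell(V, -\varepsilon g) \to \mathrm{C}\ell^0(V \oplus K, g \oplus h_1)$. This proves

Proposition (G) There are isomorphisms of algebras: $\mathrm{C}\ell_m \to \mathrm{C}\ell^0_{m+1}$ and $\mathrm{C}\ell_{k,l} \to \mathrm{C}\ell^0_{k+1,l}$.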


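Consider the orthogonal space $(K^2, h)$ with a neutral $h$ such that, for $\lambda, \mu \in K$, one has $\langle(\lambda, \mu), h(\lambda, \mu)\rangle = \lambda\mu$. The map

$$K^2 \to K(2), \qquad (\lambda, \mu) \mapsto \begin{pmatrix} 0 & \lambda \\ \mu & 0 \end{pmatrix}$$

has the Clifford property and establishes the isomorphisms represented by the horizontal arrows in the diagram

$$\begin{array}{ccc} \mathrm{C}\ell(K^2, h) & \to & K(2) \\ \uparrow & & \uparrow \\ \mathrm{C}\ell^0(K^2, h) & \to & K \oplus K \end{array} \qquad [11]$$

Proposition (H) If $(K^2, h)$ is neutral and $(V, g)$ is over $K$, then the algebra $\mathrm{C}\ell(V \oplus K^2, g \oplus h)$ is isomorphic to the algebra $\mathrm{C}\ell(V, g) \otimes K(2)$. Specifically, there are isomorphisms

$$\mathrm{C}\ell_{k+1,l+1} = \mathrm{C}\ell_{k,l} \otimes \mathbb{R}(2), \qquad \mathrm{C}\ell_{m+2} = \mathrm{C}\ell_m \otimes \mathbb{C}(2) \qquad [12]$$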


The Chevalley Theorem and the Brauer-Wall 
Group 


If $(V, g)$ and $(W, h)$ are quadratic spaces over $K$, then their sum is the quadratic space $(V \oplus W, g \oplus h)$ characterized by $g \oplus h : V \oplus W \to V^* \oplus W^*$ so that $(g \oplus h)(v, w) = (g(v), h(w))$. By noting that the map $V \oplus W \ni (v, w) \mapsto v \otimes 1 + 1 \otimes w \in \mathrm{C}\ell(V, g) \mathbin{\hat{\otimes}} \mathrm{C}\ell(W, h)$ has the Clifford property, Chevalley proved

Proposition (I) The algebra $\mathrm{C}\ell(V \oplus W, g \oplus h)$ is isomorphic to the algebra $\mathrm{C}\ell(V, g) \mathbin{\hat{\otimes}} \mathrm{C}\ell(W, h)$.

The type of the (graded) algebra $\mathrm{C}\ell(V \oplus W, g \oplus h)$ depends only on the types of $\mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(W, h)$. The Chevalley theorem (I) shows that the set of types of Clifford algebras over $K$ forms an abelian group for a multiplication induced by the graded tensor product. The unit of this Brauer-Wall group of $K$ is the type of the algebra $\mathrm{C}\ell(K^2, h)$ described in [11]; for a full account with proofs, see Wall (1963).


The Volume Element and the Centers 


Let $e = (e_\mu)$ be an orthonormal frame in $(V, g)$. The volume element associated with $e$ is

$$\eta = e_1 e_2 \cdots e_m$$

If $\eta'$ is the volume element associated with another orthonormal frame $e'$ in the same orthogonal space, then either $\eta' = \eta$ ($e$ and $e'$ are of the same orientation) or $\eta' = -\eta$ ($e$ and $e'$ are of opposite orientation). For $K = \mathbb{C}$, one has $\eta^2 = 1$; for $K = \mathbb{R}$ and $g$ of signature $(k, l)$ one has

$$\eta^2 = (-1)^{(1/2)(k-l)(k-l+1)} \qquad [13]$$

It is convenient to define $\iota \in \{1, i\}$ so that $\eta^2 = \iota^2$. For every $v \in V$ one has $v\eta = (-1)^{m+1}\eta v$. The structure of the centers of Clifford algebras is as follows:

Proposition (J) If $m$ is even, then $Z(\mathrm{C}\ell(V, g)) = K$ and $Z(\mathrm{C}\ell^0(V, g)) = K \oplus K\eta$. If $m$ is odd, then $Z(\mathrm{C}\ell(V, g)) = K \oplus K\eta$ and $Z(\mathrm{C}\ell^0(V, g)) = K$.

The graded algebra $\mathrm{C}\ell(V, g)$ is supercentral for every $m$.
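The sign rule [13] can be cross-checked against a direct count. The sketch below is only an illustration: it assumes that, in signature $(k, l)$, $k$ frame vectors square to $-1$ and $l$ square to $+1$ (the convention suggested by the relation $\rho_\mu\rho_\nu + \rho_\nu\rho_\mu = -2\delta_{\mu\nu}$ used for $\mathrm{C}\ell_{m,0}$ in proposition (M)); the function names are hypothetical.

```python
def eta_squared_direct(k: int, l: int) -> int:
    """Sign of eta^2 for eta = e_1 e_2 ... e_m, m = k + l.

    Assumption: k frame vectors square to -1 and l square to +1.
    """
    m = k + l
    # Reordering e_1...e_m e_1...e_m into e_1^2 ... e_m^2 takes m(m-1)/2
    # transpositions of anticommuting vectors; the squares contribute (-1)^k.
    return (-1) ** (m * (m - 1) // 2 + k)


def eta_squared_formula(k: int, l: int) -> int:
    """Right-hand side of [13]: (-1)^((1/2)(k-l)(k-l+1))."""
    d = k - l
    # d(d+1) is a product of consecutive integers: even and nonnegative.
    return (-1) ** (d * (d + 1) // 2)
```

Under the stated assumption the two functions agree for every signature, because the two exponents differ by the even number $2kl$.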


The Structure of Clifford Algebras 


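The complex case Using [4] one obtains from [11] and [12] the isomorphisms of algebras

$$\mathrm{C}\ell^0_{2n+1} = \mathrm{C}\ell_{2n} = \mathbb{C}(2^n) \qquad [14]$$

$$\mathrm{C}\ell^0_{2n+2} = \mathrm{C}\ell_{2n+1} = \mathbb{C}(2^n) \oplus \mathbb{C}(2^n) \qquad [15]$$

for $n = 0, 1, 2, \ldots$. Therefore, there are only two types of complex Clifford algebras, represented by $\mathbb{C} \to \mathbb{C} \oplus \mathbb{C}$ and $\mathbb{C} \oplus \mathbb{C} \to \mathbb{C}(2)$: the Brauer-Wall group of $\mathbb{C}$ is $\mathbb{Z}_2$.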




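The real case In view of proposition (I) and $\mathrm{C}\ell_{1,1} = \mathbb{R}(2)$, the algebra $\mathrm{C}\ell_{k,l}$ is of the same type as $\mathrm{C}\ell_{k-l,0}$ if $k > l$ and of the same type as $\mathrm{C}\ell_{0,l-k}$ if $k < l$. Since $\mathrm{C}\ell_{k,l} \mathbin{\hat{\otimes}} \mathrm{C}\ell_{l,k} = \mathrm{C}\ell_{k+l,k+l}$, the type of $\mathrm{C}\ell_{l,k}$ is the inverse of the type of $\mathrm{C}\ell_{k,l}$. The algebra $\mathrm{C}\ell^0_{4,0} \to \mathrm{C}\ell_{4,0}$ is isomorphic to $\mathbb{H} \oplus \mathbb{H} \to \mathbb{H}(2)$: if $x = (x_1, x_2, x_3, x_4) \in \mathbb{R}^4 \subset \mathrm{C}\ell_{4,0}$, and $q = ix_1 + jx_2 + kx_3 + x_4 \in \mathbb{H}$, then an isomorphism is obtained from the Clifford map $f$,

$$f(x) = \begin{pmatrix} 0 & q \\ -\bar{q} & 0 \end{pmatrix} \qquad [16]$$

In view of [13], the volume element $\eta$ satisfies $\eta^2 = 1$. By replacing $-\bar{q}$ with $\bar{q}$ in [16], one shows that $\mathrm{C}\ell_{0,4}$ is also isomorphic to $\mathbb{H}(2)$. The map $\mathbb{R}^4 \times \mathbb{R}^{k+l} \to \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$ given by $(x, y) \mapsto f(x) \otimes 1 + \eta \otimes y$ has the Clifford property and establishes the isomorphism of algebras $\mathrm{C}\ell_{k+4,l} = \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$. Since, similarly, $\mathrm{C}\ell_{k,l+4} = \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$, one obtains the isomorphism

$$\mathrm{C}\ell_{k+4,l} = \mathrm{C}\ell_{k,l+4}$$

Therefore,

$$\mathrm{C}\ell_{k+8,l} = \mathrm{C}\ell_{k+4,l+4} = \mathrm{C}\ell_{k,l+8} = \mathrm{C}\ell_{k,l} \otimes \mathbb{R}(16)$$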


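and the algebras $\mathrm{C}\ell_{k,l}$, $\mathrm{C}\ell_{k+8,l}$, and $\mathrm{C}\ell_{k,l+8}$ are all of the same type. This double periodicity of period 8 is subsumed by saying that real Clifford algebras can be arranged on a "spinorial chessboard." The type of $\mathrm{C}\ell^0_{k,l} \to \mathrm{C}\ell_{k,l}$ depends only on $k - l \bmod 8$; the eight types have the following low-dimensional algebras as representatives: $\mathrm{C}\ell_{1,0}$, $\mathrm{C}\ell_{2,0}$, $\mathrm{C}\ell_{3,0}$, $\mathrm{C}\ell_{4,0} = \mathrm{C}\ell_{0,4}$, $\mathrm{C}\ell_{0,3}$, $\mathrm{C}\ell_{0,2}$, and $\mathrm{C}\ell_{0,1}$. The Brauer-Wall group of $\mathbb{R}$ is $\mathbb{Z}_8$, generated by the type of $\mathrm{C}\ell^0_{1,0} \to \mathrm{C}\ell_{1,0}$, that is, by $\mathbb{R} \to \mathbb{C}$. Bearing in mind the isomorphism $\mathrm{C}\ell^0_{k+1,l} = \mathrm{C}\ell_{k,l}$ and abbreviating $\mathbb{C} \to \mathbb{R}(2)$ to $\mathbb{C} \to \mathbb{R}$, etc., one can arrange the types of real Clifford algebras in the form of a "spinorial clock":

$$\begin{array}{ccccc} \mathbb{R} & \xrightarrow{\;7\;} & \mathbb{R} \oplus \mathbb{R} & \xrightarrow{\;0\;} & \mathbb{R} \\ {\scriptstyle 6}\,\uparrow & & & & \downarrow\,{\scriptstyle 1} \\ \mathbb{C} & & & & \mathbb{C} \\ {\scriptstyle 5}\,\uparrow & & & & \downarrow\,{\scriptstyle 2} \\ \mathbb{H} & \xleftarrow{\;4\;} & \mathbb{H} \oplus \mathbb{H} & \xleftarrow{\;3\;} & \mathbb{H} \end{array} \qquad [17]$$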


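Proposition (K) Recipe for determining $\mathrm{C}\ell^0_{k,l} \to \mathrm{C}\ell_{k,l}$:

(i) find the integers $\mu$ and $\nu$ such that $k - l = 8\mu + \nu$ and $0 \leq \nu \leq 7$;

(ii) from the spinorial clock, read off $\mathcal{A}^0_\nu \xrightarrow{\nu} \mathcal{A}_\nu$ and compute the real dimensions, $\dim \mathcal{A}^0_\nu = 2^{\nu_0}$ and $\dim \mathcal{A}_\nu = 2^{\nu_1}$; and

(iii) form $\mathrm{C}\ell^0_{k,l} = \mathcal{A}^0_\nu(2^{(1/2)(k+l-1-\nu_0)})$ and $\mathrm{C}\ell_{k,l} = \mathcal{A}_\nu(2^{(1/2)(k+l-\nu_1)})$.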
The spinorial clock is symmetric with respect to the reflection in the vertical line through its center; this is a consequence of the isomorphism of algebras $\mathrm{C}\ell_{k,l+2} = \mathrm{C}\ell_{l,k} \otimes \mathbb{R}(2)$.

Note that the "abstract" algebra $\mathrm{C}\ell_{k,l}$ carries, in general, less information than the Clifford algebra defined in [8], which contains $V$ as a distinguished vector subspace with the quadratic form $v \mapsto v^2 = g(v, v)$. For example, the algebras $\mathrm{C}\ell_{8,0}$, $\mathrm{C}\ell_{4,4}$, and $\mathrm{C}\ell_{0,8}$ are all graded isomorphic.


Theorem on Simplicity 


From general theory (Chevalley 1954) or by inspec- 
tion of [14], [15], and [17], one has 


Proposition (L) Let $m$ be the dimension of the orthogonal space $(V, g)$ over $K$.

(i) If $m$ is even (resp., odd), then the algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) over $K$ is central simple.

(ii) If $K = \mathbb{C}$ and $m$ is odd (resp., even), then the algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) is the direct sum of two isomorphic complex central simple algebras.

(iii) If $K = \mathbb{R}$ and $m$ is odd (resp., even), then the algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) when $\eta^2 = 1$ is the direct sum of two isomorphic central simple algebras and when $\eta^2 = -1$ is simple with a center isomorphic to $\mathbb{C}$.


Representations 


The Pauli, Cartan, Dirac, and Weyl 
Representations 


Odd dimensions Let $(V, g)$ be of dimension $m = 2n + 1$ over $K$. From propositions (A) and (L) it follows that the central simple algebra $\mathrm{C}\ell^0(V, g)$ has a unique, up to equivalence, faithful, and irreducible representation in the complex $2^n$-dimensional vector space $S$ of Pauli spinors. By putting $\sigma(\eta) = \iota I$ it is extended to a Pauli representation $\sigma : \mathrm{C}\ell(V, g) \to \operatorname{End} S$. Given an orthonormal frame $(e_\mu)$ in $V$, Pauli endomorphisms (matrices if $S$ is identified with $\mathbb{C}^{2^n}$) are defined as $\sigma_\mu = \sigma(e_\mu) \in \operatorname{End} S$. The representations $\sigma$ and $\sigma \circ \alpha$ are complex inequivalent. For $K = \mathbb{C}$ none of them is faithful; their direct sum is the faithful Cartan representation of $\mathrm{C}\ell(V, g)$ in $S \oplus S$. For $K = \mathbb{R}$ and $(1/2)(k - l - 1)$ even, the representations $\sigma$ and $\sigma \circ \alpha$ are real equivalent and faithful. On computing $\check{\sigma}(\eta)$ one finds that the contragredient representation $\check{\sigma}$ is equivalent to $\sigma$ for $n$ even and to $\sigma \circ \alpha$ for $n$ odd.


Even dimensions Similarly, for $(V, g)$ of dimension $m = 2n$ over $K$, the central simple algebra $\mathrm{C}\ell(V, g)$ has a unique, up to equivalence, faithful, and irreducible representation $\gamma : \mathrm{C}\ell(V, g) \to \operatorname{End} S$ in the $2^n$-dimensional complex vector space $S$ of Dirac spinors. The Dirac endomorphisms (matrices) are $\gamma_\mu = \gamma(e_\mu)$. Put $\Gamma = \iota\gamma(\eta)$ so that $\Gamma^2 = I$: the matrix $\Gamma$ generalizes the familiar $\gamma_5$. The Dirac representation $\gamma$ restricted to $\mathrm{C}\ell^0(V, g)$ decomposes into the sum $\gamma_+ \oplus \gamma_-$ of two irreducible representations in the vector spaces

$$S_\pm = \{s \in S \mid \Gamma s = \pm s\}$$

of Weyl (chiral) spinors. The elements of $S_+$ are said to be of opposite chirality with respect to those of $S_-$. The transpose $\Gamma^*$ defines a similar split of $S^*$. The representations $\gamma_+$ and $\gamma_-$ are never complex-equivalent, but they are real equivalent and faithful for $K = \mathbb{R}$ and $(1/2)(k - l)$ odd.

The representations $\gamma \circ \alpha$ and $\check{\gamma}$ are both equivalent to $\gamma$. It is convenient to describe simultaneously the properties of the transpositions of the Pauli and Dirac matrices; let $\rho_\mu$ be either the Pauli matrices for $V$ of dimension $2n + 1$ or the Dirac matrices for $V$ of dimension $2n$. There is a complex isomorphism $B : S \to S^*$ such that

$$\rho_\mu^* = (-1)^n B\rho_\mu B^{-1} \qquad [18]$$

In the case of the Dirac matrices, the factor $(-1)^n$ in [18] implies that this equation also holds for $\Gamma$ in place of $\rho_\mu$. The isomorphism $B$ preserves (resp., changes) the chirality of Weyl spinors for $n$ even (resp., odd). Every matrix of the form $B\gamma_{\mu_1} \cdots \gamma_{\mu_p}$, where

$$1 \leq \mu_1 < \cdots < \mu_p \leq 2n \qquad [19]$$

is either symmetric or antisymmetric, depending on $p$ and the symmetry of $B$. A simple argument, based on counting the number of such products of one symmetry, leads to the equation

$$B^* = (-1)^{(1/2)n(n+1)} B$$

valid in dimensions $2n$ and $2n + 1$.


Inner products on spinor spaces Let $S$ be the complex vector space of Dirac or Pauli spinors associated with $(V, g)$ over $K$. The isomorphism $B : S \to S^*$ defines on $S$ an inner product $B(s, t) = \langle s, B(t) \rangle$, $s, t \in S$, which is orthogonal for $m \equiv 0, 1, 6$, or $7 \bmod 8$ and symplectic for $m \equiv 2, 3, 4$, or $5 \bmod 8$. For $m \equiv 0 \bmod 4$, this product restricts to an inner product on the space of Weyl spinors that is orthogonal for $m \equiv 0 \bmod 8$ and symplectic for $m \equiv 4 \bmod 8$. For $m \equiv 2 \bmod 4$, the map $B$ defines the isomorphisms $B_\pm : S_\pm \to S_\mp^*$.


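Example One of the most used representations $\gamma : \mathrm{C}\ell_{3,1} \to \mathbb{C}(4)$ is given by the Dirac matrices

$$\gamma_1 = \begin{pmatrix} 0 & \sigma_x \\ -\sigma_x & 0 \end{pmatrix}, \quad \gamma_2 = \begin{pmatrix} 0 & \sigma_y \\ -\sigma_y & 0 \end{pmatrix}, \quad \gamma_3 = \begin{pmatrix} 0 & \sigma_z \\ -\sigma_z & 0 \end{pmatrix}, \quad \gamma_4 = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} \qquad [20]$$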
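As a sanity check on the example, the sketch below verifies the Clifford relations for one reading of the matrices [20], namely $\gamma_j = \begin{pmatrix} 0 & \sigma_j \\ -\sigma_j & 0 \end{pmatrix}$ for $j = 1, 2, 3$ and $\gamma_4 = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix}$. That reading, the convention that the first three matrices square to $-I$ and the fourth to $+I$, and all helper names are assumptions made for illustration.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def block(a, b, c, d):
    """Assemble [[a, b], [c, d]] from four square blocks of equal size."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]


def neg(a):
    return [[-x for x in row] for row in a]


ZERO = [[0, 0], [0, 0]]
I2 = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]

# Assumed reading of [20].
GAMMAS = [block(ZERO, s, neg(s), ZERO) for s in (SIGMA_X, SIGMA_Y, SIGMA_Z)]
GAMMAS.append(block(ZERO, I2, I2, ZERO))


def satisfies_clifford(gammas, squares):
    """True if g_mu g_nu + g_nu g_mu = 2 * squares[mu] * delta_{mu nu} * I."""
    n = len(gammas[0])
    for mu, a in enumerate(gammas):
        for nu, b in enumerate(gammas):
            ab, ba = matmul(a, b), matmul(b, a)
            for i in range(n):
                for j in range(n):
                    want = 2 * squares[mu] if (mu == nu and i == j) else 0
                    if ab[i][j] + ba[i][j] != want:
                        return False
    return True
```

Calling `satisfies_clifford(GAMMAS, [-1, -1, -1, 1])` checks that the four matrices anticommute and have the squares expected for three vectors of square $-1$ and one of square $+1$.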


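Charge Conjugation and Majorana Spinors


Throughout this section and the next, one assumes $K = \mathbb{R}$ so that, given a representation $\rho : \mathrm{C}\ell(V, g) \to \operatorname{End} S$, one can form the complex- ("charge") conjugate representation $\bar{\rho} : \mathrm{C}\ell(V, g) \to \operatorname{End} \bar{S}$ defined by $\bar{\rho}(a) = \overline{\rho(a)}$ and the Hermitian conjugate representation $\rho^\dagger : \mathrm{C}\ell(V, g) \to \operatorname{End} \bar{S}^*$, where $\rho^\dagger(a) = \overline{\check{\rho}(a)}$.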


Even dimensions The representations $\gamma$ and $\bar{\gamma}$ are equivalent: there is an isomorphism $C : S \to \bar{S}$ such that

$$\bar{\gamma}_\mu = C\gamma_\mu C^{-1} \qquad [21]$$

The automorphism $\bar{C}C$ is in the commutant of $\gamma$; it is, therefore, proportional to $I$ and, by a change of scale, one can achieve $\bar{C}C = I$ for $k - l \equiv 0$ or $6 \bmod 8$ and $\bar{C}C = -I$ for $k - l \equiv 2$ or $4 \bmod 8$.

The spinor $s_c = C^{-1}\bar{s} \in S$ is the charge conjugate of $s \in S$. If $\psi : V \to S$ is a solution of the Dirac equation

$$(\gamma^\mu(\partial_\mu - iqA_\mu) - \kappa)\psi = 0$$

for a particle of electric charge $q$, then $\psi_c$ is a solution of the same equation with the opposite charge. Since

$$\bar{\Gamma} = (-1)^{(1/2)(k-l)}\, C\Gamma C^{-1}$$

charge conjugation preserves (resp., changes) the chirality of Weyl spinors for $(1/2)(k - l)$ even (resp., odd).

If $\bar{C}C = I$, then

$$\operatorname{Re} S = \{s \in S \mid s_c = s\}$$

is a real vector space of dimension $2^n$, the space of Dirac-Majorana spinors. The representation $\gamma$ is real: restricted to $\operatorname{Re} S$ and expressed with respect to a frame in this space, it is given by real $2^n \times 2^n$ matrices. For $k - l \equiv 0 \bmod 8$ the representations $\gamma_+$ and $\gamma_-$ are both real: in this case there are Weyl-Majorana spinors.


Odd dimensions On computing $\bar{\sigma}(\eta)$ one finds that the conjugate representation $\bar{\sigma}$ is equivalent to $\sigma$ (resp., $\sigma \circ \alpha$) if $\eta^2 = 1$ (resp., $\eta^2 = -1$). There is an isomorphism $C : S \to \bar{S}$ such that

$$\bar{\sigma}_\mu = (-1)^{(1/2)(k-l+1)}\, C\sigma_\mu C^{-1} \qquad [22]$$

and $\bar{C}C = I$ (resp., $\bar{C}C = -I$) for $k - l \equiv 1$ or $7 \bmod 8$ (resp., $k - l \equiv 3$ or $5 \bmod 8$). For $k - l \equiv 1 \bmod 8$, the restriction of the Pauli representation to $\mathrm{C}\ell^0_{k,l}$ is real and the Pauli matrices are pure imaginary; for $k - l \equiv 7 \bmod 8$, the Pauli representations of $\mathrm{C}\ell_{k,l}$ are both real and so are the Pauli matrices. In both these cases there are Pauli-Majorana spinors.


Hermitian Scalar Products and Multivectors 


For $m = k + l$ odd and $C$ as in [22], the map $A = \bar{B}C : S \to \bar{S}^*$ intertwines the representations $\sigma^\dagger$ and $\sigma$ (resp., $\sigma \circ \alpha$) for $k$ even (resp., odd),

$$\sigma_\mu^\dagger = (-1)^k A\sigma_\mu A^{-1}$$

By rescaling of $B$, the map $A$ can be made Hermitian. The corresponding Hermitian form $s \mapsto A(s, s)$ is definite if and only if $k$ or $l = 0$; otherwise, it is neutral.

For $m = k + l$ even, the representations $\gamma^\dagger$ and $\gamma$ are equivalent and one can define a Hermitian isomorphism $A : S \to \bar{S}^*$ so that

$$\gamma_\mu^\dagger = A\gamma_\mu A^{-1} \qquad [23]$$

The isomorphism $A' = A\Gamma$ intertwines the representations $\gamma^\dagger$ and $\gamma \circ \alpha$; it can also be made Hermitian by rescaling. The Hermitian form $A(s, s)$ is definite for $k = 0$ and $A'(s, s)$ is definite for $l = 0$; otherwise, these forms are neutral. For example, in the familiar representation [20], one has $A = \gamma_4$, a neutral form.

For $p = 0, 1, \ldots, m = 2n$, two spinors $s$ and $t \in S$ define the $p$-vector with components

$$A_{\mu_1 \ldots \mu_p}(s, t) = \langle \bar{s}, A\gamma_{\mu_1} \cdots \gamma_{\mu_p} t \rangle \qquad [24]$$

where the indices are as in [19]. The Hermiticity of $A$ and [23] imply

$$\overline{A_{\mu_1 \ldots \mu_p}(s, t)} = (-1)^{(1/2)p(p-1)} A_{\mu_1 \ldots \mu_p}(t, s)$$

In view of $\Gamma^\dagger = (-1)^k A\Gamma A^{-1}$, the map $A$ defines, for $k$ even, a nondegenerate Hermitian scalar product on the spaces $S_\pm$ whereas $A(s, t) = 0$ if $s$ and $t$ are Weyl spinors of opposite chiralities. For $k$ odd, $A$ changes the chirality.


The Radon-Hurwitz Numbers 


Proposition (M) For every integer $m > 0$, the algebra $\mathrm{C}\ell_{m,0}$ has an irreducible real representation $\rho$ of dimension $2^{\chi(m)}$, where $\chi(m)$ is the $m$th Radon-Hurwitz number given by

m       1  2  3  4  5  6  7  8
χ(m)    1  2  2  3  3  3  3  4

and $\chi(m + 8) = \chi(m) + 4$. The matrices $\rho_\mu \in \mathbb{R}(2^{\chi(m)})$, $\mu = 1, \ldots, m$, defining these representations satisfy

$$\rho_\mu\rho_\nu + \rho_\nu\rho_\mu = -2\delta_{\mu\nu}$$

and can be chosen so as to be antisymmetric. In all dimensions other than $m \equiv 3 \bmod 4$ the representations are faithful.

For $m \equiv 2$ and $4 \bmod 8$ (resp., $m \equiv 1, 3$, and $5 \bmod 8$) the representations $\rho$ are the realifications of the corresponding Dirac (resp., Pauli) representations. In dimensions $m \equiv 0$ and $6 \bmod 8$ (resp., $m \equiv 7 \bmod 8$) the Dirac (resp., Pauli) representations themselves are real.
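The Radon-Hurwitz numbers of proposition (M) are easy to tabulate. The sketch below assumes the standard values $\chi(1), \ldots, \chi(8) = 1, 2, 2, 3, 3, 3, 3, 4$ together with the recursion $\chi(m + 8) = \chi(m) + 4$ stated in the proposition; the function names are hypothetical.

```python
# Assumed base values chi(1..8); larger arguments follow from chi(m+8) = chi(m)+4.
_CHI_BASE = {1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 3, 7: 3, 8: 4}


def chi(m: int) -> int:
    """m-th Radon-Hurwitz number, for an integer m >= 1."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    periods, remainder = divmod(m - 1, 8)
    return _CHI_BASE[remainder + 1] + 4 * periods


def real_spinor_dimension(m: int) -> int:
    """Dimension 2**chi(m) of the irreducible real representation of Cl_{m,0}."""
    return 2 ** chi(m)
```

For example, `real_spinor_dimension(7)` gives 8 and `real_spinor_dimension(8)` gives 16.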


Inductive Construction 
of Representations 


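An inductive construction of the Pauli representations

$$\sigma : \mathrm{C}\ell_{n-1,n} \to \mathbb{R}(2^{n-1}), \qquad n = 1, 2, \ldots$$

and of the Dirac representations

$$\gamma : \mathrm{C}\ell_{n,n} \to \mathbb{R}(2^n), \qquad n = 1, 2, \ldots$$

is as follows.

1. In dimension 1, put $\sigma_1 = 1$.
2. Given $\sigma_\mu \in \mathbb{R}(2^{n-1})$, $\mu = 1, \ldots, 2n - 1$, define

$$\gamma_\mu = \begin{pmatrix} 0 & \sigma_\mu \\ \sigma_\mu & 0 \end{pmatrix} \quad \text{for } \mu = 1, \ldots, 2n - 1, \qquad \text{and} \qquad \gamma_{2n} = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$$

3. Given $\gamma_\mu \in \mathbb{R}(2^n)$, $\mu = 1, \ldots, 2n$, define $\sigma_\mu = \gamma_\mu$ for $\mu = 1, \ldots, 2n$, and $\sigma_{2n+1} = \gamma_1 \cdots \gamma_{2n}$.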

All entries of these matrices are either 0, 1, or $-1$; therefore, they can be used to construct representations of Clifford algebras of orthogonal spaces over any commutative field of characteristic $\neq 2$.

By induction, one has $\sigma_\mu^* = (-1)^{\mu+1}\sigma_\mu$. Therefore, the isomorphisms appearing in [18] are $B = \gamma_2\gamma_4 \cdots \gamma_{2n}$ for both $m = 2n$ and $2n + 1$.
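The three steps of the inductive construction can be sketched as follows. The code assumes the reading $\gamma_\mu = \begin{pmatrix} 0 & \sigma_\mu \\ \sigma_\mu & 0 \end{pmatrix}$, $\gamma_{2n} = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$, and $\sigma_{2n+1} = \gamma_1 \cdots \gamma_{2n}$ (the choice of $\gamma_{2n}$ is inferred from the chiral matrices given later in the text); the helper names are hypothetical.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def block(a, b, c, d):
    """Assemble [[a, b], [c, d]] from four square blocks of equal size."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]


def zeros(n):
    return [[0] * n for _ in range(n)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def neg(a):
    return [[-x for x in row] for row in a]


def pauli_to_dirac(sigmas):
    """Step 2: from 2n-1 Pauli matrices build 2n Dirac matrices of twice the size."""
    n = len(sigmas[0])
    gammas = [block(zeros(n), s, s, zeros(n)) for s in sigmas]
    gammas.append(block(zeros(n), neg(identity(n)), identity(n), zeros(n)))
    return gammas


def dirac_to_pauli(gammas):
    """Step 3: append the product gamma_1 ... gamma_2n as sigma_{2n+1}."""
    product = gammas[0]
    for g in gammas[1:]:
        product = matmul(product, g)
    return gammas + [product]


def build(n):
    """Return (Pauli matrices in dimension 2n-1, Dirac matrices in dimension 2n)."""
    sigmas = [[[1]]]  # step 1
    for _ in range(n - 1):
        sigmas = dirac_to_pauli(pauli_to_dirac(sigmas))
    return sigmas, pauli_to_dirac(sigmas)


def check(mats):
    """Check squares (-1)^(mu+1), anticommutation, and transposition symmetry."""
    n = len(mats[0])
    for i, a in enumerate(mats):
        sign = 1 if i % 2 == 0 else -1  # i = mu - 1
        if transpose(a) != [[sign * x for x in row] for row in a]:
            return False
        for j, b in enumerate(mats):
            ab, ba = matmul(a, b), matmul(b, a)
            want = [[2 * sign * int(r == c) if i == j else 0 for c in range(n)]
                    for r in range(n)]
            if [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)] != want:
                return False
    return True
```

With these conventions `build(2)[0]` consists of $\sigma_x$, the real antisymmetric matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, and $\sigma_z$, and `check` confirms that matrices with odd index square to $+I$ and are symmetric while those with even index square to $-I$ and are antisymmetric.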

By multiplying some of the matrices $\sigma_\mu$ or $\gamma_\mu$ by the imaginary unit, one obtains complex representations of the Clifford algebras associated with the quadratic forms of other signatures. For example, in dimension 3, $(\sigma_1, i\sigma_2, \sigma_3)$ are the Pauli matrices. In dimension 4, multiplying $\gamma_2$ by $i$ one obtains the Dirac matrices for $g$ of signature (1, 3), in the "chiral representation":

$$\gamma_1 = \begin{pmatrix} 0 & \sigma_x \\ \sigma_x & 0 \end{pmatrix}, \quad \gamma_2 = \begin{pmatrix} 0 & \sigma_y \\ \sigma_y & 0 \end{pmatrix}, \quad \gamma_3 = \begin{pmatrix} 0 & \sigma_z \\ \sigma_z & 0 \end{pmatrix}, \quad \gamma_4 = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix} \qquad [25]$$

To obtain the real Majorana representation one uses the following fact:

Proposition (N) If the matrix $C \in \mathbb{R}(2^n)$ is such that $C^2 = I$ and [21] holds, then the matrices $(I + iC)\gamma_\mu(I + iC)^{-1}$, $\mu = 1, \ldots, 2n$, are real.

For the matrices [25], one can take $C = \gamma_1\gamma_3\gamma_4$ to obtain real Majorana matrices.
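Proposition (N) can be checked numerically for the chiral matrices. The sketch assumes the reading $\gamma_j = \begin{pmatrix} 0 & \sigma_j \\ \sigma_j & 0 \end{pmatrix}$, $\gamma_4 = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$ of [25] and $C = \gamma_1\gamma_3\gamma_4$; it uses $(I + iC)^{-1} = (I - iC)/2$, which follows from $C^2 = I$. All names are hypothetical.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def block(a, b, c, d):
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]


def scale(s, a):
    return [[s * x for x in row] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


ZERO = [[0, 0], [0, 0]]
I2 = [[1, 0], [0, 1]]
I4 = [[int(i == j) for j in range(4)] for i in range(4)]
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]

# Assumed reading of the chiral matrices [25].
GAMMAS = [block(ZERO, s, s, ZERO) for s in (SIGMA_X, SIGMA_Y, SIGMA_Z)]
GAMMAS.append(block(ZERO, scale(-1, I2), I2, ZERO))

# C = gamma_1 gamma_3 gamma_4 is real and squares to the identity.
C = matmul(matmul(GAMMAS[0], GAMMAS[2]), GAMMAS[3])
U = add(I4, scale(1j, C))                    # I + iC
U_INV = scale(0.5, add(I4, scale(-1j, C)))   # (I - iC)/2, valid since C^2 = I

# Conjugated matrices; by proposition (N) all their entries should be real.
MAJORANA = [matmul(matmul(U, g), U_INV) for g in GAMMAS]


def is_real(m):
    return all(x.imag == 0 for row in m for x in row)
```

Conjugation by an invertible matrix preserves the Clifford relations, so `MAJORANA` represents the same algebra as `GAMMAS`, now by real matrices.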


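The real representations described in proposition (M) can be obtained by the following direct inductive construction. Consider the following seven real antisymmetric and anticommuting $8 \times 8$ matrices:

$$\begin{aligned} \rho_1 &= \sigma_z \otimes I \otimes \varepsilon, & \rho_2 &= \sigma_z \otimes \varepsilon \otimes \sigma_x, \\ \rho_3 &= \sigma_z \otimes \varepsilon \otimes \sigma_z, & \rho_4 &= \sigma_x \otimes \varepsilon \otimes I, \\ \rho_5 &= \sigma_x \otimes \sigma_x \otimes \varepsilon, & \rho_6 &= \sigma_x \otimes \sigma_z \otimes \varepsilon, \\ \rho_7 &= \varepsilon \otimes I \otimes I \end{aligned} \qquad [26]$$

For $m = 4, 5, 6$, and 7 the matrices $\rho_1, \ldots, \rho_m$ generate the representations of $\mathrm{C}\ell_{m,0}$ in $\mathbb{R}^8$. The eight matrices $\theta_\mu = \sigma_x \otimes \rho_\mu$, $\mu = 1, \ldots, 7$, and $\theta_8 = \varepsilon \otimes I \otimes I \otimes I$ give the required representation of $\mathrm{C}\ell_{8,0}$ in $\mathbb{R}^{16}$. By dropping the first factor in $\rho_1, \rho_2, \rho_3$, one obtains the matrices generating a representation of $\mathrm{C}\ell_{3,0}$ in $\mathbb{R}^4$, etc. The symmetric matrix $\Theta = \theta_1 \cdots \theta_8 = \sigma_z \otimes I \otimes I \otimes I$ anticommutes with all the $\theta$s and $\Theta^2 = I$. If the matrices $\rho_\mu \in \mathbb{R}(2^{\chi(m)})$ correspond to a representation of $\mathrm{C}\ell_{m,0}$, then the $m + 8$ matrices $\Theta \otimes \rho_1, \ldots, \Theta \otimes \rho_m$, $\theta_1 \otimes I, \ldots, \theta_8 \otimes I$ generate the required representation of $\mathrm{C}\ell_{m+8,0}$.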
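The algebraic properties claimed for the matrices [26] can be verified mechanically. The sketch below assumes $\varepsilon = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ (equation [2] is not reproduced in this section) and the reading of [26] and of the $\theta_\mu$ given in the comments; both are assumptions made for illustration, and the helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def neg(a):
    return [[-x for x in row] for row in a]


def product(mats):
    out = mats[0]
    for m in mats[1:]:
        out = matmul(out, m)
    return out


I2 = identity(2)
SX = [[0, 1], [1, 0]]
SZ = [[1, 0], [0, -1]]
EPS = [[0, -1], [1, 0]]  # assumed form of epsilon


def kron3(a, b, c):
    return kron(kron(a, b), c)


# Assumed reading of [26].
RHO = [
    kron3(SZ, I2, EPS), kron3(SZ, EPS, SX), kron3(SZ, EPS, SZ),
    kron3(SX, EPS, I2), kron3(SX, SX, EPS), kron3(SX, SZ, EPS),
    kron3(EPS, I2, I2),
]
# Assumed reading: theta_mu = sigma_x (x) rho_mu, theta_8 = eps (x) I (x) I (x) I.
THETA = [kron(SX, r) for r in RHO] + [kron(EPS, identity(8))]


def antisymmetric_anticommuting(mats):
    """True if all matrices are antisymmetric, square to -I, and anticommute."""
    n = len(mats[0])
    zero = [[0] * n for _ in range(n)]
    for i, a in enumerate(mats):
        if [list(col) for col in zip(*a)] != neg(a):
            return False
        if matmul(a, a) != neg(identity(n)):
            return False
        for b in mats[i + 1:]:
            ab, ba = matmul(a, b), matmul(b, a)
            if [[x + y for x, y in zip(r, s)] for r, s in zip(ab, ba)] != zero:
                return False
    return True
```

Under these assumptions `product(THETA)` equals `kron(SZ, identity(8))`, that is, $\Theta = \sigma_z \otimes I \otimes I \otimes I$.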


Vector Fields on Spheres 
and Division Algebras 


It is known that even-dimensional spheres have no nowhere-vanishing tangent vector fields. All such fields on odd-dimensional spheres can be constructed with the help of the representation $\rho$ described in proposition (M). Given a positive even integer $N$, let $m$ be the largest integer such that $N = 2^{\chi(m)}p$, where $p$ is an odd integer. Consider the unit sphere $S_{N-1} = \{x \in \mathbb{R}^N \mid \|x\| = 1\}$ of dimension $N - 1$. For $v \in \mathbb{R}^m$, put $\rho'(v) = \rho(v) \otimes I$, where $I \in \mathbb{R}(p)$ is the unit matrix. Since $\rho(v)$ is antisymmetric, so is the matrix $\rho'(v) \in \mathbb{R}(N)$. Therefore, for every $x \in S_{N-1}$, the vector $\rho'(v)x$ is orthogonal to $x$. The map $x \mapsto \rho'(v)x$ defines a vector field on $S_{N-1}$ that vanishes nowhere unless $v = 0$: the $(N-1)$-sphere admits a set of $m$ tangent vector fields which are linearly independent at every point. Using methods of algebraic topology, it has been shown that this method gives the maximum number of linearly independent tangent vector fields on spheres.
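The count of vector fields described above can be compared with the classical Radon-Hurwitz formula: writing $N = 2^{4b+c}p$ with $p$ odd and $0 \leq c \leq 3$, the maximum number of linearly independent tangent vector fields on the $(N-1)$-sphere is $8b + 2^c - 1$. The sketch assumes the standard values $\chi(1), \ldots, \chi(8) = 1, 2, 2, 3, 3, 3, 3, 4$; the function names are hypothetical.

```python
_CHI_BASE = {1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 3, 7: 3, 8: 4}


def chi(m: int) -> int:
    """m-th Radon-Hurwitz number, using chi(m + 8) = chi(m) + 4."""
    periods, remainder = divmod(m - 1, 8)
    return _CHI_BASE[remainder + 1] + 4 * periods


def two_adic_exponent(n: int) -> int:
    """Exponent a such that n = 2**a * (odd number), for n >= 1."""
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    return a


def fields_from_clifford(n: int) -> int:
    """Largest m with n = 2**chi(m) * odd, as in the text (n even, positive)."""
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even integer")
    a = two_adic_exponent(n)
    m, candidate = 0, 1
    while chi(candidate) <= a:  # chi is nondecreasing and unbounded
        if chi(candidate) == a:
            m = candidate
        candidate += 1
    return m


def fields_radon_hurwitz(n: int) -> int:
    """Classical count 8b + 2**c - 1 for n = 2**(4b + c) * odd."""
    b, c = divmod(two_adic_exponent(n), 4)
    return 8 * b + 2 ** c - 1
```

The two counts agree for every even $N$; for $N = 2, 4, 8, 16$ they give 1, 3, 7, and 8 fields, respectively.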

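If $m = 1, 3$, or 7, then $m + 1 = 2^{\chi(m)}$ and, for these values of $m$, the sphere $S_m$ is parallelizable. Moreover, one can then introduce in $\mathbb{R}^{m+1}$ the structure of an algebra $\mathcal{A}_m$ as follows. Put $\rho_0 = I$. If $e_0 \in \mathbb{R}^{m+1}$ is a unit vector and $e_\mu = \rho_\mu(e_0)$, then $(e_0, e_1, \ldots, e_m)$ is an orthonormal frame in $\mathbb{R}^{m+1}$. The product of $x = \sum_{\mu=0}^{m} x_\mu e_\mu$ and $y = \sum_{\nu=0}^{m} y_\nu e_\nu$ is defined to be

$$x \cdot y = \sum_{\mu,\nu=0}^{m} x_\mu y_\nu \rho_\mu(e_\nu)$$

so that $e_0$ is the unit element for this product. Defining $\operatorname{Re} x = x_0 e_0$, $\operatorname{Im} x = x - \operatorname{Re} x$, $\bar{x} = \operatorname{Re} x - \operatorname{Im} x$, one has $\bar{x} \cdot x = e_0\|x\|^2$ and $\bar{x} \cdot (x \cdot y) = (\bar{x} \cdot x) \cdot y$, so that $x \cdot y = 0$ implies $x = 0$ or $y = 0$: $\mathcal{A}_m$ is a normed algebra without zero divisors. The algebras $\mathcal{A}_1$ and $\mathcal{A}_3$ are isomorphic to $\mathbb{C}$ and $\mathbb{H}$, respectively, and $\mathcal{A}_7$ is, by definition, the algebra $\mathbb{O}$ of octonions discovered by Graves and Cayley. The algebra $\mathbb{O}$ is nonassociative; its multiplication table is obtained with the help of [26].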


Spinor Groups 


Let $(V, g)$ be a quadratic space over $K$. If $u \in V$ is not null, then it is invertible as an element of $\mathrm{C}\ell(V, g)$ and the map $v \mapsto -uvu^{-1}$ is a reflection in the hyperplane orthogonal to $u$. The orthogonal group $O(V, g) = O(V, -g) = \{R \in GL(V) \mid R^* \circ g \circ R = g\}$ is generated by the set of all such reflections. A spinor group $G$ is a subset of $\mathrm{C}\ell(V, g)$ that is a group with respect to multiplication induced by the product in the algebra, with a homomorphism $\rho : G \to GL(V)$ whose image contains the connected component $SO^0(V, g)$ of the group of rotations of $(V, g)$. In the case of real quadratic spaces, one considers also spinor groups that are subsets of $\mathbb{C} \otimes \mathrm{C}\ell(V, g)$ with similar properties. By restriction, every representation of $\mathrm{C}\ell(V, g)$ or $\mathbb{C} \otimes \mathrm{C}\ell(V, g)$ gives spinor representations of the spinor groups it contains.


Pin Groups 


It is convenient to define a unit vector $v \in V \subset \mathrm{C}\ell(V, g)$ to be such that $v^2 = 1$ for $V$ complex and $v^2 = 1$ or $-1$ for $V$ real. The group $\mathrm{Pin}(V, g)$ is defined as the subgroup of Cpin$(V, g)$ consisting of products of all finite sequences of unit vectors. Defining now the twisted adjoint representation $\widetilde{\operatorname{Ad}}$ by $\widetilde{\operatorname{Ad}}(a)v = \alpha(a)va^{-1}$, one obtains the exact sequence

$$1 \to \mathbb{Z}_2 \to \mathrm{Pin}(V, g) \xrightarrow{\widetilde{\operatorname{Ad}}} O(V, g) \to 1 \qquad [27]$$

If $\dim V$ is even, then the adjoint representation $\operatorname{Ad}(a)v = ava^{-1}$ also yields an exact sequence like [27]; if it is odd, then the image of $\operatorname{Ad}$ is $SO(V, g)$ and the kernel is the four-element group $\{1, -1, \eta, -\eta\}$.

Given an orthonormal frame $(e_\mu)$ in $(V, g)$ and $a \in \mathrm{Pin}(V, g)$, one defines the orthogonal matrix $R(a) = (R^\nu_\mu(a))$ by

$$\widetilde{\operatorname{Ad}}(a)e_\mu = e_\nu R^\nu_\mu(a) \qquad [28]$$

If $(V, g)$ is complex, then the algebras $\mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(V, -g)$ are isomorphic; this induces an isomorphism of the groups $\mathrm{Pin}(V, g)$ and $\mathrm{Pin}(V, -g)$. If $V = \mathbb{C}^m$, then this group is denoted by $\mathrm{Pin}_m(\mathbb{C})$. If $V = \mathbb{R}^{k+l}$ and $g$ is of signature $(k, l)$, then one writes $\mathrm{Pin}(V, g) = \mathrm{Pin}_{k,l}$. A similar notation is used for the spin groups; see below.


Spin Groups 


The spin group $\mathrm{Spin}(V, g) = \mathrm{Pin}(V, g) \cap \mathrm{C}\ell^0(V, g)$ is generated by products of all sequences of an even number of unit vectors. Since the algebras $\mathrm{C}\ell^0(V, g)$ and $\mathrm{C}\ell^0(V, -g)$ are isomorphic, so are the groups $\mathrm{Spin}(V, g)$ and $\mathrm{Spin}(V, -g)$. Since $\alpha(a) = a$ for $a \in \mathrm{Spin}(V, g)$, the twisted adjoint representation reduces to the adjoint representation and yields the exact sequence

$$1 \to \mathbb{Z}_2 \to \mathrm{Spin}(V, g) \xrightarrow{\operatorname{Ad}} SO(V, g) \to 1 \qquad [29]$$

For $V = \mathbb{C}^m$, the spin group is denoted by $\mathrm{Spin}_m(\mathbb{C})$. Since $\mathrm{Spin}_m(\mathbb{C}) \subset G_1(\beta)$, the bilinear form $B$ is invariant with respect to the action of this group.


Spin^0 Groups


The connected component $\mathrm{Spin}^0(V, g)$ of the group $\mathrm{Spin}(V, g)$ coincides with $\mathrm{Spin}(V, g)$ if either the quadratic space $(V, g)$ is complex or real and $kl = 0$. In signature $(k, l)$, the connected group $\mathrm{Spin}^0_{k,l}$ is generated in $\mathrm{C}\ell^0_{k,l}$ by all products of the form $u_1 \cdots u_{2p} v_1 \cdots v_{2q}$ such that $u_i^2 = -1$ and $v_j^2 = 1$. The connected groups $\mathrm{Spin}^0_{m,0}$ and $\mathrm{Spin}^0_{0,m}$ are isomorphic and denoted by $\mathrm{Spin}_m$. Since $\mathrm{Spin}^0_{k,l} \subset G_1(\beta)$, the Hermitian form $A$ and the bilinear form $B$ are invariant with respect to the action of this group. Moreover, for $k + l$ even, from [24] and [28] there follows the transformation law of multivectors formed from pairs of spinors,

$$A_{\mu_1 \ldots \mu_p}(\gamma(a)s, \gamma(a)t) = A_{\nu_1 \ldots \nu_p}(s, t)\, R^{\nu_1}_{\mu_1}(a^{-1}) \cdots R^{\nu_p}_{\mu_p}(a^{-1})$$

Consider $\mathrm{Spin}^0(V, g)$ and assume that either $V$ is complex of dimension $\geq 2$ or real with $k$ or $l \geq 2$. Then there are two unit orthogonal vectors $e_1, e_2 \in V$ such that $e_1^2 = e_2^2$. The vector $u(t) = e_1 \cos t + e_2 \sin t$ is obtained from $e_1$ by rotation in the plane $\operatorname{span}\{e_1, e_2\}$ by the angle $t \in \mathbb{R}$. The curve $t \mapsto e_1 u(t)$, $0 \leq t \leq \pi$, connects the elements 1 and $-1$ of $\mathrm{Spin}^0(V, g)$. Its image in $SO^0(V, g)$, that is, the curve $t \mapsto \operatorname{Ad}(e_1 u(t))$, $0 \leq t \leq \pi$, is closed: $\operatorname{Ad}(1) = \operatorname{Ad}(-1)$. This fact is often expressed by saying that "a spinor undergoing a rotation by $2\pi$ changes sign." There is no homomorphism (not even a continuous map) $f : SO^0(V, g) \to \mathrm{Spin}^0(V, g)$ such that $\operatorname{Ad} \circ f = \mathrm{id}$.


Spin^c Groups


For the purposes of physics, to describe charged fermions, and in the theory of the Seiberg-Witten invariants, one needs the $\mathrm{Spin}^c$ groups that are spinorial extensions of the real orthogonal groups by the group $U_1$ of "phase factors." Assume $V$ to be real and $g$ of signature $(k, l)$ so that the sequence [29] can be written as

$$1 \to \mathbb{Z}_2 \to \mathrm{Spin}_{k,l} \to SO_{k,l} \to 1$$

Define the action of $\mathbb{Z}_2 = \{1, -1\}$ in $\mathrm{Spin}_{k,l} \times U_1$ so that $(-1)(a, z) = (-a, -z)$. The quotient $(\mathrm{Spin}_{k,l} \times U_1)/\mathbb{Z}_2 = \mathrm{Spin}^c_{k,l}$ yields the extensions

$$1 \to U_1 \to \mathrm{Spin}^c_{k,l} \to SO_{k,l} \to 1$$

and

$$1 \to \mathrm{Spin}_{k,l} \to \mathrm{Spin}^c_{k,l} \to U_1 \to 1$$

For example, $\mathrm{Spin}_3 = SU_2$ and $\mathrm{Spin}^c_3 = U_2$.


Spin Groups in Dimensions $\le 6$


The connected components of spin groups associated with orthogonal
spaces of dimension $\le 6$ are isomorphic to classical groups. They can
be explicitly described starting from the following observations.

Consider the four-dimensional vector space (of twistors) $T$ over $K$,
with a volume element $\mathrm{vol} \in \wedge^4 T$. The six-dimensional
vector space $V = \wedge^2 T$ has a scalar product $g$ defined by
$g(u,v)\,\mathrm{vol} = 2\,u \wedge v$ for $u, v \in V$. The quadratic form
$g(u,u)$ is the Pfaffian, $\mathrm{Pf}(u)$. If $u \in V$ is represented
by the corresponding isomorphism $T^* \to T$ and $a \in \mathrm{End}\,T$,
then $\mathrm{Pf}(a u a^*) = \det a \,\mathrm{Pf}(u)$. The last formula
shows $\mathrm{Spin}^0(V,g) = \mathrm{SL}(T)$, so that
$\mathrm{Spin}_6(\mathbb{C}) = \mathrm{SL}_4(\mathbb{C})$. For $K = \mathbb{R}$,
the Pfaffian is of signature $(3,3)$, so that
$\mathrm{Spin}^0_{3,3} = \mathrm{SL}_4(\mathbb{R})$. A non-null vector $v \in V$
defines a symplectic form on $T^*$. The five-dimensional vector space
$v^\perp \subset V$ is invariant with respect to the symplectic group
$\mathrm{Sp}(T^*, v) = \mathrm{Spin}^0(v^\perp, \mathrm{Pf}|_{v^\perp})$. This shows that
$\mathrm{Spin}_5(\mathbb{C}) = \mathrm{Sp}_4(\mathbb{C})$ and
$\mathrm{Spin}^0_{2,3} = \mathrm{Sp}_4(\mathbb{R})$. Spin groups for other
signatures in real dimensions 6 and 5 are obtained by considering
appropriate real subspaces of $\mathbb{C}^6$ and $\mathbb{C}^5$, respectively.
For example, [6] is used to show that $\mathrm{Spin}^0_{1,5} = \mathrm{SL}_2(\mathbb{H})$.
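
The transformation rule of the Pfaffian used above can be checked on a concrete example with exact integer arithmetic. This is a minimal sketch under assumptions: a bivector is represented by a $4 \times 4$ skew-symmetric matrix, $a^*$ is taken to be the transpose, and the Pfaffian is normalized as $u_{12}u_{34} - u_{13}u_{24} + u_{14}u_{23}$, which may differ from the article's normalization by a constant factor. All function names are illustrative.

```python
from itertools import permutations

def det(m):
    """Determinant by the Leibniz formula (fine for small integer matrices)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = 1
        for i in range(n):
            prod *= m[i][perm[i]]
        total += sign * prod
    return total

def pfaffian4(u):
    """Pfaffian of a 4x4 skew-symmetric matrix."""
    return u[0][1] * u[2][3] - u[0][2] * u[1][3] + u[0][3] * u[1][2]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def skew(a, b, c, d, e, f):
    """Skew-symmetric matrix with the given entries above the diagonal."""
    return [[0, a, b, c], [-a, 0, d, e], [-b, -d, 0, f], [-c, -e, -f, 0]]

u = skew(1, 2, 3, 4, 5, 6)
a = [[1, 2, 0, 1], [0, 1, 3, 0], [2, 0, 1, 1], [1, 1, 0, 2]]
lhs = pfaffian4(matmul(matmul(a, u), transpose(a)))   # Pf(a u a^T)
rhs = det(a) * pfaffian4(u)                           # det(a) Pf(u)
```

Here `lhs` and `rhs` agree exactly. Restricting `a` to determinant one therefore preserves the quadratic form, which is the content of the identification with $\mathrm{SL}(T)$.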

Spin groups in dimensions 4 and lower are
similarly obtained from the observation that $\det$ is
a quadratic form on the four-dimensional space $K(2)$
and $\mathrm{C}\ell^0(K(2), \det) = K(2) \oplus K(2)$.

Several spin groups are listed below. 


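The complex spin groups

$\mathrm{Spin}_2(\mathbb{C}) = \mathbb{C}^\times$, $\mathrm{Spin}_3(\mathbb{C}) = \mathrm{SL}_2(\mathbb{C})$
$\mathrm{Spin}_4(\mathbb{C}) = \mathrm{SL}_2(\mathbb{C}) \times \mathrm{SL}_2(\mathbb{C})$
$\mathrm{Spin}_5(\mathbb{C}) = \mathrm{Sp}_4(\mathbb{C})$
$\mathrm{Spin}_6(\mathbb{C}) = \mathrm{SL}_4(\mathbb{C})$


The real, compact spin groups

$\mathrm{Spin}_2 = \mathrm{U}_1$, $\mathrm{Spin}_3 = \mathrm{SU}_2$
$\mathrm{Spin}_4 = \mathrm{SU}_2 \times \mathrm{SU}_2$, $\mathrm{Spin}_5 = \mathrm{Sp}_2(\mathbb{H})$
$\mathrm{Spin}_6 = \mathrm{SU}_4$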


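The groups $\mathrm{Spin}^0_{k,l}$ for $1 \le k \le l$ and $k + l \le 6$

$\mathrm{Spin}^0_{1,1} = \mathbb{R}^\times$, $\mathrm{Spin}^0_{1,2} = \mathrm{SL}_2(\mathbb{R})$
$\mathrm{Spin}^0_{1,3} = \mathrm{SL}_2(\mathbb{C})$
$\mathrm{Spin}^0_{2,2} = \mathrm{SL}_2(\mathbb{R}) \times \mathrm{SL}_2(\mathbb{R})$
$\mathrm{Spin}^0_{1,4} = \mathrm{Sp}_{1,1}(\mathbb{H})$
$\mathrm{Spin}^0_{2,3} = \mathrm{Sp}_4(\mathbb{R})$
$\mathrm{Spin}^0_{2,4} = \mathrm{SU}_{2,2}$
$\mathrm{Spin}^0_{3,3} = \mathrm{SL}_4(\mathbb{R})$
$\mathrm{Spin}^0_{1,5} = \mathrm{SL}_2(\mathbb{H})$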


See also: Dirac Operator and Dirac Field; Index
Theorems; Relativistic Wave Equations Including Higher
Spin Fields; Spinors and Spin Coefficients; Twistors.

Introduction


The method of cluster expansions in statistical
physics provides a systematic way of computing
power series for thermodynamic potentials (logarithms
of partition functions) as well as correlations.
It originated from the works of Mayer and others
devoted to expansions for dilute gases.


Mayer Expansion 


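Consider a system of interacting particles with
Hamiltonian

$$H_N(p_1,\dots,p_N, r_1,\dots,r_N) = \sum_{i=1}^{N} \frac{p_i^2}{2m} + \sum_{\substack{i,j=1 \\ i<j}}^{N} \Phi(r_i - r_j) \qquad [1]$$

where $\Phi$ is a stable and regular pair potential.
Namely, we assume that there exists $B \ge 0$ such that

$$\sum_{\substack{i,j=1 \\ i<j}}^{N} \Phi(r_i - r_j) \ge -BN \qquad [2]$$

for all $N = 2, 3, \dots$ and all $(r_1,\dots,r_N) \in \mathbb{R}^{3N}$, and
that

$$C(\beta) = \int \left| e^{-\beta\Phi(r)} - 1 \right| d^3 r < \infty \qquad [3]$$

for some $\beta > 0$ (and hence all $\beta > 0$). Basic
thermodynamic quantities are given in terms of the
grand-canonical partition function

$$Z(\beta,\lambda,V) = \sum_{N=0}^{\infty} \frac{z^N}{N!\,h^{3N}} \int_{\mathbb{R}^{3N} \times V^N} e^{-\beta H_N} \prod_{i=1}^{N} d^3 p_i\, d^3 r_i = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \int_{V^N} e^{-\beta \sum_{i<j} \Phi(r_i - r_j)} \prod_{i=1}^{N} d^3 r_i \qquad [4]$$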


In the second expression we absorbed the factor
resulting from the integration over momenta into the
(configurational) activity $\lambda = (2\pi m/\beta h^2)^{3/2} z$. In
particular, the pressure $p$ and the density $\rho$ are defined
by the thermodynamic limits (with $V \to \infty$ in the
sense of Van Hove)

$$p(\beta,\lambda) = \lim_{V\to\infty} \frac{1}{\beta |V|} \log Z(\beta,\lambda,V) \qquad [5]$$


Cluster Expansion 531 


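and

$$\rho(\beta,\lambda) = \lim_{V\to\infty} \frac{\lambda}{|V|} \frac{\partial}{\partial\lambda} \log Z(\beta,\lambda,V) \qquad [6]$$

Mayer series are the expansions of $p$ and $\rho$ in powers
of $\lambda$:

$$\beta p(\beta,\lambda) = \sum_{n=1}^{\infty} b_n \lambda^n \qquad [7]$$

and

$$\rho(\beta,\lambda) = \sum_{n=1}^{\infty} n\, b_n \lambda^n \qquad [8]$$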

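Mayer's idea for a systematic computation of the
coefficients $b_n$ was based on a reformulation of the
partition function $Z(\beta,\lambda,V)$ in terms of cluster
integrals. Introducing the function

$$f(r) = e^{-\beta\Phi(r)} - 1 \qquad [9]$$

and using $G[N]$ to denote the set of all graphs on $N$
vertices $\{1,\dots,N\}$, we get

$$Z(\beta,\lambda,V) = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \sum_{g \in G[N]} w(g) \qquad [10]$$

where

$$w(g) = \int_{V^N} \prod_{\{i,j\} \in g} f(r_i - r_j) \prod_{i=1}^{N} d^3 r_i \qquad [11]$$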

Observing that the weight $w$ is multiplicative in
connected components (clusters) $g_1,\dots,g_k$ of the
graph $g$,

$$w(g) = \prod_{\ell=1}^{k} w(g_\ell) \qquad [12]$$

we can rewrite

$$Z(\beta,\lambda,V) = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \sum_{\{g_i\}} \prod_{i} w(g_i) \qquad [13]$$

with the sum running over all disjoint collections $\{g_i\}$
of connected graphs with vertices in $\{1,\dots,N\}$. A
straightforward exponential expansion can be used to
show that, at least in the sense of formal power series,

$$\log Z(\beta,\lambda,V) = \sum_{n=1}^{\infty} \frac{\lambda^n}{n!} \sum_{g \in C[n]} w(g) \qquad [14]$$

where $C[n]$ is the set of all connected graphs on $n$
vertices. Using $b_n^{(V)}$ to denote the coefficients

$$b_n^{(V)} = \frac{1}{|V|} \frac{1}{n!} \sum_{g \in C[n]} w(g) \qquad [15]$$

and observing that the limits $\lim_{V\to\infty} (1/|V|)\,w(g)$ of
cluster integrals exist, we get $b_n = \lim_{V\to\infty} b_n^{(V)}$. The
convergence of Mayer series can be controlled directly
by combinatorial estimates on the coefficients $b_n^{(V)}$. As a
result, the diameter of convergence of the series [7] and
[8] can be proved to be at least $(C(\beta)e^{2\beta B+1})^{-1}$. A less
direct proof is based on the linear integral
Kirkwood-Salsburg equations in a suitable
Banach space of correlation functions.

Similar combinatorial methods are also available
for evaluating the coefficients of the virial expansion
of pressure in powers of gas density,

$$\beta p(\beta,\rho) = \sum_{n=1}^{\infty} B_n \rho^n \qquad [16]$$

obtained by inverting [8] (notice that $b_1 = 1$) and
inserting it into [7]. One gets $B_n = \lim_{V\to\infty} B_n^{(V)}$
with

$$B_n^{(V)} = -\frac{n-1}{n!} \frac{1}{|V|} \sum_{g \in B[n]} w(g) \qquad (n \ge 2) \qquad [17]$$

where $B[n] \subset C[n]$ is the set of all 2-connected
graphs on $\{1,\dots,n\}$; namely, those graphs that
cannot be split into disjoint subgraphs by erasing
one vertex (and all adjacent edges). The diameter of
convergence of the virial expansion turns out to be
no less than $(C(\beta)e(e^{2\beta B} + 1))^{-1}$.
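
To get a feeling for the sets $C[n]$ and $B[n]$ over which the Mayer and virial coefficients are summed, one can enumerate them by brute force for small $n$. The sketch below is illustrative only; the function names are introduced here, and the single edge on two vertices is counted as 2-connected, following the definition in the text.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """True if the graph on `vertices` (using only edges inside it) is connected."""
    vertices = list(vertices)
    if not vertices:
        return False
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        v = stack.pop()
        for u in adjacency[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(vertices)

def is_two_connected(n, edges):
    """Connected, and still connected after erasing any single vertex."""
    if not is_connected(range(n), edges):
        return False
    return all(is_connected([u for u in range(n) if u != v], edges) for v in range(n))

def count_graphs(n):
    """Return (|C[n]|, |B[n]|) for labeled graphs on n vertices."""
    pairs = list(combinations(range(n), 2))
    connected = two_connected = 0
    for k in range(len(pairs) + 1):
        for edges in combinations(pairs, k):
            if is_connected(range(n), edges):
                connected += 1
                if n >= 2 and is_two_connected(n, edges):
                    two_connected += 1
    return connected, two_connected
```

For instance, `count_graphs(4)` gives 38 connected graphs, of which 10 are 2-connected. The rapid growth of these counts is the reason why convergence proofs have to exploit cancellations between terms rather than count graphs.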


Abstract Polymer Models 


An application of the ideas of Mayer expansions to
lattice models is based on a reformulation of the
partition function in terms of a polymer model, a
formulation akin to [13] above. Namely, the partition
function is rewritten as a sum over collections of
pairwise compatible geometric objects called polymers.
Most often, compatibility simply means that the
polymers are disjoint.

While the reformulation of “physical partition 
function” in terms of a polymer model (including the 
definition of compatibility) depends on particularities 
of a given lattice model and on the considered region of 
parameters — high-temperature, low-temperature, large 
external fields, etc. — the essence and results of cluster 
expansion may be conveniently formulated in terms of 
an abstract polymer model. 

Let $G = (V, E)$ be any (possibly infinite) countable
graph and suppose that a map $w : V \to \mathbb{C}$ is given.
Vertices $v \in V$ are called abstract polymers, and
two abstract polymers connected by an edge in the
graph $G$ are called incompatible. We shall refer to $w(v)$
as the weight of the abstract polymer $v$. For any
finite $W \subset V$, we consider the induced subgraph
$G[W]$ of $G$ spanned by $W$ and define

$$Z_W(w) = \sum_{I \subset W} \prod_{v \in I} w(v) \qquad [18]$$

Here the sum runs over all collections $I$ of
compatible abstract polymers or, in other words,
the sum is over all independent sets $I$ of vertices in
$W$ (no two vertices in $I$ are connected by an edge).
The partition function $Z_W(w)$ is an entire function
in $w = \{w(v)\}_{v \in W} \in \mathbb{C}^{|W|}$ and $Z_W(0) = 1$. Hence, it is
nonvanishing in some neighborhood of the origin
$w = 0$ and its logarithm is, on this neighborhood, an
analytic function yielding a convergent Taylor series

$$\log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a_W(X)\, w^X \qquad [19]$$

Here, $\mathcal{X}(W)$ is the set of all multi-indices $X : W \to
\{0, 1, \dots\}$ and $w^X = \prod_v w(v)^{X(v)}$. Inspecting the formula
for $a_W(X)$ in terms of the corresponding derivatives of
$\log Z_W(w)$, it is easy to show that the Taylor coefficients
$a_W(X)$ actually do not depend on $W$: $a_W(X) = a_{\mathrm{supp}\,X}(X)$,
where $\mathrm{supp}\,X = \{v \in V : X(v) \ne 0\}$. As a result,
one gets the existence of coefficients $a(X)$ such that

$$\log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a(X)\, w^X \qquad [20]$$

for every finite $W \subset V$.
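
For a finite set of polymers, the partition function [18] is simply the sum over independent sets of the incompatibility graph and can be evaluated by direct enumeration. The following sketch is an illustration, not an efficient algorithm; the function name and the representation of polymers as dictionary keys are assumptions made here.

```python
from itertools import combinations

def partition_function(weights, incompatible):
    """Z_W(w): sum over collections of pairwise compatible polymers (formula [18]).

    weights:      dict mapping each polymer in W to its weight w(v)
    incompatible: iterable of pairs of polymers joined by an edge of G
    """
    polymers = list(weights)
    bad_pairs = {frozenset(pair) for pair in incompatible}
    total = 0
    for k in range(len(polymers) + 1):
        for collection in combinations(polymers, k):
            # skip collections containing an incompatible pair
            if any(frozenset(p) in bad_pairs for p in combinations(collection, 2)):
                continue
            term = 1
            for v in collection:
                term *= weights[v]
            total += term
    return total
```

With weights `{'a': 2, 'b': 3, 'c': 5}` and the single incompatible pair `('a', 'b')`, the result factorizes as $(1 + 2 + 3)(1 + 5)$, since polymer `c` is compatible with everything. This factorization over connected components of $G[W]$ is what makes the Taylor coefficients of $\log Z_W$ independent of $W$.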

The coefficients $a(X)$ can be obtained explicitly.
One can pass from [18] to [20] in a similar way as
passing from [10] to [13]. The starting point is to
replace the restriction to compatible collections of
abstract polymers in the sum [18] by the factor
$\prod_{v,v'} (1 + F(v,v'))$, the product running over
pairs of polymers in the collection, with

$$F(v,v') = \begin{cases} 0 & \text{if } v \text{ and } v' \text{ are compatible} \\ -1 & \text{otherwise } (v \text{ and } v' \text{ connected by an edge from } G) \end{cases} \qquad [21]$$

and to expand the product afterwards. The resulting
formula is

$$a(X) = (X!)^{-1} \sum_{H \subset G(X)} (-1)^{|E(H)|} \qquad [22]$$

Here, $G(X)$ is the graph with $|X| = \sum_v |X(v)|$ vertices
induced from $G[\mathrm{supp}\,X]$ by replacing each of its
vertices $v$ by the complete graph on $|X(v)|$ vertices,
and $X!$ is the multifactorial $X! = \prod_{v \in \mathrm{supp}\,X} X(v)!$. The
sum is over all connected graphs $H \subset G(X)$
spanned by the set of vertices of $G(X)$, and $|E(H)|$
is the number of edges of the graph $H$.
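
Formula [22] can be evaluated by brute force for small multi-indices and compared with known Taylor coefficients. The sketch below is an illustration under assumptions: the names `ursell_coefficient` and `spans_connected` are introduced here, a multi-index is a dictionary from polymers to positive multiplicities, and exact rational arithmetic is used.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def spans_connected(nodes, edges):
    """True if `edges` connect all of `nodes` (nodes must be nonempty)."""
    adjacency = {v: set() for v in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        v = stack.pop()
        for u in adjacency[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(nodes)

def ursell_coefficient(X, incompatible):
    """a(X) from formula [22]; X maps polymer -> multiplicity >= 1."""
    bad_pairs = {frozenset(pair) for pair in incompatible}
    # vertices of G(X): X(v) copies of each polymer v
    nodes = [(v, i) for v, m in X.items() for i in range(m)]
    # edges of G(X): between copies of one polymer or of two incompatible polymers
    edges = [(p, q) for p, q in combinations(nodes, 2)
             if p[0] == q[0] or frozenset((p[0], q[0])) in bad_pairs]
    total = 0
    for k in range(len(edges) + 1):
        for H in combinations(edges, k):
            if spans_connected(nodes, H):
                total += (-1) ** k
    multifactorial = 1
    for m in X.values():
        multifactorial *= factorial(m)
    return Fraction(total, multifactorial)
```

For a single polymer, $Z = 1 + w$, so the coefficients must reproduce the series of $\log(1 + w)$: `ursell_coefficient({'a': n}, [])` returns $(-1)^{n-1}/n$ for small `n`. For two incompatible polymers each taken once, the result is $-1$, the coefficient of $w_1 w_2$ in $\log(1 + w_1 + w_2)$, while for two compatible polymers it is $0$.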




A useful property of the coefficients $a(X)$ is their
alternating sign,

$$(-1)^{|X|+1} a(X) \ge 0 \qquad [23]$$

More important than an explicit form of the
coefficients $a(X)$ are the convergence criteria for the
series [20]. One way to proceed is to find direct
combinatorial bounds on the coefficients as expressed
by [22]. While doing so, one has to take into account the
cancellations arising from the presence of terms of
opposite signs in [22]. Indeed, disregarding them would
lead to a failure since, as is easy to verify, the number
of connected graphs on $|X|$ vertices is bounded from
below by $2^{(|X|-1)(|X|-2)/2}$. An alternative approach is to
prove the convergence of [20] on polydisks $D_{W,R} =
\{w : |w(v)| \le R(v) \text{ for } v \in W\}$ by induction in $|W|$,
once a proper condition on the set of radii $R = (R(v);
v \in V)$ is formulated. The most natural condition for the inductive
proof (leading at the same time to the strongest claim)
turns out to be the Dobrushin condition:

There exists a function $r : V \to [0, 1)$ such that, for
each $v \in V$,

$$R(v) \le r(v) \prod_{v' \in N(v)} (1 - r(v')) \qquad [24]$$

Here $N(v)$ is the set of vertices $v' \in V$ adjacent in the
graph $G$ to the vertex $v$.
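
Given a trial function $r$, the largest radii allowed by condition [24] follow from a direct product over neighbors. The following is a minimal sketch; the function name and the adjacency-dictionary representation of the incompatibility graph are assumptions made for illustration.

```python
def dobrushin_radii(r, neighbours):
    """Largest radii R(v) = r(v) * prod_{v' in N(v)} (1 - r(v')) permitted by [24].

    r:          dict polymer -> value in [0, 1)
    neighbours: dict polymer -> iterable of incompatible polymers (v itself excluded)
    """
    for v, value in r.items():
        if not 0 <= value < 1:
            raise ValueError(f"r({v!r}) must lie in [0, 1)")
    radii = {}
    for v in r:
        product = 1.0
        for u in neighbours.get(v, ()):
            product *= 1.0 - r[u]
        radii[v] = r[v] * product
    return radii
```

For two mutually incompatible polymers with $r = 1/2$, the radii are $R = 1/4$ each, and indeed $Z = 1 + w_1 + w_2$ cannot vanish when $|w_1|, |w_2| \le 1/4$. For an isolated polymer, $R(v) = r(v)$, which matches the radius of convergence of $\log(1 + w)$ as $r \to 1$.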

Using $\mathcal{X}$ to denote the set of all multi-indices
$X : V \to \{0, 1, \dots\}$ with finite $|X| = \sum_v |X(v)|$ and
saying that $X \in \mathcal{X}$ is a cluster if the graph $G(\mathrm{supp}\,X)$
is connected, we can summarize the cluster
expansion claim for an abstract polymer model in
the following way:


Theorem (Cluster expansion). There exists a function
$a : \mathcal{X} \to \mathbb{R}$ that is nonvanishing only on clusters,
so that for any set of radii $R$ satisfying
the condition [24] with a function $r$ the
following holds true:

(i) For every finite $W \subset V$ and any weight
$w \in D_{W,R}$, one has $Z_W(w) \ne 0$ and

$$\log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a(X)\, w^X$$

(ii) $\sum_{X \in \mathcal{X} : \mathrm{supp}\,X \ni v} |a(X)|\,|w|^X \le -\log(1 - r(v))$.


Notice that we get not only absolute
convergence of the Taylor series of $\log Z_W$ in the closed
polydisk $D_{W,R}$, but also the bound (ii) (uniform in $W$)
on the sum over all terms containing a fixed vertex $v$.
Such a bound turns out to be very useful in applications
of cluster expansions. It eventually yields bounds on
various error terms, avoiding the need for an explicit
evaluation of the number of clusters of "given size."


The restriction to compatible collections of polymers
can actually be relaxed. Namely, replacing [18] by

$$Z_W(w) = \sum_{W' \subset W} \prod_{v \in W'} w(v) \prod_{\{v,v'\} \subset W'} U(v,v') \qquad [25]$$

with $U(v,v') \in [0, 1]$ (soft repulsive interaction), and
the condition [24] by

$$R(v) \le r(v) \prod_{v' \ne v} \frac{1 - r(v')}{1 - U(v,v')\, r(v')} \qquad [26]$$

one can prove that the partition function $Z_W(w)$
does not vanish on the polydisk $D_{W,R}$, which implies
that the power series of $\log Z_W(w)$ converges
absolutely on $D_{W,R}$.

Polymers that arise in typical applications are
geometric objects endowed with a "support" in the
considered lattice, say $\mathbb{Z}^d$, $d \ge 1$, and their weights
satisfy the condition of translation invariance. Cluster
expansions then yield an explicit power series for the
pressure (resp. free energy) in the thermodynamic
limit as well as its finite-volume approximation.

To formulate it for an abstract polymer model, we
assume that for each $x \in \mathbb{Z}^d$ an isomorphism
$\tau_x : G \to G$ is given and that with each abstract polymer
$v \in V$ a finite set $\Lambda(v) \subset \mathbb{Z}^d$ is associated so that
$\Lambda(\tau_x(v)) = \Lambda(v) + x$ for every $v \in V$ and every $x \in \mathbb{Z}^d$.
For any finite $W \subset V$ and any multi-index $X$, let
$\Lambda(W) = \bigcup_{v \in W} \Lambda(v)$ and $\Lambda(X) = \Lambda(\mathrm{supp}\,X)$. On the
other hand, for any finite $\Lambda \subset \mathbb{Z}^d$, let
$W(\Lambda) = \{v \in V : \Lambda(v) \subset \Lambda\}$. Assuming also that the weight
$w : V \to \mathbb{C}$ is translation invariant (that is, $w(v) = w(\tau_x(v))$ for
every $v \in V$ and every $x \in \mathbb{Z}^d$), we get an explicit
expression for the "pressure" of the abstract polymer model
in the thermodynamic limit

$$p = \lim_{\Lambda \to \infty} \frac{1}{|\Lambda|} \log Z_{W(\Lambda)}(w) = \sum_{X : \Lambda(X) \ni 0} \frac{a(X)\, w^X}{|\Lambda(X)|} \qquad [27]$$

In addition, the finite-volume approximation can be
explicitly evaluated, yielding

$$\log Z_{W(\Lambda)}(w) = p\,|\Lambda| - \sum_{X : \Lambda(X) \cap \Lambda^c \ne \emptyset} a(X)\, w^X\, \frac{|\Lambda(X) \cap \Lambda|}{|\Lambda(X)|} \qquad [28]$$

Using the claim (ii), the second term can be bounded
by $\mathrm{const}\cdot|\partial\Lambda|$.


Cluster Expansions for Lattice Models 


There is a variety of applications of cluster expansions
to lattice models. As noticed above, the first
step is always to rewrite the model in terms of a
polymer representation.


High-Temperature Expansions


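Let us illustrate this point in the simplest case of the Ising
model. Its partition function in volume $\Lambda \subset \mathbb{Z}^d$, with
free boundary conditions and vanishing external field, is

$$Z_\Lambda(\beta) = \sum_{\sigma_\Lambda} \exp\Big\{ \beta \sum_{\langle x,y \rangle \subset \Lambda} \sigma_x \sigma_y \Big\} \qquad [29]$$

Using the identity

$$e^{\beta \sigma_x \sigma_y} = \cosh\beta + \sigma_x \sigma_y \sinh\beta \qquad [30]$$

it can be rewritten in the form

$$Z_\Lambda(\beta) = 2^{|\Lambda|} (\cosh\beta)^{|\mathbb{B}(\Lambda)|} \sum_{B} (\tanh\beta)^{|B|} \qquad [31]$$

Here, the sum runs over all subsets $B$ of the set $\mathbb{B}(\Lambda)$ of
all bonds in $\Lambda$ (pairs of nearest-neighbor sites from $\Lambda$)
such that each site is contained in an even number of
bonds from $B$. Using $\Lambda(B)$ to denote the set of sites
contained in bonds from $B$, we say that $B_1, B_2 \subset \mathbb{B}(\Lambda)$
are disjoint if $\Lambda(B_1) \cap \Lambda(B_2) = \emptyset$. Splitting now $B$ into a
collection $\mathcal{B} = \{B_1, \dots, B_k\}$ of its connected components,
called (high-temperature) polymers, and using $\mathcal{B}(\Lambda)$ to
denote the set of all polymers in $\Lambda$, we get

$$Z_\Lambda(\beta) = 2^{|\Lambda|} (\cosh\beta)^{|\mathbb{B}(\Lambda)|} \sum_{\mathcal{B} \subset \mathcal{B}(\Lambda)} \prod_{B \in \mathcal{B}} (\tanh\beta)^{|B|} \qquad [32]$$

with the sum running over all collections $\mathcal{B}$ of mutually
disjoint polymers. This expression is exactly of the
form [18], once we define compatibility of polymers
by their disjointness. Introducing the weights

$$w(B) = (\tanh\beta)^{|B|} \qquad [33]$$

and taking the set $\mathcal{B}(\Lambda)$ of all polymers in $\Lambda$ for $W$,
we get the polymer representation
$Z_\Lambda(\beta) = 2^{|\Lambda|} (\cosh\beta)^{|\mathbb{B}(\Lambda)|} Z_{\mathcal{B}(\Lambda)}(w)$.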
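
The passage from [29] to [31] can be verified by brute force on a small piece of lattice. The sketch below is an illustration only: it enumerates all spin configurations on one side and all bond subsets with even degree at every site on the other. The function names and the explicit list-of-bonds representation are assumptions made here.

```python
import math
from itertools import combinations, product

def ising_partition_function(sites, bonds, beta):
    """Direct sum over spin configurations, free boundary conditions (formula [29])."""
    total = 0.0
    for spins in product((-1, 1), repeat=len(sites)):
        sigma = dict(zip(sites, spins))
        total += math.exp(beta * sum(sigma[x] * sigma[y] for x, y in bonds))
    return total

def high_temperature_sum(sites, bonds, beta):
    """Sum over bond sets in which every site has even degree (formula [31])."""
    t = math.tanh(beta)
    total = 0.0
    for k in range(len(bonds) + 1):
        for subset in combinations(bonds, k):
            degree = {x: 0 for x in sites}
            for x, y in subset:
                degree[x] += 1
                degree[y] += 1
            if all(d % 2 == 0 for d in degree.values()):
                total += t ** k
    return 2 ** len(sites) * math.cosh(beta) ** len(bonds) * total

# A 2 x 3 block of the square lattice with its nearest-neighbor bonds.
sites = [(x, y) for x in range(3) for y in range(2)]
bonds = [(s, t) for s, t in combinations(sites, 2)
         if abs(s[0] - t[0]) + abs(s[1] - t[1]) == 1]
```

Both functions return the same value (up to rounding) for any `beta`. On this block the even bond sets are the empty set, the two elementary squares and the outer hexagon, which are exactly the collections of disjoint polymers in [32].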

To apply the cluster expansion theorem, we have to
find a function $r$ such that the right-hand side of [24] is
positive and thus yields the radius of a polydisk of
convergence. Taking $r(B) = \epsilon^{|B|}$ with a suitable $\epsilon$, we get

$$\prod_{B' \in N(B)} (1 - r(B')) \ge e^{-2|B|} \qquad [34]$$

allowing us to choose $R(B) = r(B)\,e^{-2|B|} = (\epsilon e^{-2})^{|B|}$.
Indeed, to verify [34] we just notice that the number
of polymers of size $n$ containing a fixed site is
bounded by $\kappa^n$ with a suitable constant $\kappa$. Thus,

$$\sum_{B' : \Lambda(B') \ni x} \epsilon^{|B'|} \le 1 \qquad [35]$$

once $\kappa\epsilon$ is sufficiently small, and thus

$$\sum_{B' \in N(B)} \epsilon^{|B'|} \le |\Lambda(B)| \le |B| \qquad [36]$$

yielding [34] ($1 - t \ge e^{-2t}$ for $t \le 1/2$). To have $w \in
D_{W,R}$ (for any $W$), it is, for $R(B) = (\epsilon e^{-2})^{|B|}$, sufficient
to take $\beta \le \beta_0$ with $\tanh\beta_0 = \epsilon e^{-2}$.

As a consequence, for $\beta \le \beta_0$ we can use the
cluster expansion theorem to obtain a convergent
power series in powers of $\tanh\beta$. In particular,
using $\Lambda(X) = \bigcup_{B \in \mathrm{supp}\,X} \Lambda(B)$, we get the pressure by
the explicit formula

$$\beta p(\beta) = \log 2 + d \log(\cosh\beta) + \sum_{X : \Lambda(X) \ni x} \frac{a(X)\, w^X}{|\Lambda(X)|} \qquad [37]$$

for any fixed $x \in \mathbb{Z}^d$ (by translation invariance of
the contributing terms, the choice of $x$ is irrelevant).
The function $\beta p(\beta)$ is analytic on the region $\beta \le \beta_0$
since it is obtained as a uniformly absolutely
convergent series of analytic terms (powers of $\tanh\beta$).

This type of high-temperature cluster expansion
can be extended to a large class of models with
Boltzmann factor in the form $\exp\{-\beta \sum_A U_A(\phi)\}$,
where $\phi = (\phi_x; x \in \mathbb{Z}^d)$ is the configuration with
a priori on-site probability distribution $\nu(d\phi_x)$ and
$U_A$, for any finite $A \subset \mathbb{Z}^d$, are the multi-site
interactions (depending only on $(\phi_x; x \in A)$). Using
the Mayer trick we can rewrite

$$\exp\Big\{ -\beta \sum_{A \subset \Lambda} U_A(\phi) \Big\} = \prod_{A \subset \Lambda} (1 + f_A(\phi)) \qquad [38]$$

with $f_A(\phi) = \exp\{-\beta U_A(\phi)\} - 1$. Expanding the
product we get a polymer representation with
polymers $\mathcal{A}$ consisting of connected collections
$\mathcal{A} = (A_1, \dots, A_k)$ with weights

$$w(\mathcal{A}) = \int \prod_{A \in \mathcal{A}} f_A(\phi) \prod_{x \in \bigcup_{A \in \mathcal{A}} A} \nu(d\phi_x) \qquad [39]$$

Under appropriate bounds on the interactions $U_A$
and for $\beta$ small enough, using $\Lambda(\mathcal{A})$ to denote the set
$\bigcup_{A \in \mathcal{A}} A$, we get

$$\sum_{\mathcal{A} : \Lambda(\mathcal{A}) \ni x} |w(\mathcal{A})| \le 1 \qquad [40]$$

This bound allows us, as before in the case of the
high-temperature Ising model, to apply the cluster
expansion theorem, yielding an explicit series expansion
for the pressure.


Correlations 


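Cluster expansions can be applied to evaluate the
decay of correlations. Let us consider, for the class
of models discussed above, the expectation

$$\langle \Psi \rangle_\Lambda = \frac{1}{Z_\Lambda} \int \Psi(\phi)\, e^{-\beta H_\Lambda(\phi)} \prod_{x \in \Lambda} \nu(d\phi_x) \qquad [41]$$

with $H_\Lambda(\phi) = \sum_{A \subset \Lambda} U_A(\phi)$ and a function $\Psi$
depending only on variables $\phi_x$ on sites $x$ from a
finite set $S \subset \Lambda \subset \mathbb{Z}^d$.

A convenient way of evaluating the expectation starts
with the introduction of the modified partition function

$$Z_{\Lambda,\Psi}(\alpha) = Z_\Lambda + \alpha Z_\Lambda \langle \Psi \rangle_\Lambda = Z_\Lambda (1 + \alpha \langle \Psi \rangle_\Lambda) \qquad [42]$$

Clearly,

$$\langle \Psi \rangle_\Lambda = \frac{d \log Z_{\Lambda,\Psi}(\alpha)}{d\alpha} \bigg|_{\alpha = 0} \qquad [43]$$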

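Thus, one may get an expression for the expectation
$\langle \Psi \rangle_\Lambda$ by forming a polymer representation of $Z_{\Lambda,\Psi}(\alpha)$
and isolating terms linear in $\alpha$ in the corresponding
cluster expansion. For the first step, in the just cited
high-temperature case with general multi-site interactions,
we first enlarge the original set $\mathcal{A}(\Lambda)$ of all
polymers in $\Lambda$ (consisting of connected collections
$\mathcal{A} = (A_1, \dots, A_k)$) to $W_S(\Lambda) = \mathcal{A}(\Lambda) \cup \mathcal{A}_S(\Lambda)$, where
$\mathcal{A}_S(\Lambda)$ is the set of all collections $(\mathcal{A}_1, \dots, \mathcal{A}_k)$ of
polymers such that each of them intersects the set $S$
(polymers $(\mathcal{A}_1, \dots, \mathcal{A}_k)$ are "glued" by $S$ into a single
entity). Compatibility is defined as before by disjointness;
in addition, any two collections from $\mathcal{A}_S(\Lambda)$ are
declared to be incompatible, and any polymer $\mathcal{A}$
from $\mathcal{A}(\Lambda)$ intersecting $S$ is considered to be incompatible
with any collection from $\mathcal{A}_S(\Lambda)$. Defining now
$w_\alpha(\mathcal{A}) = w(\mathcal{A})$ for $\mathcal{A} \in \mathcal{A}(\Lambda)$ and

$$w_\alpha(\mathfrak{A}) = \alpha \int \Psi(\phi) \prod_{\ell=1}^{k} \prod_{A \in \mathcal{A}_\ell} f_A(\phi) \prod_{x \in \Lambda(\mathcal{A}_1) \cup \dots \cup \Lambda(\mathcal{A}_k) \cup S} \nu(d\phi_x) \qquad [44]$$

for $\mathfrak{A} = (\mathcal{A}_1, \dots, \mathcal{A}_k) \in \mathcal{A}_S(\Lambda)$, we get $Z_{\Lambda,\Psi}(\alpha)$
exactly in the form [18],

$$Z_{\Lambda,\Psi}(\alpha) = \sum_{I \subset W_S(\Lambda)} \prod_{\mathcal{A} \in I} w_\alpha(\mathcal{A}) \qquad [45]$$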
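As a result, we have

$$\log Z_{\Lambda,\Psi}(\alpha) = \sum_{X \in \mathcal{X}(W_S(\Lambda))} a(X)\, w_\alpha^X \qquad [46]$$

allowing us to isolate easily the terms linear in $\alpha$: namely,
the terms with multi-indices $X$ with $\mathrm{supp}\,X \cap \mathcal{A}_S(\Lambda)$
consisting of a single collection, say $\mathfrak{A}_0$, that occurs
with multiplicity one, $X(\mathfrak{A}_0) = 1$. Explicitly, using

$$\mathcal{X}_{S,\mathfrak{A}_0}(\Lambda) = \{X \in \mathcal{X}(W_S(\Lambda)) : \mathrm{supp}\,X \cap \mathcal{A}_S(\Lambda) = \{\mathfrak{A}_0\},\ X(\mathfrak{A}_0) = 1\} \qquad [47]$$

we get

$$\langle \Psi \rangle_\Lambda = \sum_{\mathfrak{A}_0 \in \mathcal{A}_S(\Lambda)} \ \sum_{X \in \mathcal{X}_{S,\mathfrak{A}_0}(\Lambda)} a(X)\, w^X \qquad [48]$$

where the weights [44] are evaluated at $\alpha = 1$.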

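It is easy to show that, for sufficiently small $\beta$, the series
on the right-hand side is absolutely convergent even if
we extend $\mathcal{A}_S(\Lambda)$ to $\mathcal{A}_S = \bigcup_\Lambda \mathcal{A}_S(\Lambda)$ and $\mathcal{X}_{S,\mathfrak{A}_0}(\Lambda)$ to
$\mathcal{X}_{S,\mathfrak{A}_0} = \bigcup_\Lambda \mathcal{X}_{S,\mathfrak{A}_0}(\Lambda)$. As a result, we have an explicit
expression for the limiting expectation $\langle \Psi \rangle$ in terms of
an absolutely convergent power series. This can be
immediately applied to show that $|\langle \Psi \rangle - \langle \Psi \rangle_\Lambda|$ decays
exponentially in the distance between $S$ and the complement
of $\Lambda$. Indeed, it suffices to find a suitable bound on
$\sum_X |a(X)|\,|w|^X$ with the sum running over all clusters
$X$ reaching from the set $S$ to $\Lambda^c$. To this end, one does not
need to evaluate explicitly the number of clusters of
given "diameter" $\mathrm{diam}(X) = \sum_{\mathcal{A}} X(\mathcal{A})\, \mathrm{diam}(\Lambda(\mathcal{A})) = m$
with $m \ge \mathrm{dist}(S, \Lambda^c)$. The needed estimate is actually
already contained in the condition (ii) from the cluster
expansion theorem. It just suffices to choose a suitable
$K$ and assume that $\beta$ is small enough to assure validity
of [40] in a stronger form, $\sum_{\mathcal{A} : \Lambda(\mathcal{A}) \ni x} |w(\mathcal{A})|\, K^{|\Lambda(\mathcal{A})|} \le 1$,
yielding eventually

$$\sum_{X : \mathrm{diam}(X) \ge \mathrm{dist}(S, \Lambda^c)} |a(X)|\,|w|^X \le K^{-\mathrm{dist}(S, \Lambda^c)} \sum_{x \in S} \ \sum_{X : \bigcup_{\mathcal{A} \in \mathrm{supp}\,X} \Lambda(\mathcal{A}) \ni x} |a(X)|\,|w|^X K^{\sum_{\mathcal{A}} X(\mathcal{A}) |\Lambda(\mathcal{A})|} \le |S|\, K^{-\mathrm{dist}(S, \Lambda^c)} \qquad [49]$$

Exponential decay of correlations $\langle \Psi_1; \Psi_2 \rangle_\Lambda =
\langle \Psi_1 \Psi_2 \rangle_\Lambda - \langle \Psi_1 \rangle_\Lambda \langle \Psi_2 \rangle_\Lambda$ (and the limiting $\langle \Psi_1; \Psi_2 \rangle$)
in the distance between the supports of $\Psi_1$ and $\Psi_2$ can be
established in a similar way by isolating terms
proportional to $\alpha_1 \alpha_2$ in the cluster expansion of
$\log Z_{\Lambda,\Psi_1,\Psi_2}(\alpha_1, \alpha_2)$ with

$$Z_{\Lambda,\Psi_1,\Psi_2}(\alpha_1, \alpha_2) = Z_\Lambda \left( 1 + \alpha_1 \langle \Psi_1 \rangle_\Lambda + \alpha_2 \langle \Psi_2 \rangle_\Lambda + \alpha_1 \alpha_2 \langle \Psi_1 \Psi_2 \rangle_\Lambda \right) \qquad [50]$$

The resulting claim can be readily generalized to one
about the decay of the correlation $\langle \Psi_1; \dots; \Psi_k \rangle$ in
terms of the shortest tree connecting the supports
$S_1, \dots, S_k$ of the functions $\Psi_1, \dots, \Psi_k$.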


Low-Temperature Expansions 


Finally, in some models with symmetries, we can apply
cluster expansion also at low temperatures. Let us
illustrate it again in the case of the Ising model. This time,
we take the partition function $Z_\Lambda^+(\beta)$ with plus
boundary conditions. First, let us define for each
nearest-neighbor bond $(x,y)$ its dual as the $(d-1)$-dimensional
closed unit hypercube orthogonal to the
segment from $x$ to $y$ and bisecting it at its center. For a
given configuration $\sigma_\Lambda$, we consider the boundary of
the regions of constant spins consisting of the union
$\partial(\sigma_\Lambda)$ of all hypercubes that are dual to
nearest-neighbor bonds $(x,y)$ for which $\sigma_x \ne \sigma_y$. The contours
corresponding to $\sigma_\Lambda$ are now defined as the connected
components of $\partial(\sigma_\Lambda)$. Notice that, under the fixed
boundary condition, there is a one-to-one correspondence
between configurations $\sigma_\Lambda$ and sets $\Gamma$ of
mutually compatible (disconnected) contours in $\Lambda$.
Observing that the number of faces in $\partial(\sigma_\Lambda)$ is just
the sum of the areas $|\gamma|$ of the contours $\gamma \in \Gamma$, we
get the polymer representation

$$Z_\Lambda^+(\beta) = e^{\beta |E(\Lambda)|} \sum_{\Gamma} \exp\Big\{ -2\beta \sum_{\gamma \in \Gamma} |\gamma| \Big\} \qquad [51]$$

where the sum is over all collections of disjoint
contours in $\Lambda$. Here $E(\Lambda)$ is the set of all bonds $(x,y)$
with at least one endpoint $x, y$ in $\Lambda$.
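
The energy bookkeeping behind [51] can be checked by brute force in two dimensions, where the dual faces are unit segments. The sketch below assumes the Boltzmann weight $\exp\{\beta \sum \sigma_x \sigma_y\}$ over the bonds in $E(\Lambda)$, as in [29], with all spins outside $\Lambda$ fixed to $+1$; the function name is introduced here for illustration. It compares the direct sum with $e^{\beta|E(\Lambda)|}$ times the sum of $e^{-2\beta \cdot (\text{number of dual faces})}$, without decomposing the faces into contours.

```python
import math
from itertools import product

def plus_boundary_check(sites, beta):
    """Return (direct sum, contour-type sum) for a finite set of sites in Z^2."""
    sites = list(sites)
    # E(Lambda): nearest-neighbor bonds with at least one endpoint in Lambda
    bonds = set()
    for (x, y) in sites:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            bonds.add(frozenset({(x, y), (x + dx, y + dy)}))
    direct = 0.0
    face_sum = 0.0
    for spins in product((-1, 1), repeat=len(sites)):
        sigma = dict(zip(sites, spins))
        energy = 0   # sum of sigma_x sigma_y over bonds
        faces = 0    # number of bonds with disagreeing spins (dual faces)
        for bond in bonds:
            a, b = tuple(bond)
            sa = sigma.get(a, 1)   # plus boundary condition outside Lambda
            sb = sigma.get(b, 1)
            energy += sa * sb
            faces += 1 if sa != sb else 0
        direct += math.exp(beta * energy)
        face_sum += math.exp(-2 * beta * faces)
    return direct, math.exp(beta * len(bonds)) * face_sum
```

For a $2 \times 2$ box, both returned values coincide up to rounding for any `beta`, since each disagreeing bond lowers $\sum \sigma_x \sigma_y$ by exactly 2 relative to the all-plus configuration.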

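The condition [24] with $r(\gamma) = \epsilon^{|\gamma|}$ yields a similar
bound on the weights $w(\gamma) = e^{-2\beta|\gamma|}$ as in the
high-temperature expansion. Verifying it, for $\beta$ sufficiently
large, boils down to estimating the number of
contours of size $n$ that contain a fixed site.

As a result, we can employ the cluster expansion
theorem to get

$$\log Z_\Lambda^+(\beta) = \beta |E(\Lambda)| + \sum_{X \in \mathcal{X}(\mathcal{C}(\Lambda))} a(X)\, w^X \qquad [52]$$

where $\mathcal{C}(\Lambda)$ denotes the set of all contours in $\Lambda$,
with an explicit formula for the limit

$$\beta p(\beta) = \beta d + \sum_{X : \Lambda(X) \ni 0} \frac{a(X)\, w^X}{|\Lambda(X)|} \qquad [53]$$

Here, $\Lambda(X)$ is the set of sites attached to contours
from $\mathrm{supp}\,X$,

$$\Lambda(X) = \bigcup_{\gamma \in \mathrm{supp}\,X} \Lambda(\gamma) \qquad [54]$$

with

$$\Lambda(\gamma) = \{x \in \mathbb{Z}^d \text{ such that } \mathrm{dist}(x, \gamma) \le 1/2\} \qquad [55]$$

As a consequence of the fact that [53] is, for large
$\beta$, an absolutely convergent sum of analytic terms
$a(X)\,w^X = a(X)\,e^{-2\beta \sum_\gamma X(\gamma)|\gamma|}$ (considered as functions
of $\beta$), the function $p(\beta)$ is, for large $\beta$, analytic in $\beta$.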

The fact that one can explicitly express the
difference $\log Z_\Lambda^+(\beta) - |\Lambda|\,\beta p(\beta)$ (cf. [28]) has found
numerous applications in situations where one
needs an accurate evaluation of the influence of the
boundary of the region $\Lambda$ on the partition function.
One such example is the study of the microscopic
behavior of interfaces. The main idea is to use the
explicit expression in the form

$$Z_\Lambda^+(\beta) = \exp\{\beta p(\beta)|\Lambda|\} \exp\Big\{ -\sum_{X : \Lambda(X) \cap \Lambda^c \ne \emptyset} a(X)\, w^X \frac{|\Lambda(X) \cap \Lambda|}{|\Lambda(X)|} \Big\} = \exp\{\beta p(\beta)|\Lambda|\} \prod_{X : \Lambda(X) \cap \Lambda^c \ne \emptyset} (1 + f_X) \qquad [56]$$

Noticing that

$$f_X = \exp\Big( -a(X)\, w^X \frac{|\Lambda(X) \cap \Lambda|}{|\Lambda(X)|} \Big) - 1$$

does not vanish only if $\Lambda(X) \cap \Lambda \ne \emptyset$, we can expand
the product to obtain "decorations" of the boundary
$\partial\Lambda$ by clusters $f_X$. In the case of an interface, these clusters
can be incorporated into the weight of the interface, while
on a fixed boundary they yield a "wall free energy."
The possibility of the (low-temperature) polymer
representation of the partition function in terms of
contours is based on the $+ \leftrightarrow -$ symmetry of the
Ising model. In the absence of such a symmetry, cluster
expansions can still be used, but in the framework of
Pirogov-Sinai theory (see Pirogov-Sinai Theory).


Bibliographical Notes 


Cluster expansions originated from the works of Ursell, 
Yvon, Mayer, and others and were first studied in terms 
of formal power series. The combinatorial and enu- 
meration problems considered in this framework were 
summarized in Uhlenbeck and Ford (1962). For related 
topics in modern language, see Bergeron et al. (1998). 
The convergence results for Mayer and virial expansions
for dilute gas were first proved in the works of Penrose,
Lebowitz, Groeneveld, and Ruelle (see Ruelle (1969) for
a detailed survey). General polymer models on a lattice
were discussed by Gruber and Kunz (1971) (see also
Simon (1993) for a discussion of high-temperature and
low-temperature cluster expansions of lattice models).
Abstract polymer models were introduced in Kotecký
and Preiss (1986). An elegant proof of a general claim
presented by Dobrushin (1996) was further extended
and summarized by Scott and Sokal (2005). We follow
their reformulation of the Dobrushin condition. Cluster
expansions with a view to applications in quantum field
theory are reviewed in Brydges (1986).


See also: Phase Transitions in Continuous Systems; 
Pirogov-Sinai Theory; Wulff Droplets. 


Further Reading 


Bergeron F, Labelle G, and Leroux P (1998) Combinatorial 
Species and Tree-Like Structures, Coll. Encyclopaedia of 
Mathematics and Its Applications, vol. 67. Cambridge, MA: 
Cambridge University Press. 

Brydges DC (1986) A short course on cluster expansions. In:
Osterwalder K and Stora R (eds.) Critical Phenomena, Random
Systems, Gauge Theories, pp. 129-183. Les Houches, Session
XLIII, 1984. Amsterdam/New York: Elsevier.

Dobrushin RL (1996) Estimates of semi-invariants for the Ising
model at low temperatures. In: Dobrushin RL, Minlos RA,
Shubin MA, and Vershik AM (eds.) Topics in Statistical and
Theoretical Physics, pp. 59-81. Providence, RI: American
Mathematical Society.

Gruber C and Kunz H (1971) General properties of polymer
systems. Communications in Mathematical Physics 22: 133-161.

Kotecký R and Preiss D (1986) Cluster expansion for abstract polymer
models. Communications in Mathematical Physics 103: 491-498.


Ruelle D (1969) Statistical Mechanics: Rigorous Results, The
Mathematical Physics Monograph Series. Reading, MA: 
Benjamin. 

Scott AD and Sokal AD (2005) The repulsive lattice gas, the 
independent-set polynomial, and the Lovász local lemma. 
Journal of Statistical Physics 118: 1151-1261. 


Coherent States
S T Ali, Concordia University, Montreal, QC, Canada

© 2006 Elsevier Ltd. All rights reserved.


Introduction 


Very generally, a family of coherent states is a set of 
continuously labeled quantum states, with specific 
mathematical and physical properties, in terms 
of which arbitrary quantum states can be expressed 
as linear superpositions. Since coherent states are 
continuously labeled, they form overcomplete 
sets of vectors in the Hilbert space of states. 
Originally these states were introduced into physics
by Schrödinger (1926), as a family of quantum
states in terms of which the transition from quantum
to classical mechanics could be conveniently studied.
These states have the minimal uncertainty property,
in the sense that they saturate the Heisenberg
uncertainty relations. The name coherent state was
applied when these states were rediscovered in the
context of quantum optical radiation by Glauber,
Klauder, and Sudarshan. It was demonstrated that in
these states the correlation functions of the quantum
optical field factorize as they do in classical optics,
so that the optical field has a near-classical behavior,
with the optical beam being coherent. In this article,
we shall refer to these originally studied coherent
states as canonical coherent states (CCS).

The canonical coherent states, apart from their 
use in quantum optics, have also been found to be 
extremely useful in computations in atomic and 
molecular physics, in quantum statistical mechanics, 
and in certain areas of mathematics and mathema- 
tical physics, including harmonic analysis, symplec- 
tic geometry, and quantization theory. Their wide 
applicability has prompted the search for other 
families of states sharing similar mathematical and 
physical properties. These other families of states are 
usually called generalized coherent states, even when 
there is no link to optical coherence in such studies. 


Some Properties of CCS 


In addition to the minimal uncertainty property, the
canonical coherent states have a number of analytical
and group-theoretical properties which are taken as
starting points in looking for generalizations. We
now define the canonical coherent states mathematically
and enumerate a few of these properties.
Suppose that the vectors $|0\rangle, |1\rangle, \dots, |n\rangle, \dots$
correspond to quantum states of $0, 1, \dots, n, \dots$ excitons,
respectively. The Hilbert space of these states,
in which they form an orthonormal basis, is often
known as Fock space. The canonical coherent states
are then defined in terms of this basis, for each
complex number $z$, by the analytic expansion:

$$|z\rangle = e^{-|z|^2/2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\, |n\rangle \qquad [1]$$

The states $|z\rangle$ are normalized to unity: $\langle z|z\rangle = 1$.
They satisfy the formal eigenvalue equation

$$a|z\rangle = z|z\rangle \qquad [2]$$

where $a$ is the annihilation operator for excitons, which
acts on the basis vectors (Fock states) $|n\rangle$ as follows:

$$a|n\rangle = \sqrt{n}\, |n-1\rangle \qquad [3]$$

Its adjoint $a^\dagger$ has the action

$$a^\dagger|n\rangle = \sqrt{n+1}\, |n+1\rangle \qquad [4]$$

and

$$[a, a^\dagger] = a a^\dagger - a^\dagger a = I \qquad [5]$$

$I$ being the identity operator on Fock space.
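
The normalization and the eigenvalue equation [2] can be checked numerically in a truncated Fock space. This is a minimal sketch under assumptions: the cutoff `n_max` and the function names are introduced here, and the truncation is harmless only as long as $|z|$ is small compared with $\sqrt{n_{\max}}$.

```python
import math

def coherent_state(z, n_max=60):
    """Coefficients <n|z> = exp(-|z|^2/2) z^n / sqrt(n!) for n = 0..n_max (formula [1])."""
    coeffs = []
    term = complex(math.exp(-abs(z) ** 2 / 2))   # coefficient for n = 0
    for n in range(n_max + 1):
        coeffs.append(term)
        term = term * z / math.sqrt(n + 1)       # recursion to the next coefficient
    return coeffs

def annihilate(c):
    """Apply a: (a c)_n = sqrt(n+1) c_{n+1}, which follows from a|n> = sqrt(n)|n-1>."""
    return [math.sqrt(n + 1) * c[n + 1] for n in range(len(c) - 1)]

def norm_squared(c):
    return sum(abs(x) ** 2 for x in c)

z = 0.8 - 0.5j
state = coherent_state(z)
lowered = annihilate(state)
```

Here `norm_squared(state)` equals 1 up to rounding and truncation error, and every component of `lowered` agrees with `z` times the corresponding component of `state`.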
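Introducing the self-adjoint operators $Q$ and $P$, of
position and momentum, respectively,

$$Q = \frac{a + a^\dagger}{\sqrt{2}}, \qquad P = \frac{a - a^\dagger}{i\sqrt{2}} \qquad [6]$$

it is possible to demonstrate the minimal uncertainty
property referred to above (we take $\hbar = 1$):

$$\langle \Delta Q \rangle \langle \Delta P \rangle = \frac{1}{2} \qquad [7]$$

where for any observable $A$,

$$\langle \Delta A \rangle = \left[ \langle z|A^2|z\rangle - \langle z|A|z\rangle^2 \right]^{1/2}$$

is its dispersion in the state $|z\rangle$.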
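
The minimal uncertainty relation [7] can also be checked numerically. The sketch below assumes the forms $Q = (a + a^\dagger)/\sqrt{2}$ and $P = (a - a^\dagger)/(i\sqrt{2})$, works in a truncated Fock space with an assumed cutoff, and uses helper names introduced here for illustration.

```python
import math

def coherent_state(z, n_max=60):
    """Truncated coefficients <n|z> = exp(-|z|^2/2) z^n / sqrt(n!)."""
    coeffs, term = [], complex(math.exp(-abs(z) ** 2 / 2))
    for n in range(n_max + 1):
        coeffs.append(term)
        term = term * z / math.sqrt(n + 1)
    return coeffs

def lower(c):
    """Annihilation operator a."""
    return [math.sqrt(n + 1) * c[n + 1] for n in range(len(c) - 1)]

def raise_(c):
    """Creation operator a^dagger (the vector grows by one component)."""
    return [0j] + [math.sqrt(n + 1) * c[n] for n in range(len(c))]

def combine(c, d, s):
    """Return c + s*d, padding the shorter list with zeros."""
    size = max(len(c), len(d))
    c = c + [0j] * (size - len(c))
    d = d + [0j] * (size - len(d))
    return [x + s * y for x, y in zip(c, d)]

def Q(c):
    return [x / math.sqrt(2) for x in combine(lower(c), raise_(c), 1)]

def P(c):
    return [x / (1j * math.sqrt(2)) for x in combine(lower(c), raise_(c), -1)]

def inner(c, d):
    """<c|d>, summed over the common components."""
    return sum(x.conjugate() * y for x, y in zip(c, d))

def dispersion(op, c):
    mean = inner(c, op(c)).real
    mean_square = inner(c, op(op(c))).real
    return math.sqrt(mean_square - mean ** 2)

state = coherent_state(0.8 - 0.5j)
product_of_dispersions = dispersion(Q, state) * dispersion(P, state)
```

The value of `product_of_dispersions` is $1/2$ up to numerical error, independently of the chosen $z$, with each dispersion separately equal to $1/\sqrt{2}$.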
One can also prove the resolution of the identity,

$$\int_{\mathbb{C}} |z\rangle\langle z|\, \frac{dq\, dp}{2\pi} = I \qquad [8]$$

where $z = (1/\sqrt{2})(q - ip)$ has been written in terms
of its real and imaginary parts $(1/\sqrt{2})q$ and
$(1/\sqrt{2})p$, respectively. The above operator integral
is to be understood in the weak sense, as will be
explained later. Equation [8] incorporates the
mathematical fact that the set of vectors $|z\rangle$ is
overcomplete in the Hilbert space. Indeed, using [8]
any vector $|\phi\rangle$ in the Hilbert space can be written as
a linear (integral) superposition of these states:

$$|\phi\rangle = \int_{\mathbb{C}} \Psi(z)\, |z\rangle\, \frac{dq\, dp}{2\pi}$$

where $\Psi$ is the component function, $\Psi(z) = \langle z|\phi\rangle$.
Thus, the coherent states $|z\rangle$ form a continuously
labeled total set of vectors in the Hilbert space and
since this space is separable, they are an overcomplete
set.

Analytic properties of the vectors $|z\rangle$ emerge when
the scalar product $\langle \phi|z\rangle$ is taken with respect to an
arbitrary vector $|\phi\rangle$ in Fock space. From [1] it is
clear that

$$F(z) = \langle \phi|z\rangle = e^{-|z|^2/2} f(z)$$

where $f$ is an entire analytic function in the complex
variable $z$. Moreover, the mapping $\phi \mapsto f$ is an
isometric embedding of the Fock space onto the
Hilbert space of analytic functions, with respect to
the norm

$$\|f\|^2 = \int_{\mathbb{C}} |f(z)|^2\, d\mu(z, \bar{z}) \qquad [9]$$

defined by the measure $d\mu(z, \bar{z}) = (1/2\pi)\, e^{-|z|^2}\, dq\, dp$.
Group-theoretical properties of the CCS can be
demonstrated by noting that

$$|n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}\, |0\rangle \quad \text{and} \quad a|0\rangle = 0$$

using which [1] can be recast into the form

$$|z\rangle = e^{-|z|^2/2}\, e^{z a^\dagger} |0\rangle = U(z)|0\rangle, \qquad U(z) = e^{z a^\dagger - \bar{z} a} \qquad [10]$$


The vectors |z) and the unitary operator U(z) can be 
reexpressed in terms of the real variables q, p and the 
operators Q, P as 


$|z\rangle \equiv |q, p\rangle = U(q,p)|0\rangle$


U(q,p) = eo " 


The operators $U(q,p)$ realize a (projective) unitary,
irreducible representation of the Weyl-Heisenberg
group, which is the group whose Lie algebra has the
generators $Q$, $P$, and $I$, obeying the commutation
relations $[Q, P] = iI$. The existence of the resolution
of the identity [8] is the statement of the fact that
this representation is square integrable (a notion
which will be elaborated upon in the section "Some
examples"), which gives us the next paradigm for
building coherent states, namely by the action, on a
fixed vector, of the unitary operators of a square-integrable
representation of a locally compact
group.

The above range of properties, which are enjoyed 
by the CCS, cannot all be expected to hold when 
looking for generalizations. It then becomes neces- 
sary to adopt one or other of these properties as the 
starting point and to proceed from there. In so 
doing, it is best first to set down a general definition 
of coherent states, involving a minimal mathema- 
tical structure. Motivated more by possible applica- 
tions to physics, we do this in the following section. 


General Definition 


Let $\mathfrak{H}$ be an abstract, separable Hilbert space over
the complexes, $X$ a locally compact space and $d\nu$ a
measure on $X$. Let $|x, i\rangle$ be a family of vectors in $\mathfrak{H}$,
defined for each $x$ in $X$ and $i = 1, 2, 3, \ldots, N$, where
$N$ is usually a finite integer, although it could also
be infinite. We assume that this set of vectors
possesses the following properties:


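1. For each $i$, the mapping $x \mapsto |x, i\rangle$ is weakly
continuous, that is, for each vector $|\phi\rangle$ in $\mathfrak{H}$, the
function $\Psi_i(x) = \langle x, i|\phi\rangle$ is continuous (in the
topology of $X$).

2. For each $x$ in $X$, the vectors $|x, i\rangle$, $i = 1, 2, \ldots, N$,
are linearly independent.

3. The resolution of the identity


\[ \sum_{i=1}^{N} \int_X |x, i\rangle\langle x, i|\; d\nu(x) = I_{\mathfrak{H}} \]  [12]


holds in the weak sense on the Hilbert space $\mathfrak{H}$,
that is, for any two vectors $|\phi\rangle, |\psi\rangle$ in $\mathfrak{H}$, the
following equality holds:


\[ \sum_{i=1}^{N} \int_X \langle\phi|x, i\rangle\langle x, i|\psi\rangle\; d\nu(x) = \langle\phi|\psi\rangle \]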

A set of vectors $|x, i\rangle$ satisfying the above three
properties is called a family of generalized vector
coherent states. In case $N = 1$, the set is called a family
of generalized coherent states. Sometimes the resolution
of the identity condition is replaced by a weaker
condition, with the vectors $|x, i\rangle$ simply forming a total
set in $\mathfrak{H}$ and the functions $\Psi_i(x) = \langle x, i|\phi\rangle$, as $|\phi\rangle$ runs
through $\mathfrak{H}$, forming a reproducing kernel Hilbert
space. Alternatively, the identity on the right-hand
side of [12] could also be replaced by a bounded,
positive operator $T$ with bounded inverse. In this case,
the term frame is also used for the family of generalized
coherent states. For physical applications, however,
the resolution of the identity condition is always
assumed to hold, although the measure $d\nu$ could be of
a very general nature (possibly also singular). The
objective in all these cases is to ensure that an arbitrary
vector $|\phi\rangle$ be expressible as a linear (integral)
combination of these vectors. Indeed, [12] is immediately
seen to imply that


\[ |\phi\rangle = \sum_{i=1}^{N} \int_X \Psi_i(x)\,|x, i\rangle\; d\nu(x) \]  [13]


where $\Psi_i(x) = \langle x, i|\phi\rangle$.

Associated to a family of generalized coherent
states on a Hilbert space $\mathfrak{H}$, there is an intrinsic
isomorphism between this space and a Hilbert space
of (in general, vector valued) continuous functions
over $X$. Using this isomorphism, it is always possible
to look upon coherent states as a family of
continuous functions which are square integrable
with respect to the measure $d\nu$. To demonstrate this,
we note that, in view of [12], for each vector $|\phi\rangle$ in
$\mathfrak{H}$, the vector-valued function $\Psi$ on $X$, with
components $\Psi_i(x) = \langle x, i|\phi\rangle$, $i = 1, 2, \ldots, N$, satisfies
the norm condition


\[ \sum_{i=1}^{N} \int_X |\Psi_i(x)|^2\; d\nu(x) = \|\phi\|^2 \]


This means that the set of vectors $\Psi$, as $|\phi\rangle$ runs
through $\mathfrak{H}$, is a closed subspace of the Hilbert space
$L^2_{\mathbb{C}^N}(X, d\nu)$ of $N$-vector-valued functions on $X$. Let us
denote this subspace by $\mathfrak{H}_K$ and note that this space
is a reproducing kernel Hilbert space with a matrix-valued
kernel $K(x, y)$ having matrix elements


\[ K(x, y)_{ij} = \langle x, i|y, j\rangle, \qquad i, j = 1, 2, \ldots, N \]  [14]


and enjoying the properties


\[ K(x, x)_{ii} > 0 \]  [15]


and


\[ \sum_{\ell=1}^{N} \int_X K(x, z)_{i\ell}\, K(z, y)_{\ell j}\; d\nu(z) = K(x, y)_{ij} \]  [16]
ł=1 ` 


If $e^i$, $i = 1, 2, \ldots, N$, are the vectors constituting the
canonical basis of $\mathbb{C}^N$, then for each $x$ in $X$ and
$i = 1, 2, \ldots, N$, the vector-valued function $\xi^i_x$ on $X$,
defined by $\xi^i_x(y) = K(y, x)\,e^i$, is the image in $\mathfrak{H}_K$ of
the generalized vector coherent state $|x, i\rangle$, under the
above-mentioned isometry. The vectors $\xi^i_x$ span
the space $\mathfrak{H}_K$ and for an arbitrary element $\Psi$ of this
Hilbert space, the reproducing property [16] of the
kernel implies the relation


\[ \int_X K(x, y)\,\Psi(y)\; d\nu(y) = \Psi(x) \]  [17]


Conversely, given any reproducing kernel Hilbert 
space, with a kernel satisfying the relations [15] and 
[16], generalized coherent states can be constructed 
as above in terms of this kernel. Mathematically, 
therefore, generalized coherent states are just the set 
of vectors naturally defined by the kernel in a 
reproducing kernel Hilbert space. 
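A finite toy model may help to fix ideas. The sketch below is an illustration under stated assumptions, not a construction taken from the text: it takes $X = \{0, \ldots, M-1\}$ with counting measure, $N = 1$, and the vectors $|x\rangle \in \mathbb{C}^d$ given by truncated, normalized Fourier modes (the helper name `harmonic_frame` is hypothetical). It then checks the resolution of the identity [12] and the kernel properties [15] and [16].

```python
import numpy as np

def harmonic_frame(M, d):
    """Return an M x d array whose rows are the vectors |x>, x = 0..M-1, in C^d."""
    x = np.arange(M)[:, None]
    k = np.arange(d)[None, :]
    return np.exp(2j * np.pi * x * k / M) / np.sqrt(M)

M, d = 7, 3
V = harmonic_frame(M, d)

# Resolution of the identity [12] with N = 1 and counting measure on X.
identity = sum(np.outer(v, v.conj()) for v in V)

# Kernel K(x, y) = <x|y>; np.vdot conjugates its first argument.
K = np.array([[np.vdot(vx, vy) for vy in V] for vx in V])

print(np.allclose(identity, np.eye(d)))   # resolution of the identity
print(np.allclose(K @ K, K))              # reproducing property [16]
print(np.all(K.diagonal().real > 0))      # positivity [15]
```

In this finite setting the kernel is simply an orthogonal projection matrix of rank $d$ acting on functions over $X$, which is the discrete counterpart of the closed subspace $\mathfrak{H}_K$.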


Some Examples 


We present in this section some of the more
commonly used types of coherent states, as illustrations
of the general structure given above.

A large class of generalizations of the canonical
coherent states [1] is obtained by a simple modification
of their analytic structure. Let $x_1 \le x_2 \le \cdots \le
x_n \le \cdots$ be an infinite sequence of positive numbers
($x_1 \neq 0$). Define $x_n! = x_1 x_2 \cdots x_n$ and by convention
set $x_0! = 1$. In the same Fock space in which the CCS
were described, we now define the related deformed
or nonlinear coherent states via the analytic
expansion


\[ |z\rangle = \mathcal{N}(|z|^2)^{-1/2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{x_n!}}\,|n\rangle \]  [18]


The normalization factor $\mathcal{N}(|z|^2)$ is chosen so that
$\langle z|z\rangle = 1$, that is, $\mathcal{N}(|z|^2) = \sum_{n=0}^{\infty} |z|^{2n}/x_n!$.
These generalized coherent states are
overcomplete in the Fock space and satisfy a
resolution of the identity of the type


\[ \int_{\mathcal{D}} |z\rangle\langle z|\; \mathcal{N}(|z|^2)\; d\nu(\bar{z}, z) = I \]  [19]


$\mathcal{D}$ being an open disk in the complex plane of radius
$L$, the radius of convergence of the series
$\sum_{n=0}^{\infty} z^n/\sqrt{x_n!}$. (In the case of the CCS, $L = \infty$.)
The measure $d\nu$ is generically of the form $d\theta\, d\lambda(r)$
(for $z = re^{i\theta}$), where $d\lambda$ is related to the $x_n!$ through
the moment condition


\[ \frac{x_n!}{2\pi} = \int_0^L r^{2n}\; d\lambda(r), \qquad n = 0, 1, 2, \ldots \]  [20]


This means that once the quantities $x_n!$ are specified,
the measure $d\lambda$ is to be determined by solving the
moment problem [20], which of course may not
always have a solution. This puts a constraint on the
type of sequences $\{x_n\}$ which may be used in the
construction.

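Once again, we see that for an arbitrary vector $|\phi\rangle$
in the Fock space, the function $F(z) = \langle\phi|z\rangle$, of the
complex variable $z$, is of the form $F(z) =
\mathcal{N}(|z|^2)^{-1/2} f(z)$, where $f$ is an analytic function on
the domain $\mathcal{D}$. The reproducing kernel associated to
these coherent states is


\[ K(\bar{z}, z') = \langle z|z'\rangle
   = \left[\mathcal{N}(|z|^2)\,\mathcal{N}(|z'|^2)\right]^{-1/2} \sum_{n=0}^{\infty} \frac{(\bar{z} z')^n}{x_n!} \]  [21]


By analogy with [2], one can define a generalized
annihilation operator $A$ by its action on the vectors $|z\rangle$,


\[ A|z\rangle = z|z\rangle \]  [22]


and its adjoint operator $A^\dagger$. These act on the Fock
states $|n\rangle$ as follows:


\[ A|n\rangle = \sqrt{x_n}\,|n - 1\rangle, \qquad
   A^\dagger|n\rangle = \sqrt{x_{n+1}}\,|n + 1\rangle \]  [23]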


Depending on the exact values of the quantities $x_n$,
these two operators, together with the identity I and 
all their commutators, could generate a wide range 
of algebras including various deformed quantum 
algebras. The term nonlinear, as often applied to 
these generalized coherent states, comes again from 
quantum optics, where many such families of states 
are used in studying the interaction between the 
radiation field and atoms, and the strength of the 
interaction itself depends on the frequency of 
radiation. Of course, these coherent states will not 
in general have either the group-theoretical or the 
minimal uncertainty properties of the CCS. 
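The eigenvalue property [22] follows from the ladder action [23] term by term, and this can be confirmed in a truncated Fock space. The sketch below is illustrative only: the function names are hypothetical, the sequence $x_n$ is passed in as a callable, and the two sample sequences ($x_n = n$ for the CCS and the bounded sequence $x_n = n/(n+1)$) are chosen here for demonstration.

```python
import numpy as np

def deformed_state(z, x, dim):
    """Unnormalized coefficients z^n / sqrt(x_n!) of [18], for n < dim.

    `x` is a callable returning x_n for n >= 1.
    """
    coeffs = np.empty(dim, dtype=complex)
    coeffs[0] = 1.0                      # x_0! = 1 by convention
    for n in range(1, dim):
        coeffs[n] = coeffs[n - 1] * z / np.sqrt(x(n))
    return coeffs

def annihilation_matrix(x, dim):
    """Truncated matrix of A, with A|n> = sqrt(x_n)|n-1>."""
    A = np.zeros((dim, dim))
    for n in range(1, dim):
        A[n - 1, n] = np.sqrt(x(n))
    return A

dim, z = 30, 0.4 + 0.3j
for x in (lambda n: float(n), lambda n: n / (n + 1.0)):
    c = deformed_state(z, x, dim)
    A = annihilation_matrix(x, dim)
    # Truncation spoils only the last component, so compare the rest.
    print(np.allclose((A @ c)[:-1], (z * c)[:-1]))
```

The normalization factor is irrelevant for the eigenvalue equation, which is why the unnormalized coefficients suffice here.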

The following is an example of generalized
coherent states of the above type, built over the
unit disk, $\mathcal{D} = \{z \in \mathbb{C} : |z| < 1\}$: on the Fock space,
we define the states


\[ |z\rangle = (1 - r^2)^{\kappa} \sum_{n=0}^{\infty} \left[\frac{(2\kappa)_n}{n!}\right]^{1/2} z^n\,|n\rangle, \qquad r = |z| \]  [24]


where $\kappa = 1, 3/2, 2, 5/2, \ldots$, and


\[ (a)_m = \frac{\Gamma(a + m)}{\Gamma(a)} = a(a + 1)(a + 2) \cdots (a + m - 1) \]


Comparing [24] with [18] we see that $x_n = n/(2\kappa + n - 1)$,
so that $\lim_{n\to\infty} x_n = 1$. Thus, the infinite sum
is convergent for any $z$ lying in the unit disk. These
generalized coherent states arise from representations
of the group SU(1, 1) belonging to the discrete
series, each irreducible representation being labeled
by a specific value of the index $\kappa$. The associated
Hilbert space of functions, analytic on the unit disk,
is a subspace of $L^2(\mathcal{D}, d\mu_\kappa)$, with


\[ d\mu_\kappa(\bar{z}, z) = (2\kappa - 1)\,\frac{(1 - r^2)^{2\kappa - 2}}{\pi}\; r\, dr\, d\theta \]


which can be obtained by solving the moment
problem [20]. The resolution of the identity satisfied
by these states is


\[ \frac{2\kappa - 1}{\pi} \int_{\mathcal{D}} |z\rangle\langle z|\; \frac{r\, dr\, d\theta}{(1 - r^2)^2} = I \]  [25]


The associated generalized creation and annihilation
operators are


\[ A|n\rangle = \sqrt{\frac{n}{2\kappa + n - 1}}\;|n - 1\rangle, \qquad
   A^\dagger|n\rangle = \sqrt{\frac{n + 1}{2\kappa + n}}\;|n + 1\rangle \]  [26]


so that, clearly, $[A, A^\dagger] \neq I$.
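The claim that the measure of this example solves the moment problem [20] can be tested numerically. The sketch below is a check under stated assumptions: it takes $x_n! = n!/(2\kappa)_n$ and the radial measure $d\lambda(r) = \frac{2\kappa - 1}{\pi}(1 - r^2)^{2\kappa - 2}\, r\, dr$ on $[0, 1]$, and compares both sides of [20] by Gauss-Legendre quadrature. The function names are illustrative.

```python
import math
import numpy as np

def moment_lhs(n, kappa):
    """x_n!/(2*pi) with x_n = n/(2*kappa + n - 1), so x_n! = n!/(2*kappa)_n."""
    pochhammer = math.gamma(2 * kappa + n) / math.gamma(2 * kappa)
    return math.factorial(n) / pochhammer / (2 * math.pi)

def moment_rhs(n, kappa, order=60):
    """Integral of r^(2n) against dlambda(r) on [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(order)
    r = 0.5 * (nodes + 1.0)              # map [-1, 1] -> [0, 1]
    w = 0.5 * weights
    density = (2 * kappa - 1) / math.pi * (1 - r**2) ** (2 * kappa - 2) * r
    return float(np.sum(w * r ** (2 * n) * density))

for kappa in (1, 1.5, 2, 2.5):
    ok = all(math.isclose(moment_lhs(n, kappa), moment_rhs(n, kappa), rel_tol=1e-9)
             for n in range(6))
    print(kappa, ok)
```

For the listed values of $\kappa$ the exponent $2\kappa - 2$ is a non-negative integer, so the integrand is a polynomial and the quadrature is exact up to rounding.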

Operators $A$ and $A^\dagger$ of the general type defined in
[23] are also known as ladder operators. When such
operators appear as generators of representations of
Lie algebras, their eigenvectors (see [22]) are usually
called Barut-Girardello coherent states. As an example,
the representation of the Lie algebra of SU(1, 1) on the
Fock space is generated by the three operators $K_+$, $K_-$,
and $K_3$, which satisfy the commutation relations


\[ [K_3, K_\pm] = \pm K_\pm, \qquad [K_-, K_+] = 2K_3 \]  [27]


They act on the vectors $|n\rangle$ as follows:


\[ K_-|n\rangle = \sqrt{n(2\kappa + n - 1)}\;|n - 1\rangle, \qquad
   K_+ = K_-^\dagger, \qquad
   K_3|n\rangle = (\kappa + n)\,|n\rangle \]  [28]


Thus, $K_-|0\rangle = 0$ and


\[ |n\rangle = \frac{1}{\sqrt{n!\,(2\kappa)_n}}\; K_+^n\,|0\rangle \]
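The commutation relations [27] can be verified directly from the matrix elements [28]. The following sketch builds truncated matrices in the basis $|0\rangle, \ldots, |\mathrm{dim} - 1\rangle$; the function names and the chosen values of $\kappa$ and the dimension are illustrative assumptions.

```python
import numpy as np

def su11_generators(kappa, dim):
    """Truncated matrices of K_-, K_+ and K_3 in the basis |0>, ..., |dim-1>."""
    n = np.arange(1, dim)
    K_minus = np.diag(np.sqrt(n * (2 * kappa + n - 1)), k=1)   # K_-|n> ~ |n-1>
    K_plus = K_minus.T                                         # adjoint of a real matrix
    K_3 = np.diag(kappa + np.arange(dim))
    return K_minus, K_plus, K_3

def comm(X, Y):
    """Commutator [X, Y]."""
    return X @ Y - Y @ X

kappa, dim = 1.5, 12
Km, Kp, K3 = su11_generators(kappa, dim)
cut = slice(0, dim - 1)          # the last row/column is affected by truncation

print(np.allclose(comm(Km, Kp)[cut, cut], 2 * K3[cut, cut]))
print(np.allclose(comm(K3, Kp), Kp))
print(np.allclose(comm(K3, Km), -Km))
```

Only the relation involving the product $K_- K_+$ feels the truncation, and only in its last diagonal entry, which is why that entry is excluded from the comparison.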


The Barut-Girardello coherent states $|z\rangle$ are now
defined as the formal eigenvectors of the ladder
operator $K_-$:


\[ K_-|z\rangle = z|z\rangle, \qquad z \in \mathbb{C} \]  [29]


They have the analytic form


\[ |z\rangle = \frac{|z|^{\kappa - 1/2}}{\sqrt{I_{2\kappa - 1}(2|z|)}} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!\,(2\kappa + n - 1)!}}\;|n\rangle \]  [30]


where $I_\nu(x)$ is the order-$\nu$ modified Bessel function
of the first kind. These coherent states satisfy the
resolution of the identity,


\[ \frac{2}{\pi} \int_{\mathbb{C}} |z\rangle\langle z|\; K_{2\kappa - 1}(2r)\, I_{2\kappa - 1}(2r)\; r\, dr\, d\theta = I, \qquad z = re^{i\theta} \]  [31]


where again, $K_\nu(x)$ is the order-$\nu$ modified Bessel
function of the second kind.

A nonanalytic extension of the expression [18] is
often used to define generalized coherent states
associated to physical Hamiltonians having pure
point spectra. These coherent states, known as
Gazeau-Klauder coherent states, are labeled by
action-angle variables. Suppose that we are given
the physical Hamiltonian $H = \sum_{n=0}^{\infty} E_n |n\rangle\langle n|$, with
$E_0 = 0$, that is, it has the energy eigenvalues $E_n$ and
eigenvectors $|n\rangle$, which we assume to form an
orthonormal basis for the Hilbert space of states $\mathfrak{H}$.
Let us write the eigenvalues as $E_n = \omega\varepsilon_n$, by introducing
a sequence of dimensionless quantities $\{\varepsilon_n\}$
ordered as $0 = \varepsilon_0 < \varepsilon_1 < \varepsilon_2 < \cdots$. Then, for all $J \ge 0$
and $\gamma \in \mathbb{R}$, the Gazeau-Klauder coherent states are
defined as


\[ |J, \gamma\rangle = \mathcal{N}(J)^{-1/2} \sum_{n=0}^{\infty} \frac{J^{n/2}\, e^{-i\varepsilon_n \gamma}}{\sqrt{\varepsilon_n!}}\;|n\rangle \]  [32]


where $\varepsilon_n! = \varepsilon_1 \varepsilon_2 \cdots \varepsilon_n$ (with $\varepsilon_0! = 1$, by analogy with $x_n!$)
and again $\mathcal{N}$ is a normalization factor, which
turns out to be dependent on $J$ only. These coherent
states satisfy the temporal stability condition


\[ e^{-iHt}\,|J, \gamma\rangle = |J, \gamma + \omega t\rangle \]  [33]


and the action identity


\[ \langle J, \gamma|H|J, \gamma\rangle = \omega J \]  [34]


While these generalized coherent states do form an
overcomplete set in $\mathfrak{H}$, the resolution of the identity
is generally not given by an integral relation of the
type [12].
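Both the temporal stability condition and the action identity are easy to test numerically once a spectrum is chosen. The sketch below is an illustration under assumptions: the spectrum $\varepsilon_n = n(n+2)/3$, the truncation at 60 levels, and the parameter values are arbitrary choices made here, and the function name is hypothetical.

```python
import numpy as np

def gk_state(J, gamma, eps):
    """Normalized Gazeau-Klauder coefficients for a finite list eps = [0, e_1, e_2, ...]."""
    eps = np.asarray(eps, dtype=float)
    eps_factorial = np.concatenate(([1.0], np.cumprod(eps[1:])))   # e_n!, with e_0! = 1
    n = np.arange(len(eps))
    c = J ** (n / 2) * np.exp(-1j * eps * gamma) / np.sqrt(eps_factorial)
    return c / np.linalg.norm(c)

omega, J, gamma, t = 2.0, 1.5, 0.3, 0.7
eps = [n * (n + 2) / 3 for n in range(60)]      # an illustrative spectrum, truncated
state = gk_state(J, gamma, eps)

# Temporal stability: H is diagonal, so exp(-iHt) multiplies by phases.
evolved = np.exp(-1j * omega * np.array(eps) * t) * state
print(np.allclose(evolved, gk_state(J, gamma + omega * t, eps)))

# Action identity: <H> = omega * J, up to truncation error.
energy = omega * np.sum(np.array(eps) * np.abs(state) ** 2)
print(np.isclose(energy, omega * J))
```

Temporal stability holds exactly even after truncation, whereas the action identity is exact only for the full series; for this rapidly growing spectrum the truncation error is far below the tolerance used.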

For the second set of examples of generalized
coherent states, we take the group-theoretical structure
of the CCS as the point of departure. Let $G$ be a
locally compact group and suppose that it has a
continuous, irreducible representation on a Hilbert
space $\mathfrak{H}$ by unitary operators $U(g)$, $g \in G$. This
representation is called square integrable if there exists
a nonzero vector $|\psi\rangle$ in $\mathfrak{H}$ for which the integral


\[ c(\psi) = \int_G |\langle\psi|U(g)\psi\rangle|^2\; d\mu(g) \]  [35]


converges. Here $d\mu$ is a Haar measure of $G$, which
for definiteness, we take to be the left-invariant
measure. (The value of the above integral is
independent of whether the left- or the right-invariant
measure is used, so we could just as well have used
the right-invariant measure.) A vector $|\psi\rangle$, satisfying
[35], is said to be admissible, and it can be shown
that the existence of one such vector guarantees the
existence of an entire dense set of such vectors in $\mathfrak{H}$.
Moreover, if the group $G$ is unimodular, that is, if the
left- and the right-invariant measures coincide, then
the existence of one admissible vector implies that
every vector in $\mathfrak{H}$ is admissible. Given a square-integrable
representation and an admissible vector
$|\psi\rangle$, let us define the vectors


\[ |g\rangle = \frac{1}{\sqrt{c(\psi)}}\; U(g)|\psi\rangle \]  [36]


for all $g$ in the group $G$. These vectors are to be seen
as the analogs of the canonical coherent states [11],
written there in terms of the representation of the
Weyl-Heisenberg group. Next, it can be shown that
the resolution of the identity


\[ \int_G |g\rangle\langle g|\; d\mu(g) = I_{\mathfrak{H}} \]  [37]


holds on $\mathfrak{H}$. Thus, the vectors $|g\rangle$ constitute a family
of generalized coherent states. The functions
$F(g) = \langle g|\phi\rangle$ for all vectors $|\phi\rangle$ in $\mathfrak{H}$ are square
integrable with respect to the measure $d\mu$ and the
set of such functions, which in fact are continuous in
the topology of $G$, forms a closed subspace of
$L^2(G, d\mu)$. Furthermore, the mapping $\phi \mapsto F$ is a
linear isometry between $\mathfrak{H}$ and $L^2(G, d\mu)$ and under
this isometry the representation $U$ gets mapped to a
subrepresentation of the left regular representation
of $G$ on $L^2(G, d\mu)$.
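The simplest setting in which this construction can be carried out explicitly is a finite group, where the Haar integral becomes a sum and every irreducible representation is square integrable. The sketch below is a toy illustration chosen here, not an example from the text: it uses the two-dimensional representation of the symmetry group of an equilateral triangle, assuming a unit-norm fiducial vector.

```python
import numpy as np

def dihedral_d3():
    """The six 2x2 orthogonal matrices of the symmetry group of a triangle."""
    def rot(t):
        return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    flip = np.array([[1.0, 0.0], [0.0, -1.0]])
    rotations = [rot(2 * np.pi * k / 3) for k in range(3)]
    return rotations + [r @ flip for r in rotations]

group = dihedral_d3()
psi = np.array([0.6, 0.8])                       # any unit vector will do

# Analog of the admissibility integral, with the integral replaced by a group sum.
c_psi = sum(abs(psi @ (U @ psi)) ** 2 for U in group)

# Analog of the coherent states and of their resolution of the identity.
states = [U @ psi / np.sqrt(c_psi) for U in group]
resolution = sum(np.outer(g, g) for g in states)

print(np.isclose(c_psi, 3.0))                    # |G| / dim = 6 / 2
print(np.allclose(resolution, np.eye(2)))
```

The value $c(\psi) = |G|/\dim$ is independent of the unit vector chosen, reflecting the fact that for a unimodular group every vector is admissible.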

A typical example of the above construction is
provided by the affine group, $G_{\mathrm{aff}}$. This is the group
of all $2 \times 2$ matrices of the type


\[ g = \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} \]  [38]


$a$ and $b$ being real numbers with $a \neq 0$. We shall
also write $g = (b, a)$. This group is nonunimodular,
with the left-invariant measure being given by
$d\mu(b, a) = (1/a^2)\, db\, da$. (The right-invariant measure
is $(1/a)\, db\, da$.) The affine group has a unitary
irreducible representation on the Hilbert space
$L^2(\mathbb{R}, dx)$. Vectors in $L^2(\mathbb{R}, dx)$ are measurable
functions $\phi(x)$ of the real variable $x$ and the
(unitary) operators $U(b, a)$ of this representation
act on them in the manner


\[ (U(b, a)\phi)(x) = \frac{1}{\sqrt{|a|}}\; \phi\!\left(\frac{x - b}{a}\right) \]  [39]


If $\psi$ is a function in $L^2(\mathbb{R}, dx)$ such that its Fourier
transform $\hat{\psi}$ satisfies the condition


\[ \int_{-\infty}^{\infty} \frac{|\hat{\psi}(k)|^2}{|k|}\; dk < \infty \]  [40]


then it can be shown to be an admissible vector, that is,


\[ c(\psi) = \iint_{G_{\mathrm{aff}}} |\langle\psi|U(b, a)\psi\rangle|^2\; \frac{db\, da}{a^2} < \infty \]


Thus, following the general construction outlined
above, the vectors


\[ |b, a\rangle = \frac{1}{\sqrt{c(\psi)}}\; U(b, a)\psi, \qquad (b, a) \in G_{\mathrm{aff}} \]  [41]


define a family of generalized coherent states and
one has the resolution of the identity


\[ \iint_{G_{\mathrm{aff}}} |b, a\rangle\langle b, a|\; \frac{db\, da}{a^2} = I \]  [42]


on $L^2(\mathbb{R}, dx)$.

In the signal-analysis literature a vector satisfying
the admissibility condition [40] is called a mother
wavelet and the generalized coherent states [41] are
called wavelets. Signals are then identified with
vectors $|\phi\rangle$ in $L^2(\mathbb{R}, dx)$ and the function


\[ F(b, a) = \langle b, a|\phi\rangle \]  [43]


is called the continuous wavelet transform of the
signal $\phi$.
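A sampled version of this transform takes only a few lines. The sketch below is illustrative and rests on assumptions made here: the "Mexican hat" function serves as mother wavelet, integrals are replaced by Riemann sums on a fixed grid, and the constant $1/\sqrt{c(\psi)}$ is omitted. It checks that the affine action preserves the norm and locates a bump in a test signal.

```python
import numpy as np

x = np.linspace(-40.0, 40.0, 16001)
dx = x[1] - x[0]

def mexican_hat(u):
    """An illustrative mother wavelet (zero mean, rapidly decaying)."""
    return (1.0 - u**2) * np.exp(-u**2 / 2.0)

def U(b, a, f):
    """Sampled affine action: (U(b,a) f)(x) = |a|^(-1/2) f((x - b)/a)."""
    return f((x - b) / a) / np.sqrt(abs(a))

def inner(f_vals, g_vals):
    """Riemann-sum approximation of the L^2 inner product <f|g>."""
    return np.sum(np.conj(f_vals) * g_vals) * dx

def wavelet_transform(signal_vals, b, a):
    """F(b, a) = <U(b,a) psi | signal>, omitting the constant 1/sqrt(c(psi))."""
    return inner(U(b, a, mexican_hat), signal_vals)

psi = mexican_hat(x)
moved = U(1.0, 2.5, mexican_hat)
print(np.isclose(inner(psi, psi), inner(moved, moved)))   # unitarity of U(b, a)

signal = np.exp(-(x - 3.0) ** 2)                 # a bump centred at x = 3
scan = [abs(wavelet_transform(signal, b, 1.0)) for b in (-5.0, 3.0, 10.0)]
print(int(np.argmax(scan)))                      # index of the best-matching translation
```

The transform is largest for the translation parameter that matches the position of the bump, which is the localization property exploited in signal analysis.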

There exist alternative ways of constructing
generalized coherent states using group representations.
For example, the Perelomov method is based
on the observation that the vector $|0\rangle$, appearing in
the construction of the canonical coherent states in
[10] and [11] using the representation of the Weyl-Heisenberg
group, is invariant up to a phase, under
the action of its center. Consequently, the coherent
states $|z\rangle$, as written in [10], are labeled, not by
elements of the group itself, but only by the points in
the quotient space of the group by its (central) phase
subgroup. Generally, let $G$ be a locally compact
group and $U$ a unitary irreducible representation of
it on the Hilbert space $\mathfrak{H}$. We do not assume $U$ to be
square integrable. We fix a vector $|\psi\rangle$ in $\mathfrak{H}$, of unit
norm, and denote by $H$ the subgroup of $G$ consisting
of all elements $h$ for which


\[ U(h)|\psi\rangle = e^{i\omega(h)}\,|\psi\rangle \]  [44]


where $\omega$ is a real-valued function of $h$. Let $X = G/H$
be the left-coset space and $x$ an arbitrary element in $X$.
Choosing a coset representative $g(x) \in G$, for each
coset $x$, we define the vectors


\[ |x\rangle = U(g(x))|\psi\rangle \]  [45]


in $\mathfrak{H}$. The dependence of these vectors on the specific
choice of the coset representative $g(x)$ is only
through a phase. Thus, if instead of $g(x)$ we took a
different representative $g(x)' \in G$ for the same coset
$x$, then since $g(x)' = g(x)h$ for some $h \in H$, in view of
[44] we would have $U(g(x)')|\psi\rangle = e^{i\omega(h)}\,|x\rangle$. Hence,
quantum mechanically, both $|x\rangle$ and $U(g(x)')|\psi\rangle$
represent the same physical state and in particular,
the projection operator $|x\rangle\langle x|$ depends only on the
coset. Vectors $|x\rangle$, defined in this manner, are called
Gilmore-Perelomov coherent states. Since $U$ is
assumed to be irreducible, the set of all these vectors,
as $x$ runs through $G/H$, is total in $\mathfrak{H}$ (its linear span is dense). In this
definition of generalized coherent states, no resolution
of the identity is postulated. However, if $X$
carries an invariant measure, under the natural
action of $G$, and if the formal operator $B$ defined as


\[ B = \int_X |x\rangle\langle x|\; d\mu(x) \]


is bounded, then it is necessarily a multiple of the
identity and a resolution of the identity is again
retrieved.

The Perelomov construction can be used to define
coherent states for any locally compact group. On
the other hand, there exist other constructions of
generalized coherent states, using group representations,
which generalize the notion of square integrability
to homogeneous spaces of the group. Briefly,
in this approach one starts with a unitary irreducible
representation $U$ and attempts to find a vector $|\psi\rangle$, a
subgroup $H$ and a section $\sigma : G/H \to G$ such that


\[ \int_{G/H} |x\rangle\langle x|\; d\mu(x) = T \]  [46]


where $|x\rangle = U(\sigma(x))|\psi\rangle$, $T$ is a bounded, positive
operator with bounded inverse and $d\mu$ is a quasi-invariant
measure on $X = G/H$. It is not assumed
that $|\psi\rangle$ be invariant up to a phase under the action
of $H$ and clearly, the best situation is when $T$ is a
multiple of the identity. Although somewhat technical,
this general construction is of enormous
versatility for semidirect product groups of the type
$\mathbb{R}^n \rtimes K$, where $K$ is a closed subgroup of $GL(n, \mathbb{R})$.
Thus, it is useful for many physically important
groups, such as the Poincaré or the Euclidean group,
which do not have square-integrable representations
in the sense of the earlier definition (see eqn [35]).
The integral condition [46] ensures that any vector
$|\phi\rangle$ in $\mathfrak{H}$ can be written in terms of the $|x\rangle$. Indeed, it
is easy to see that one has the integral representation
of a vector,


\[ |\phi\rangle = \int_X \Psi(x)\,|x\rangle\; d\mu(x), \qquad \Psi(x) = \langle x|T^{-1}\phi\rangle \]


in terms of the generalized coherent states.

The canonical coherent states satisfy the minimal
uncertainty relation [7]. It is possible to build
families of coherent states by generalizing from this
condition. To do this, one typically starts with two
self-adjoint generators in the Lie algebra of a
particular group representation and then looks for
appropriate eigenvectors of a complex combination
of these two generators. For two self-adjoint
operators $B$ and $C$ on a Hilbert space $\mathfrak{H}$, satisfying
the commutation relation $[B, C] = iD$, and any
normalized vector $\phi$ in $\mathfrak{H}$, one can prove the
Heisenberg uncertainty relation


\[ (\Delta B)^2 (\Delta C)^2 \ge \tfrac{1}{4}\,\langle D\rangle^2 \]  [47]


where $\langle X\rangle = \langle\phi|X\phi\rangle$ and $(\Delta X)^2 = \langle X^2\rangle - \langle X\rangle^2$, for
any operator $X$ on $\mathfrak{H}$. More generally, one can prove
the Schrödinger-Robertson uncertainty relation


\[ (\Delta B)^2 (\Delta C)^2 \ge \tfrac{1}{4}\left[\langle D\rangle^2 + \langle F\rangle^2\right] \]  [48]


where $\langle F\rangle = \langle BC + CB\rangle - 2\langle B\rangle\langle C\rangle$ measures the
correlation between $B$ and $C$ in the state $\phi$.
If $\langle F\rangle = 0$, the above relation reduces to the
Heisenberg uncertainty relation. On the other
hand, if $\langle D\rangle = 0$, the Heisenberg uncertainty relation
becomes redundant. Suppose now that $B$ and
$C$ are two self-adjoint elements of the Lie algebra in
the unitary irreducible representation of a Lie group
and we look for states $|\phi\rangle$ which minimize the
uncertainty relation [48], that is, for which
the equality holds. It turns out that such states
can be found by considering the linear combination
$B + i\lambda C$, for a fixed complex number $\lambda$, and solving
the formal eigenvalue equation


\[ [B + i\lambda C]\,|z, \lambda\rangle = z\,|z, \lambda\rangle \quad\text{with}\quad z = \langle B\rangle + i\lambda\langle C\rangle \]  [49]


Solutions to this equation for which $|\lambda| \neq 1$ are
called squeezed states, since in this case $\Delta B \neq \Delta C$.
Generally, the states $|z, \lambda\rangle$ are known as intelligent
states. As an example, for the operators $Q$ and $P$ in
[6], for which one has


\[ (\Delta Q)^2 (\Delta P)^2 \ge \tfrac{1}{4}\left[1 + \langle F\rangle^2\right] \]


taking the combination $Q + i\lambda P$, one obtains the
minimal uncertainty states,


\[ |z, \lambda\rangle = \mathcal{N}(z, \lambda)\; e^{-(w/2)\,a^{\dagger 2}}\; e^{(z/\sqrt{2})(1 + w)\,a^\dagger}\,|0\rangle \]  [50]


$\mathcal{N}(z, \lambda)$ being a normalization constant and
$w = (1 - \lambda)/(1 + \lambda)$. The case $\lambda = -1$ does not lead
to any solutions, while $\lambda = 1$ gives the canonical
coherent states [10]. For real $\lambda \neq 1$ the above states
are the well-known squeezed states of quantum
optics.
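That the canonical coherent states saturate the Schrödinger-Robertson relation with $\langle F\rangle = 0$ can be confirmed in a truncated Fock space. The sketch below assumes the standard conventions $Q = (a + a^\dagger)/\sqrt{2}$ and $P = (a - a^\dagger)/(i\sqrt{2})$ for the operators of [6], which is not reproduced in this section; the function names, the truncation dimension and the sample value of $z$ are illustrative.

```python
import numpy as np

dim = 60
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)     # truncated annihilation operator
Q = (a + a.T) / np.sqrt(2)
P = (a - a.T) / (1j * np.sqrt(2))

def coherent(z):
    """Truncated canonical coherent state, normalized numerically."""
    c = np.ones(dim, dtype=complex)
    for n in range(1, dim):
        c[n] = c[n - 1] * z / np.sqrt(n)
    return c / np.linalg.norm(c)

def mean(op, v):
    """Expectation value <v|op|v>."""
    return np.vdot(v, op @ v)

def uncertainty_data(v):
    """Return (Delta Q)^2 (Delta P)^2 together with <D> and <F>."""
    var_q = (mean(Q @ Q, v) - mean(Q, v) ** 2).real
    var_p = (mean(P @ P, v) - mean(P, v) ** 2).real
    D = (mean(Q @ P - P @ Q, v) / 1j).real
    F = (mean(Q @ P + P @ Q, v) - 2 * mean(Q, v) * mean(P, v)).real
    return var_q * var_p, D, F

product, D, F = uncertainty_data(coherent(0.7 - 0.4j))
print(np.isclose(product, 0.25), np.isclose(D, 1.0), np.isclose(F, 0.0, atol=1e-9))
```

For moderate $|z|$ the weight of the state on the highest retained level is negligible, so the truncation does not affect the result at the precision tested.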

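Our final example is that of a family of vector
coherent states, which will be obtained essentially
by replacing the complex variable $z$ in [18] by a
matrix variable. We choose the domain $\Omega = \mathbb{C}^{2\times 2}$
(all $2 \times 2$ complex matrices), equipped with the
measure


\[ d\nu(\mathfrak{Z}, \mathfrak{Z}^\dagger) = \frac{e^{-\mathrm{tr}[\mathfrak{Z}\mathfrak{Z}^\dagger]}}{\pi^4} \prod_{k,j=1}^{2} dx_{kj} \wedge dy_{kj} \]


where $\mathfrak{Z}$ is an element of $\Omega$ and $z_{kj} = x_{kj} + iy_{kj}$ are its
entries. One can then prove the matrix orthogonality
relation


\[ \int_\Omega \mathfrak{Z}^k\, (\mathfrak{Z}^\dagger)^\ell\; d\nu(\mathfrak{Z}, \mathfrak{Z}^\dagger)
   = \int_\Omega (\mathfrak{Z}^\dagger)^k\, \mathfrak{Z}^\ell\; d\nu(\mathfrak{Z}, \mathfrak{Z}^\dagger)
   = b(k)\,\delta_{k\ell}\; I_2, \qquad k, \ell = 0, 1, 2, \ldots \]  [51]


$I_2$ being the $2 \times 2$ identity matrix and


\[ b(k) = \frac{(k + 3)!}{2(k + 1)(k + 2)}, \quad k = 1, 2, 3, \ldots, \qquad b(0) = 1 \]  [52]


Consider the Hilbert space $\widetilde{\mathfrak{H}} = L^2_{\mathbb{C}^2}(\Omega, d\nu)$ of square
integrable, two-component vector-valued functions
on $\Omega$ and in it consider the vectors $|\Psi^i_k\rangle$, $i = 1, 2$,
$k = 0, 1, 2, \ldots$, defined by the $\mathbb{C}^2$-valued
functions,


\[ \Psi^i_k(\mathfrak{Z}^\dagger) = \frac{1}{\sqrt{b(k)}}\; (\mathfrak{Z}^\dagger)^k\, \chi^i \]  [53]


where the vectors $\chi^i$, $i = 1, 2$, form an orthonormal
basis of $\mathbb{C}^2$. By virtue of [51], the vectors $|\Psi^i_k\rangle$
constitute an orthonormal set in $\widetilde{\mathfrak{H}}$, that is,


\[ \langle \Psi^i_k|\Psi^j_\ell\rangle = \delta_{k\ell}\,\delta_{ij} \]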

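Denote by $\mathfrak{H}_K$ the Hilbert subspace of $\widetilde{\mathfrak{H}}$ generated
by this set of vectors. This can be shown to be a
reproducing kernel Hilbert space of analytic
functions in the variable $\mathfrak{Z}^\dagger$, with the matrix-valued
kernel $K : \Omega \times \Omega \to \mathbb{C}^{2\times 2}$:


\[ K(\mathfrak{Z}'^\dagger, \mathfrak{Z})
   = \sum_{i=1}^{2} \sum_{k=0}^{\infty} \Psi^i_k(\mathfrak{Z}'^\dagger)\, \Psi^i_k(\mathfrak{Z}^\dagger)^\dagger
   = \sum_{k=0}^{\infty} \frac{(\mathfrak{Z}'^\dagger)^k\, \mathfrak{Z}^k}{b(k)} \]  [54]


Vector coherent states in $\mathfrak{H}_K$ are then naturally
associated to this kernel and are given by


\[ |\mathfrak{Z}, i\rangle = \sum_{j=1}^{2} \sum_{k=0}^{\infty} \frac{\chi^{j\dagger}\, \mathfrak{Z}^k\, \chi^i}{\sqrt{b(k)}}\; |\Psi^j_k\rangle,
   \quad\text{that is,}\quad |\mathfrak{Z}, i\rangle(\mathfrak{Z}'^\dagger) = K(\mathfrak{Z}'^\dagger, \mathfrak{Z})\,\chi^i \]  [55]


for $i = 1, 2$ and all $\mathfrak{Z}$ in $\Omega$. They satisfy the resolution
of the identity


\[ \sum_{i=1}^{2} \int_\Omega |\mathfrak{Z}, i\rangle\langle\mathfrak{Z}, i|\; d\nu(\mathfrak{Z}, \mathfrak{Z}^\dagger) = I_{\mathfrak{H}_K} \]  [56]


The expression for the $|\mathfrak{Z}, i\rangle$ in [55], involving the
sum, should be compared to [18], of which it is a
direct analog.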


Some Applications of Coherent States 


Generalized coherent states have many applications
in physics, signal analysis, and mathematics, of
which we mention a few here. As an example of
an application of deformed coherent states, we take


\[ x_n = \frac{q^n - q^{-n}}{q - q^{-1}} \]  [57]


in the definition of these states in [18]. It is then easy
to see that the operators $A$ and $A^\dagger$, defined in [23],
satisfy the q-deformed commutation relation


\[ A A^\dagger - q\, A^\dagger A = q^{-N} \]  [58]


where $N$ is the usual number operator, which acts
on the Fock states as $N|n\rangle = n|n\rangle$. Clearly, in the
limit as $q \to 1$, these q-deformed coherent states go
over to the canonical coherent states, with the
operators $A$ and $A^\dagger$ becoming the usual annihilation
and creation operators $a$ and $a^\dagger$, respectively.
The operators $A$ and $A^\dagger$ and the commutation
relation [58] describe a system of q-deformed
oscillators, which have been used to describe, for
example, the vibrations of polyatomic molecules.
The potential energy between the atoms of such
a molecule has anharmonic terms, leading to
a deformation of the usual oscillator algebra,
generated by the operators $a$ and $a^\dagger$.
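The q-deformed commutation relation can be checked on truncated matrices. The sketch below is illustrative: it assumes the symmetric q-number $x_n = (q^n - q^{-n})/(q - q^{-1})$ and the ladder action $A|n\rangle = \sqrt{x_n}\,|n - 1\rangle$; the function names and the value $q = 0.8$ are choices made here.

```python
import numpy as np

def q_number(n, q):
    """x_n = (q^n - q^-n) / (q - q^-1), with the q -> 1 limit x_n = n."""
    if np.isclose(q, 1.0):
        return float(n)
    return (q**n - q**(-n)) / (q - 1.0 / q)

def q_oscillator(q, dim):
    """Truncated matrices of A and A^dagger, with A|n> = sqrt(x_n)|n-1>."""
    A = np.diag([np.sqrt(q_number(n, q)) for n in range(1, dim)], k=1)
    return A, A.T

q, dim = 0.8, 10
A, A_dag = q_oscillator(q, dim)
lhs = A @ A_dag - q * (A_dag @ A)
rhs = np.diag([q ** (-n) for n in range(dim)])        # q^(-N) in the number basis
cut = slice(0, dim - 1)                               # last entry is a truncation artifact
print(np.allclose(lhs[cut, cut], rhs[cut, cut]))
```

On each number state the check reduces to the identity $x_{n+1} - q\,x_n = q^{-n}$, which holds for every $n$.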


As already mentioned, generalized coherent states
are widely used in signal analysis. The wavelet
transform $F(b, a) = \langle b, a|\phi\rangle$, introduced in [43], is a
time-frequency transform, in which the parameter $b$
is identified with time and $1/a$ with frequency.
Wavelet transforms are used extensively to analyze,
encode, and reconstruct signals arising in many
different branches of physics, engineering, seismography,
electronic data processing, etc. Similarly, the
canonical coherent states, as written in [11], give
rise to the transform $F(q, p) = \langle q, p|\phi\rangle$. Again, if $q$ is
interpreted as time and $p$ as frequency, then this is
just the windowed Fourier transform, also used
extensively in signal processing. More general
wavelets, from higher-dimensional affine groups,
are used to analyze higher-dimensional signals,
while wavelet-like transforms from other groups
have been used to study signals exhibiting different
geometries. In particular, wavelet transforms from
spherical geometries have been applied to the study
of brain signals and to astrophysical data.

Our final example is taken from quantization
theory. A quantization technique is a method for
performing the transition from a given classical
mechanical system to its quantum counterpart.
Many methods have been developed to accomplish
this and the use of coherent states is one of them.
Suppose that we are given a family of coherent
states $|x\rangle$ in a Hilbert space $\mathfrak{H}$, where the set $X$ from
which $x$ is taken is a classical phase space. This
means that $X$ is a symplectic manifold with an
associated 2-form $\omega$, which defines a Poisson
bracket on the set of observables of the classical
system, which are real-valued functions on $X$. There
is a natural measure $d\mu$, defined on $X$ by the 2-form
$\omega$. Let us assume that the coherent states $|x\rangle$ satisfy a
resolution of the identity with respect to this
measure:


\[ \int_X |x\rangle\langle x|\; d\mu(x) = I_{\mathfrak{H}} \]


In this case, the coherent states may be used to
quantize the observables of the classical system in
the following way: let $f$ be a real-valued function on
$X$, representing a classical observable, and suppose
that the formal operator


\[ \hat{f} = \int_X f(x)\,|x\rangle\langle x|\; d\mu(x) \]  [59]


is well defined as a self-adjoint operator on $\mathfrak{H}$. Then
we may take the operator $\hat{f}$ to be the quantized
observable corresponding to the classical observable
$f$. Suppose that we have two such operators, $\hat{f}$ and $\hat{g}$,
corresponding to the two classical observables $f$ and
$g$, which have the Poisson bracket $\{f, g\}$, defined via
the 2-form $\omega$. We then check if the quantization
condition


\[ [\hat{f}, \hat{g}] = i\hbar\, \widehat{\{f, g\}} \]  [60]

where $\hbar$ is Planck's constant (divided by $2\pi$), is satisfied. Generally
this will be the case for a certain number of classical
observables. This method of quantization has been
most successfully used for manifolds $X$ which have a
(complex) Kähler structure. Over such a manifold,
one can define a Hilbert space of analytic functions,
which has a reproducing kernel and hence a
naturally associated set of coherent states. As a
specific example, we take the case of canonical
coherent states [11]. We can identify the complex
plane $\mathbb{C}$ with the phase space $\mathbb{R}^2$ of a free classical
particle having a single degree of freedom. The
measure $d\mu$ in this case is just $(1/2\pi)\, dq\, dp$. If we
now quantize the classical observables $f(q, p) = q$
and $f(q, p) = p$, of position and momentum, respectively,
using the canonical coherent states, we obtain
the two operators


\[ \hat{Q} = \int_{\mathbb{R}^2} q\,|q, p\rangle\langle q, p|\; \frac{dq\, dp}{2\pi}, \qquad
   \hat{P} = \int_{\mathbb{R}^2} p\,|q, p\rangle\langle q, p|\; \frac{dq\, dp}{2\pi} \]


It can be verified that these two operators satisfy the
canonical commutation relations $[\hat{Q}, \hat{P}] = iI_{\mathfrak{H}}$, as
required.


See also: Solitons and Kac-Moody Lie Algebras;
Wavelets: Mathematical Theory.


Introduction


The origins of cohomology theory are found in
topology and algebra at the beginning of the last
century but since then it has become a tool of nearly
every branch of mathematics. It's a way of life!
Naturally, this article can only give a glimpse of the
rich subject. We take here the point of view of
algebraic topology and discuss only the cohomology
of spaces.

Cohomology reflects the global properties of a
manifold, or more generally of a topological space.
It has two crucial properties: it only depends on the
homotopy type of the space and is determined by
local data. The latter property makes it in general
computable.

To illustrate the interplay between the local and
global structure, consider the Euler characteristic of
a compact manifold; as will be explained below,
cohomology is a refinement of the Euler characteristic.
For simplicity, assume that the manifold $M$ is a
surface and that we have chosen a way of dividing
the surface into triangles. The Euler characteristic is
then defined to be


\[ \chi(M) = F - E + V \]


where $F$ denotes the number of faces, $E$ the number
of edges, and $V$ the number of vertices in the
triangulation. Remarkably, this number does not
depend on the triangulation. Yet, this simple, easy to
compute number can already distinguish the different
types of closed, oriented surfaces: for the sphere
we have $\chi = 2$, the torus $\chi = 0$, and in general for
any surface $M_g$ of genus $g$


\[ \chi(M_g) = 2 - 2g \]


The Euler characteristic also tells us something
about the geometry and analysis of the manifold. For
example, the total curvature of a surface is equal, up to
a factor of $2\pi$, to its Euler characteristic. This is the
Gauss-Bonnet theorem and an analogous result holds in
higher dimensions. Another striking result is the Poincaré-Hopf
theorem which equates the Euler characteristic with
the total index of a vector field and thus gives strong
restrictions on what kind of vector fields can exist on
a manifold. This interplay between global analysis
and topology has been one of the most exciting and
fruitful research areas and is most powerfully
expressed in the celebrated Atiyah-Singer index
theorem, which determines the analytic index of an
elliptic operator, such as the Dirac operator on a spin
manifold, in terms of cohomology classes.
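The formula $\chi(M) = F - E + V$ from the beginning of this introduction can be evaluated directly on explicit triangulations. The sketch below is illustrative: a surface is encoded as a list of vertex triples, and the two sample triangulations (the boundary of a tetrahedron for the sphere, and a periodic grid of squares cut along diagonals for the torus) as well as all function names are choices made here.

```python
from itertools import combinations

def euler_characteristic(triangles):
    """Return F - E + V for a surface given as a list of vertex triples."""
    faces = {frozenset(t) for t in triangles}
    edges = {frozenset(e) for t in faces for e in combinations(sorted(t), 2)}
    vertices = {v for t in faces for v in t}
    return len(faces) - len(edges) + len(vertices)

def tetrahedron_boundary():
    """Four triangles forming a sphere."""
    return list(combinations(range(4), 3))

def torus_grid(n=3):
    """An n x n grid of squares with opposite sides glued, each square cut in two (n >= 3)."""
    tris = []
    for i in range(n):
        for j in range(n):
            a, b = (i, j), ((i + 1) % n, j)
            c, d = ((i + 1) % n, (j + 1) % n), (i, (j + 1) % n)
            tris += [(a, b, c), (a, c, d)]
    return tris

print(euler_characteristic(tetrahedron_boundary()))   # sphere
print(euler_characteristic(torus_grid(3)))            # torus
print(euler_characteristic(torus_grid(5)))            # a finer triangulation of the torus
```

The two torus triangulations have different numbers of faces, edges and vertices but the same alternating sum, illustrating the independence of the triangulation.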


Chain Complexes and Homology 


There are several different geometric definitions of
the cohomology of a topological space. All share
some basic algebraic structure which we will explain
first.

A "chain complex" $(C_*, \partial_*)$


\[ \cdots \longrightarrow C_{i+1} \xrightarrow{\;\partial_{i+1}\;} C_i \xrightarrow{\;\partial_i\;} C_{i-1} \longrightarrow \cdots \xrightarrow{\;\partial_1\;} C_0 \]  [1]


is a collection of vector spaces (or R-modules more
generally) $C_i$, $i \ge 0$, and linear maps (R-module
maps) $\partial_i : C_i \to C_{i-1}$ with the property that for all $i$


\[ \partial_i \circ \partial_{i+1} = 0 \]  [2]


The scalar fields one tends to consider are the
rationals $\mathbb{Q}$, reals $\mathbb{R}$, complex numbers $\mathbb{C}$, or a
prime field $\mathbb{Z}_p$, while the most important ring $R$ is
the ring of integers $\mathbb{Z}$, though we will also consider
localizations such as $\mathbb{Z}[1/p]$, which has the effect of
suppressing any p-primary torsion information.

Of particular interest are the elements in $C_i$ that are
mapped to zero by $\partial_i$, the i-dimensional "cycles," and
those that are in the image of $\partial_{i+1}$, the i-dimensional
"boundaries." Because of [2], every boundary is a
cycle, and we may define the quotient vector space
(R-module), the ith-dimensional homology,


\[ H_i(C_*, \partial_*) := \frac{\ker \partial_i}{\operatorname{im}\, \partial_{i+1}} \]  [3]


$(C_*, \partial_*)$ is "exact" if all its cycles are boundaries.
Homology thus measures to what extent the
sequence [1] fails to be exact.

Simplicial Homology 


A triangulation of a surface gives rise to its
"simplicial" chain complex: Taking coefficients in
$\mathbb{Z}$, $C_2$, $C_1$, $C_0$ are the free abelian groups generated
by the set of faces, edges, and vertices, respectively;
$C_i = \{0\}$ for $i \ge 3$. The map $\partial_2$ assigns to a triangle
the sum of its edges; $\partial_1$ maps an edge to the sum of
its endpoints. If we are working with $\mathbb{Z}_2$ coefficients,
this defines for us a chain complex as [2] is
clearly satisfied; in general, one needs to keep track
of the orientations of the triangles and edges and
take sums with appropriate signs (cf. [6] below). An
easy calculation shows that for an oriented, closed
surface $M_g$ of genus $g$, we have


\[ H_0(M_g; \mathbb{Z}) = \mathbb{Z}, \qquad
   H_1(M_g; \mathbb{Z}) = \mathbb{Z}^{2g}, \qquad
   H_2(M_g; \mathbb{Z}) = \mathbb{Z}, \qquad
   H_i(M_g; \mathbb{Z}) = 0 \ \text{for } i \ge 3 \]  [4]


Note that the Euler characteristic can be recovered
as the alternating sum of the rank of the
homology groups:


\[ \chi(M) = \sum_{i \ge 0} (-1)^i\, \operatorname{rk} H_i(M; \mathbb{Z}) \]  [5]


Every smooth manifold $M$ has a triangulation, so
that its simplicial homology can be defined just as
above. More generally, simplicial homology can be
defined for any simplicial space, that is, a space that
is built up out of points, edges, triangles, tetrahedra,
etc. Formula [5] remains valid for any compact
manifold or simplicial space.
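The "easy calculation" can be delegated to linear algebra. The sketch below is a minimal illustration under stated assumptions: it works with rational coefficients (so only the ranks of the homology groups are computed), orients every simplex by the sorted order of its vertices, and uses two small triangulations chosen here. The function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def boundary_matrices(triangles):
    """Oriented boundary matrices d1 (vertices x edges) and d2 (edges x triangles)."""
    tris = sorted({tuple(sorted(t)) for t in triangles})
    edges = sorted({e for t in tris for e in combinations(t, 2)})
    verts = sorted({v for t in tris for v in t})
    v_idx = {v: i for i, v in enumerate(verts)}
    e_idx = {e: i for i, e in enumerate(edges)}
    d1 = np.zeros((len(verts), len(edges)))
    for (u, v), col in e_idx.items():
        d1[v_idx[u], col], d1[v_idx[v], col] = -1, 1          # d[u, v] = v - u
    d2 = np.zeros((len(edges), len(tris)))
    for col, (u, v, w) in enumerate(tris):                   # d[u, v, w] = [v, w] - [u, w] + [u, v]
        d2[e_idx[(v, w)], col] = 1
        d2[e_idx[(u, w)], col] = -1
        d2[e_idx[(u, v)], col] = 1
    return d1, d2

def betti_numbers(triangles):
    """Ranks of H_0, H_1, H_2 with rational coefficients."""
    d1, d2 = boundary_matrices(triangles)
    assert not (d1 @ d2).any()                               # boundary of a boundary is zero
    r1, r2 = int(np.linalg.matrix_rank(d1)), int(np.linalg.matrix_rank(d2))
    n_verts, n_edges, n_tris = d1.shape[0], d1.shape[1], d2.shape[1]
    return n_verts - r1, n_edges - r1 - r2, n_tris - r2

sphere = list(combinations(range(4), 3))
torus = [t for i in range(3) for j in range(3)
         for t in (((i, j), ((i + 1) % 3, j), ((i + 1) % 3, (j + 1) % 3)),
                   ((i, j), ((i + 1) % 3, (j + 1) % 3), (i, (j + 1) % 3)))]

for name, surface in (("sphere", sphere), ("torus", torus)):
    b = betti_numbers(surface)
    print(name, b, "chi =", b[0] - b[1] + b[2])
```

Rational ranks do not detect torsion; for closed oriented surfaces this loses nothing, since their integral homology groups are free.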


Singular Homology 


Let X be any topological space, and let $\Delta^n$ be the
oriented n-simplex $[v_0, \dots, v_n]$ spanned by the
standard basis vectors $v_i$ in $\mathbb{R}^{n+1}$. The set of singular
n-chains $S_n(X)$ is the free abelian group on the set of
continuous maps $\sigma: \Delta^n \to X$. The boundary of $\sigma$ is
defined by the alternating sum of the restriction of $\sigma$
to the faces of $\Delta^n$:

$\partial(\sigma) := \sum_{i=0}^{n} (-1)^i \, \sigma|_{[v_0, \dots, \hat{v}_i, \dots, v_n]}$ [6]

One easily checks that the boundary of a boundary is
zero, and hence $(S_*(X), \partial)$ defines a chain complex.
Its homology is by definition the singular homology
$H_*(X; \mathbb{Z})$ of X. For any simplicial space, the inclusion
of the simplicial chains into the singular chains
induces an isomorphism of homology groups. In
particular, this implies that the simplicial homology
of a manifold, and hence its Euler characteristic, do
not depend on its triangulation.

If in the definition of simplicial and singular
homology we take free R-modules (where R may
also be a field) instead of free abelian groups, we get
the homology $H_*(X; R)$ of X with coefficients in R.
The “universal-coefficient theorem” describes the
homology with arbitrary coefficients in terms of the
homology with integer coefficients. In particular, if R
is a field of characteristic zero,

$\dim H_*(X; R) = \operatorname{rk} H_*(X; \mathbb{Z})$


Basic Properties of Singular Homology 


While simplicial homology (and the more efficient 
cellular homology which we will not discuss) is 
easier to compute and easier to understand geome- 
trically, singular homology lends itself more easily to 
theoretical treatment. 


1. Homotopy invariance. Any continuous map
$f: X \to Y$ induces a map on homology
$f_*: H_*(X; R) \to H_*(Y; R)$ which only depends on
the homotopy class of f.

In particular, a homotopy equivalence $f: X \to Y$
induces an isomorphism in homology. So, for example,
the inclusion of the circle $S^1$ into the punctured
plane $\mathbb{C} \setminus \{0\}$ is a homotopy equivalence, and thus

$H_*(\mathbb{C} \setminus \{0\}; R) \cong H_*(S^1; R)$, which is $R$ for $* = 0, 1$ and $0$ for $* \geq 2$

For the one point space we have $H_0(\mathrm{pt}; R) = R$. Define
reduced homology by $\tilde{H}_*(X; R) := \ker(H_*(X; R) \to H_*(\mathrm{pt}; R))$.

2. Dimension axiom. $\tilde{H}_i(\mathrm{pt}; R) = 0$ for all i.

More generally, it follows immediately from the
definition of simplicial homology that the homology
of any n-dimensional manifold is zero in dimensions
larger than n.

We mentioned in the introduction that homology
depends only on local data. This is made precise
by the

3. Mayer-Vietoris theorem. Let $X = A \cup B$ be the
union of two open subspaces. Then the following
sequence is exact:

$\cdots \to H_n(A \cap B; R) \to H_n(A; R) \oplus H_n(B; R) \to H_n(X; R) \xrightarrow{\partial} H_{n-1}(A \cap B; R) \to \cdots \to H_0(X; R) \to 0$

On the level of chains, the first map is induced by the
diagonal inclusion, while the second map takes the
difference between the first and second summands.
Finally, $\partial$ takes a cycle $c = a + b$ in the chains of X
that can be expressed as the sum of a chain a in A
and b in B to $\partial c := \partial_n a = -\partial_n b$. For example,
consider two cones, A and B, on a space X and
identify them at the base X to define the suspension
$\Sigma X$ of X. Then $\Sigma X = A \cup B$ with $A, B \simeq \mathrm{pt}$ and
$A \cap B \simeq X$. The boundary map $\partial$ is then an isomorphism
in reduced homology:

$\tilde{H}_*(X; R) \cong \tilde{H}_{*+1}(\Sigma X; R)$ for all $* \geq 0$ [7]

From this one can easily compute the homology of a
sphere. First note that

$\tilde{H}_0(X; \mathbb{Z}) = \mathbb{Z}^{k-1}$

where k is the number of connected components in
X. Also, $S^n \simeq \Sigma S^{n-1} \simeq \cdots \simeq \Sigma^n S^0$. Thus, by [7],

$\tilde{H}_n(S^n; \mathbb{Z}) = \mathbb{Z}$ and $\tilde{H}_*(S^n; \mathbb{Z}) = 0$ for $* \neq n$ [8]


If Y is a subspace of X, relative homology groups
$H_*(X, Y; R)$ can be defined as the homology of the
quotient complex $S_*(X)/S_*(Y)$. When Y has a good
neighborhood in X (i.e., it is a neighborhood
deformation retract in X), then, by the “excision
theorem,”

$H_*(X, Y; R) \cong \tilde{H}_*(X/Y; R)$

where X/Y denotes the quotient space of X with Y
identified to a point. There is a long exact sequence

$\cdots \to H_n(Y; R) \to H_n(X; R) \to H_n(X, Y; R) \xrightarrow{\partial} H_{n-1}(Y; R) \to \cdots \to H_0(X, Y; R) \to 0$


This and the Mayer-Vietoris sequence give two ways of 
breaking up the problem of computing the homology of 
a space into computing the homology of related spaces. 
An iteration of this process leads to the powerful tool of 
spectral sequences (see Spectral Sequences). 


Relation to Homotopy Groups 


Let $\pi_1(X, x_0)$ denote the fundamental group of X
relative to the base point $x_0$. Its elements are the based
homotopy classes of based maps from a circle to X.

If X is connected, then $H_1(X; \mathbb{Z})$ is
the abelianization of $\pi_1(X, x_0)$ [9]

Indeed, every map from a (triangulated) sphere to
X defines a cycle and hence gives rise to a homology
class. This defines the Hurewicz map $h: \pi_*(X; x_0) \to H_*(X; \mathbb{Z})$.
In general there is no good description of
its image. However, if X is k-connected with $k \geq 1$,
then h induces an isomorphism in dimension k + 1
and an epimorphism in dimension k + 2.

Though [9] indicates that homology cannot distinguish
between all homotopy types, the fundamental
group is in a sense the only obstruction to this.
A simple form of the “Whitehead theorem” states:

Theorem If a map $f: X \to Y$ between two simplicial
complexes with trivial fundamental groups
induces an isomorphism on all homology groups,
then it is a homotopy equivalence.


Warning: This does not imply that two simply
connected spaces with isomorphic homology groups
are homotopy equivalent! The existence of the map f inducing
this isomorphism is crucial and counterexamples can
easily be constructed.


Dual Chain Complexes and Cohomology 


The process of dualizing itself cannot be expected to
yield any new information. Nevertheless, the cohomology
of a space, which is obtained by dualizing its
simplicial chain complex, carries important additional
structure: it possesses a product, and moreover,
when the coefficients are a prime field, it is
an algebra over the rich Steenrod algebra. As with
homology we start with the algebraic setup.

Every chain complex $(C_*, \partial_*)$ gives rise to a dual
chain complex $(C^*, \partial^*)$, where $C^i := \hom_R(C_i, R)$ is
the dual R-module of $C_i$. Because of [2], the
composition of two dual boundary morphisms
$\partial^i: C^{i-1} \to C^i$ is trivial. Hence we may define the
ith dimensional cohomology group as

$H^i(C^*, \partial^*) := \ker \partial^{i+1} / \operatorname{im} \partial^i$ [10]

Evaluation $(c, \phi) \mapsto \phi(c)$ descends to a dual pairing

$H_n(C_*, \partial_*) \otimes_R H^n(C^*, \partial^*) \to R$

and when R is a field, this identifies the cohomology
groups as the duals of the homology groups. More
generally, the universal-coefficient theorem relates
the two. A simple version states: let $(C_*, \partial_*)$ be a
chain complex of free abelian groups (such as the
simplicial or singular chain complexes) with finitely
generated homology groups. Then,

$H^n(C^*, \partial^*) \cong H_n^{\mathrm{free}}(C_*, \partial_*) \oplus H_{n-1}^{\mathrm{tor}}(C_*, \partial_*)$ [11]

where $H_*^{\mathrm{tor}}$ denotes the torsion subgroup of $H_*$, and
$H_*^{\mathrm{free}}$ denotes the quotient group $H_*/H_*^{\mathrm{tor}}$.

Singular Cohomology 


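The dual $S^*(X)$ of the singular chain complex of a
space X carries a natural pairing, the cup product,
$\cup: S^p(X) \otimes S^q(X) \to S^{p+q}(X)$ defined by

$(\phi_1 \cup \phi_2)(\sigma) := \phi_1(\sigma|_{[v_0, \dots, v_p]}) \, \phi_2(\sigma|_{[v_p, \dots, v_{p+q}]})$

This descends to a multiplication on cohomology
groups and makes $H^*(X; R) := \bigoplus_{n \geq 0} H^n(X; R)$ into
an associative, graded commutative ring: $u \cup v = (-1)^{\deg u \, \deg v} \, v \cup u$.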

The “Künneth theorem” gives some geometric
intuition for the cup product. A simple version
states: for spaces X and Y with $H^*(Y; R)$ a finitely
generated free R-module, the cup product defines an
isomorphism of graded rings

$H^*(X; R) \otimes_R H^*(Y; R) \to H^*(X \times Y; R)$

For example, for a sphere, all products are trivial for
dimension reasons. Hence,

$H^*(S^n; \mathbb{Z}) = \Lambda_{\mathbb{Z}}^*(x)$ [12]

is an exterior algebra on one generator x of degree
n. On the other hand, the cohomology of the
n-dimensional torus $T^n$ is an exterior algebra on
n degree-1 generators,

$H^*(T^n; \mathbb{Z}) = \Lambda_{\mathbb{Z}}^*(x_1, \dots, x_n)$ [13]
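
On the level of ranks, the Künneth isomorphism says that the Poincaré polynomial of a product is the product of the Poincaré polynomials. The following is a minimal sketch under the theorem's hypotheses (free, finitely generated cohomology); the function names `kunneth_betti` and `torus_betti` are illustrative. It recovers the ranks of the exterior algebra on n degree-1 generators, that is, binomial coefficients, for the torus.

```python
from math import comb


def kunneth_betti(betti_x, betti_y):
    """Ranks of H^*(X x Y) from those of H^*(X) and H^*(Y) (free case)."""
    out = [0] * (len(betti_x) + len(betti_y) - 1)
    for p, bp in enumerate(betti_x):
        for q, bq in enumerate(betti_y):
            out[p + q] += bp * bq   # classes of degree p and q give degree p + q
    return out


CIRCLE = [1, 1]  # ranks of H^0(S^1) and H^1(S^1)


def torus_betti(n):
    """Ranks of H^*(T^n), with T^n a product of n circles."""
    betti = [1]  # cohomology of a point
    for _ in range(n):
        betti = kunneth_betti(betti, CIRCLE)
    return betti
```

For instance, `torus_betti(n)[k]` can be compared with `comb(n, k)`, the rank of the degree-k part of an exterior algebra on n generators.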


The dual pairing can be generalized to the slant or
cap product

$\cap: H_n(X; R) \otimes_R H^i(X; R) \to H_{n-i}(X; R)$

defined on the chain level by the formula

$(\sigma, \phi) \mapsto \phi(\sigma|_{[v_0, \dots, v_i]}) \, \sigma|_{[v_i, \dots, v_n]}$


Steenrod Algebra 


The cup product on the chain level is homotopy
commutative, but not commutative. Steenrod used
this defect to define operations

$Sq^i: H^n(X; \mathbb{Z}_2) \to H^{n+i}(X; \mathbb{Z}_2)$

for all $i \geq 0$ which refine the cup-squaring operation:
when n = i, then $Sq^n(x) = x \cup x$. These are
natural group homomorphisms which commute
with suspension. Furthermore, they satisfy the
Cartan and Adem relations

$Sq^n(x \cup y) = \sum_{i+j=n} Sq^i(x) \cup Sq^j(y)$

$Sq^i Sq^j = \sum_{k=0}^{\lfloor i/2 \rfloor} \binom{j-k-1}{i-2k} Sq^{i+j-k} Sq^k$ for $i < 2j$
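
The Adem relation is easy to evaluate mechanically, since only the parity of the binomial coefficients matters. The following is a minimal sketch, assuming the relation in its usual form for 0 < i < 2j; the function name `adem` and the encoding of monomials as pairs are illustrative choices.

```python
from math import comb


def adem(i, j):
    """Expand Sq^i Sq^j (0 < i < 2j) via the Adem relation, mod 2.

    Returns pairs (a, b) standing for Sq^a Sq^b; a pair (a, 0)
    stands for Sq^a alone, since Sq^0 is the identity.
    """
    if not 0 < i < 2 * j:
        raise ValueError("the Adem relation applies for 0 < i < 2j")
    terms = []
    for k in range(i // 2 + 1):
        # keep the term only if its binomial coefficient is odd
        if comb(j - k - 1, i - 2 * k) % 2 == 1:
            terms.append((i + j - k, k))
    return terms
```

For example, `adem(1, 2)` returns `[(3, 0)]`, that is, $Sq^1 Sq^2 = Sq^3$, and `adem(2, 2)` returns `[(3, 1)]`, that is, $Sq^2 Sq^2 = Sq^3 Sq^1$.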


The mod-2 Steenrod algebra $\mathcal{A}$ is then the free
$\mathbb{Z}_2$-algebra generated by the Steenrod squares
$Sq^i$, $i > 0$, subject only to the Adem relations. With
the help of Adem's relations, Serre and Cartan found
a $\mathbb{Z}_2$-basis for $\mathcal{A}$:

$\{ Sq^{i_1} \cdots Sq^{i_n} \mid i_j \geq 2 i_{j+1} \text{ for all } j \}$

The Steenrod algebra is also a Hopf algebra with
a commutative comultiplication $\Delta: \mathcal{A} \to \mathcal{A} \otimes \mathcal{A}$
induced by

$\Delta(Sq^n) := \sum_{i+j=n} Sq^i \otimes Sq^j$

The Cartan relation implies that the mod-2
cohomology of a space is compatible with the
comultiplication, that is, $H^*(X; \mathbb{Z}_2)$ is an algebra
over the Hopf algebra $\mathcal{A}$. There are odd primary
analogs of the Steenrod algebra based on the
reduced pth power operations

$P^i: H^n(X; \mathbb{Z}_p) \to H^{n+2i(p-1)}(X; \mathbb{Z}_p)$

with similar properties to $\mathcal{A}$.

One of the most striking applications of the
Steenrod algebra can be found in the work of
Adams on the “vector fields on spheres problem”:
for each n, find the greatest number k, denoted K(n),
such that there is a k-field on the (n - 1)-sphere $S^{n-1}$.
Recall that a k-field is an ordered set of k pointwise
linearly independent tangent vector fields. If we write n
in the form $n = 2^{4a+b}(2s + 1)$ with $0 \leq b < 4$, Adams
proved that $K(n) = 2^b + 8a - 1$. In particular, when n
is odd, K(n) = 0. We give an outline of the proof for
this special case in the next section.
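
Adams's formula is simple to evaluate. The following is a minimal sketch; the function name `adams_k` is an illustrative choice. It extracts the exponent of 2 in n, splits it as 4a + b, and returns $2^b + 8a - 1$.

```python
def adams_k(n):
    """Maximal number K(n) of independent vector fields on the (n-1)-sphere."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    exponent = 0
    while n % 2 == 0:          # strip the odd factor 2s + 1
        n //= 2
        exponent += 1
    a, b = divmod(exponent, 4)  # exponent = 4a + b with 0 <= b < 4
    return 2 ** b + 8 * a - 1
```

For n = 2, 4, 8 this gives 1, 3, 7, the full n - 1 fields on $S^1$, $S^3$, $S^7$, while for n = 16 it gives only 8 fields on $S^{15}$.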


• The failure of associativity of the cup product at
the chain level gives rise to secondary operations,
the so-called “Massey products.”


Cohomology of Smooth Manifolds 


A smooth manifold M of dimension n can be
triangulated by smooth simplices $\sigma: \Delta^n \to M$. If M
is compact, oriented, without boundary, the sum of
these simplices defines a homology cycle [M], the
fundamental class of M. The most remarkable
property of the cohomology of manifolds is that
it satisfies “Poincaré duality”: taking cap product
with [M] defines an isomorphism:

$D := [M] \cap : H^k(M; \mathbb{Z}) \xrightarrow{\cong} H_{n-k}(M; \mathbb{Z})$ for all k [14]

In particular, for connected manifolds, $H^n(M; \mathbb{Z}) \cong \mathbb{Z}$;
and every map $f: M' \to M$ between oriented, compact
closed manifolds of the same dimension has a degree:
$f^*: H^n(M; \mathbb{Z}) \to H^n(M'; \mathbb{Z})$ is multiplication by an
integer deg(f), the degree of f. For smooth maps, the
degree is the number of points in the inverse image of
a generic point $p \in M$ counted with signs:

$\deg(f) = \sum_{p' \in f^{-1}(p)} \operatorname{sign}(p')$

where sign(p') is +1 or -1 depending on whether f is
orientation preserving or reversing in a neighborhood
of p'. For example, a complex polynomial of
degree d defines a map of the two-dimensional
sphere to itself of degree d: a generic point has d
points in its inverse image and the map is locally
orientation preserving. On the other hand, a map of
$S^{n-1}$ induced by a reflection of $\mathbb{R}^n$ reverses orientation
and has degree -1. Thus, as degrees multiply on
composing maps, the antipodal map $x \mapsto -x$ has
degree $(-1)^n$. As an application we prove:


Every tangent vector field on an even-dimensional
sphere $S^{n-1}$ has a zero.


Proof Assume v(x) is a vector field which is nonzero
for all $x \in S^{n-1}$. Then x is perpendicular to v(x), and
after rescaling, we may assume that v(x) has length 1.
The function $F(x, t) = \cos(t)\,x + \sin(t)\,v(x)$ is a well-defined
homotopy from the identity map (t = 0) to
the antipodal map ($t = \pi$). But this is impossible as
homotopic maps induce the same map in (co)homology
and we have already seen that the degree of the
identity map is 1 while the degree of the antipodal
map is $(-1)^n = -1$ when n is odd.
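
The homotopy used in the proof can be checked numerically in the one case where a nowhere-zero field is easy to write down: the circle $S^1$ (n = 2), where the antipodal map has degree $(-1)^2 = 1$ and no contradiction arises. The following is a minimal sketch; the field `v` (rotation by a quarter turn) and the sample points are illustrative choices.

```python
import math


def v(x):
    """A unit tangent field on the circle: rotate x by 90 degrees."""
    return (-x[1], x[0])


def F(x, t):
    """F(x, t) = cos(t) x + sin(t) v(x), as in the proof."""
    vx = v(x)
    return tuple(math.cos(t) * a + math.sin(t) * b for a, b in zip(x, vx))


# A few sample points on the unit circle.
points = [(math.cos(s), math.sin(s)) for s in (0.0, 0.7, 2.1, 4.0)]
```

Because x and v(x) are orthonormal, F(x, t) stays on the circle for every t, starts at x for t = 0, and ends at -x for t = π.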


• It is well known that two self-maps of a sphere of
any dimension are homotopic if and only if they
have the same degree, that is, $\pi_n(S^n) \cong \mathbb{Z}$ for $n \geq 1$.

• When M is not orientable, [M] still defines a cycle
in homology with $\mathbb{Z}_2$-coefficients, and $[M] \cap$
defines an isomorphism between the cohomology
and homology with $\mathbb{Z}_2$ coefficients.

• As [M] represents a homology class, so does every
other closed (orientable) submanifold of M. It is
however not the case that every homology class
can be represented by a submanifold or linear
combinations of such.


Cohomology is a contravariant functor. Poincaré
duality however allows us to define, for any $f: M' \to M$
between oriented, compact, closed manifolds of arbitrary
dimensions, a “transfer” or “Umkehr map,”

$f^! := D^{-1} f_* D' : H^*(M'; \mathbb{Z}) \to H^{*-c}(M; \mathbb{Z})$

which lowers the degree by $c = \dim M' - \dim M$. It
satisfies the formula

$f^!(f^*(x) \cup y) = x \cup f^!(y)$

for all $x \in H^*(M; \mathbb{Z})$ and $y \in H^*(M'; \mathbb{Z})$. When f is a
covering map then $f^!$ can be defined on the chain
level by

$f^!(x)(\sigma) := x\left( \sum_{f(\tilde{\sigma}) = \sigma} \tilde{\sigma} \right)$

where $x \in C^*(M')$ and $\sigma \in C_*(M)$.


de Rham Cohomology


If $x_1, \dots, x_n$ are the local coordinates of $\mathbb{R}^n$, define an
algebra $\Omega^*$ to be the algebra generated by symbols
$dx_1, \dots, dx_n$ subject to the relations $dx_i dx_j = -dx_j dx_i$
for all i, j. We say $dx_{i_1} \cdots dx_{i_q}$ has degree q. The
differential forms on $\mathbb{R}^n$ are the algebra

$\Omega^*(\mathbb{R}^n) := \{C^\infty \text{ functions on } \mathbb{R}^n\} \otimes_{\mathbb{R}} \Omega^*$

The algebra $\Omega^*(\mathbb{R}^n) = \bigoplus_{q=0}^{n} \Omega^q(\mathbb{R}^n)$ is naturally
graded by degree. There is a differential operator
$d: \Omega^q(\mathbb{R}^n) \to \Omega^{q+1}(\mathbb{R}^n)$ defined by

1. if $f \in \Omega^0(\mathbb{R}^n)$, then $df = \sum_i (\partial f / \partial x_i) \, dx_i$
2. if $\omega = \sum_I f_I \, dx_I$, then $d\omega = \sum_I df_I \, dx_I$

I stands here for a multi-index. For example, in $\mathbb{R}^3$
the differential assigns to 0-forms (= functions) the
gradient, to 1-forms the curl, and to 2-forms the
divergence. An easy exercise shows that $d^2 = 0$, and
the qth de Rham cohomology of $\mathbb{R}^n$ is the vector space

$H^q_{\mathrm{de\,R}}(\mathbb{R}^n) := \frac{\ker d: \Omega^q(\mathbb{R}^n) \to \Omega^{q+1}(\mathbb{R}^n)}{\operatorname{im} d: \Omega^{q-1}(\mathbb{R}^n) \to \Omega^q(\mathbb{R}^n)}$


More generally, the de Rham complex $\Omega^*(M)$ and
its cohomology $H^*_{\mathrm{de\,R}}(M)$ can be defined for any
smooth manifold M.

Let $\sigma$ be a smooth, singular, real (q + 1)-chain on
M, and let $\omega \in \Omega^q(M)$. Stokes' theorem then says

$\int_{\partial \sigma} \omega = \int_{\sigma} d\omega$

and therefore integration defines a pairing between
the qth singular homology and the qth de Rham
cohomology of M. This pairing is perfect and thus de
Rham cohomology is isomorphic to singular cohomology
with real coefficients:

$H^*_{\mathrm{de\,R}}(M) \cong (H_*(M; \mathbb{R}))^* \cong H^*(M; \mathbb{R})$

Let $\Omega_c^*(M)$ denote the subcomplex of compactly
supported forms and $H_c^*(M)$ its cohomology. Integration
with respect to the first i coordinates defines a map

$\Omega_c^q(\mathbb{R}^n) \to \Omega_c^{q-i}(\mathbb{R}^{n-i})$

which induces an isomorphism in cohomology; note in
particular $H_c^n(\mathbb{R}^n) = \mathbb{R}$. More generally, when $E \to M$
is an i-dimensional orientable, real vector bundle over
a compact, orientable manifold M, integration over
the fiber gives the “Thom isomorphism”:

$H_c^*(E) \cong H_c^{*-i}(M) \cong H^{*-i}_{\mathrm{de\,R}}(M)$

For orientable fiber bundles $F \to M' \xrightarrow{f} M$ with
compact, orientable fiber F, integration over the
fiber provides another definition of the transfer map

$f^!: H^*_{\mathrm{de\,R}}(M') \to H^{*-c}_{\mathrm{de\,R}}(M)$


Hodge Decomposition 


Let M be a compact oriented Riemannian manifold of
dimension n. The Hodge star operator, $*$, associates to
every q-form an (n - q)-form. For $\mathbb{R}^n$ and any
orthonormal basis $\{e_1, \dots, e_n\}$, it is defined by setting

$*(e_1 \wedge \cdots \wedge e_q) := \pm \, e_{q+1} \wedge \cdots \wedge e_n$

where one takes + if the orientation defined by
$\{e_1, \dots, e_n\}$ is the same as the given one, and -
otherwise. Using local coordinate charts this definition
can be extended to M. Clearly, $*$ depends on the
chosen metric and orientation of M. If M is
compact, we may define an inner product on the
q-forms by

$(\omega, \omega') := \int_M \omega \wedge *\omega'$

With respect to this inner product $*$ is an isometry.
Define the codifferential via

$\delta := (-1)^{nq+n+1} * d \, * : \Omega^q(M) \to \Omega^{q-1}(M)$

and the Laplace-Beltrami operator via

$\Delta := \delta d + d \delta$

The codifferential satisfies $\delta^2 = 0$ and is the adjoint
of the differential. Indeed, for q-forms $\omega$ and (q + 1)-forms $\omega'$:

$(d\omega, \omega') = (\omega, \delta\omega')$ [15]

It follows easily that $\Delta$ is self-adjoint, and
furthermore,

$\Delta\omega = 0$ if and only if $d\omega = 0$ and $\delta\omega = 0$ [16]

A form $\omega$ satisfying $\Delta\omega = 0$ is called “harmonic.” Let
$\mathcal{H}^q$ denote the subspace of all harmonic q-forms. It is
not hard to prove the “Hodge decomposition theorem”:

$\Omega^q = \mathcal{H}^q \oplus \operatorname{im} d \oplus \operatorname{im} \delta$

Furthermore, by adjointness [15], a form $\omega$ is closed
only if it is orthogonal to im $\delta$. On calculating the
de Rham cohomology we can also ignore the
summand im d and find that:

Each de Rham cohomology class on a compact
oriented Riemannian manifold M contains a unique
harmonic representative, that is, $H^q_{\mathrm{de\,R}}(M) \cong \mathcal{H}^q$.

Warning: This is an isomorphism of vector spaces
and in general does not extend to an isomorphism of
algebras.


Examples 


We list the cohomology of some important
examples.


Projective Spaces 


Let $\mathbb{RP}^n$ be real projective space of dimension n. Then,

$H^*(\mathbb{RP}^n; \mathbb{Z}_2) = \mathbb{Z}_2[x]/(x^{n+1})$

is a stunted polynomial ring on a generator x of
degree 1.

Similarly, let $\mathbb{CP}^n$ and $\mathbb{HP}^n$ denote complex and
quaternionic projective space of real dimensions 2n
and 4n, respectively. Then,

$H^*(\mathbb{CP}^n; \mathbb{Z}) = \mathbb{Z}[y]/(y^{n+1})$
$H^*(\mathbb{HP}^n; \mathbb{Z}) = \mathbb{Z}[z]/(z^{n+1})$

are stunted polynomial rings with deg(y) = 2 and
deg(z) = 4.


Lie Groups 


Let G be a compact, connected Lie group of rank l,
that is, the dimension of the maximal torus of G is l.
Then,

$H^*(G; \mathbb{Q}) \cong \Lambda_{\mathbb{Q}}^*[a_{2d_1 - 1}, a_{2d_2 - 1}, \dots, a_{2d_l - 1}]$

where $|a_i| = i$ and $d_1, \dots, d_l$ are the fundamental
degrees of G which are known for all G. Often this
structure lifts to the integral cohomology. In
particular we have:

$H^*(SO(2k + 1); \mathbb{Z}[1/2]) \cong \Lambda_{\mathbb{Z}[1/2]}^*[a_3, a_7, \dots, a_{4k-1}]$
$H^*(SO(2k); \mathbb{Z}[1/2]) \cong \Lambda_{\mathbb{Z}[1/2]}^*[a_3, a_7, \dots, a_{4k-5}, a_{2k-1}]$
$H^*(U(k); \mathbb{Z}) \cong \Lambda_{\mathbb{Z}}^*[a_1, a_3, \dots, a_{2k-1}]$


Classifying Spaces 


For any group G there exists a classifying space BG,
well defined up to homotopy. Classifying spaces
are of central interest to geometers and topologists
for the set of isomorphism classes of principal
G-bundles over a space X is in one-to-one correspondence
with the set of homotopy classes of maps
from X to BG. In particular, every cohomology class
$c \in H^*(BG; R)$ defines a characteristic class of
principal G-bundles E over X: if E corresponds to
the map $f_E: X \to BG$, then $c(E) := f_E^*(c)$.

BG can be constructed as the space of G-orbits of
a contractible space EG on which G acts freely.
Thus, for example,

$B\mathbb{Z} = \mathbb{R}/\mathbb{Z} \simeq S^1$
$B\mathbb{Z}_2 = (\lim_{n \to \infty} S^n)/\mathbb{Z}_2 \simeq \mathbb{RP}^\infty$
$BS^1 = (\lim_{n \to \infty} S^{2n+1})/S^1 \simeq \mathbb{CP}^\infty$

and more generally, infinite Grassmannian manifolds
are classifying spaces for linear groups. When
G is a compact connected Lie group,

$H^*(BG; \mathbb{Q}) \cong \mathbb{Q}[x_{2d_1}, \dots, x_{2d_l}]$

with $d_i$ as above and $|x_i| = i$. In particular,

$H^*(BSO(2k + 1); \mathbb{Z}[1/2]) \cong \mathbb{Z}[1/2][p_1, p_2, \dots, p_k]$
$H^*(BSO(2k); \mathbb{Z}[1/2]) \cong \mathbb{Z}[1/2][p_1, p_2, \dots, p_{k-1}, e_k]$
$H^*(BU(k); \mathbb{Z}) \cong \mathbb{Z}[c_1, c_2, \dots, c_k]$

where the Pontryagin, Euler, and Chern classes have
degree $|p_i| = 4i$, $|e_k| = 2k$, and $|c_i| = 2i$, respectively.


Moduli Spaces 


Let $\mathcal{M}_g^n$ be the space of Riemann surfaces of genus g
with n ordered, marked points. There are naturally
defined classes $\kappa_i$ and $e_1, \dots, e_n$ of degree 2i and 2,
respectively. By Harer-Ivanov stability and the
recent proof of the Mumford conjecture (Madsen-Weiss,
preprint 2004), there is an isomorphism up to
degree $* < 3g/2$ of the rational cohomology of $\mathcal{M}_g^n$
with

$\mathbb{Q}[\kappa_1, \kappa_2, \dots] \otimes \mathbb{Q}[e_1, \dots, e_n]$

The rational cohomology vanishes in degrees $* > 4g - 5$
if n = 0, and $* > 4g - 4 + n$ if n > 0. Though
the stable part of the cohomology is now well understood,
the structure of the unstable part, as proposed by
Faber (Viehweg 1999), remains conjectural.


Generalized Cohomology Theories 


The three basic properties of singular homology,
appropriately dualized, hold of course also for
cohomology. Furthermore, they (essentially) determine
(co)homology uniquely as a functor from the
category of simplicial spaces and continuous functions
to the category of abelian groups. If we drop
the dimension axiom (2), we are left with homotopy
invariance (1), and the Mayer-Vietoris sequence (3).
Abelian group valued functors satisfying (1) and (3)
are so-called “generalized (co)homology theories.”
K-theory and cobordism theory are two well-known
examples but there are many more.


K-Theory 


The geometric objects representing elements in complex
K-theory $K^0(X)$ are isomorphism classes of finite
dimensional complex vector bundles E over X. Vector
bundles E, E' can be added to form a new bundle
$E \oplus E'$ over X, and $K^0(X)$ is just the group completion
of the arising monoid. Thus, for example, for the point
space we have $K^0(\mathrm{pt}) = \mathbb{Z}$. Tensor product of vector
bundles $E \otimes E'$ induces a multiplication on K-theory
making $K^*(X)$ into a graded commutative ring.

In many ways K-theory is easier than cohomology.
In particular, the groups are 2-periodic: all even
degree groups are isomorphic to the reduced
K-theory group $\tilde{K}^0(X) := \operatorname{coker}(K^0(\mathrm{pt}) = \mathbb{Z} \to K^0(X))$,
and all odd degree groups are isomorphic to
$\tilde{K}^{-1}(X) := \tilde{K}^0(\Sigma X)$.

The theory of characteristic classes gives a close
relation between the two cohomology theories. The
Chern character map, a rational polynomial in the
Chern classes, defines

$\mathrm{ch}: K^0(X) \otimes_{\mathbb{Z}} \mathbb{Q} \to H^{\mathrm{even}}(X; \mathbb{Q}) := \bigoplus_{k \geq 0} H^{2k}(X; \mathbb{Q})$

an isomorphism of rings. Thus, the K-theory and
cohomology of a space carry the same rational
information. But they may have different torsion
parts. This became an issue in string theory when
D-brane charges which had formerly been thought
of as differential forms (and hence cohomology
classes) were later reinterpreted more naturally as
K-theory classes (Witten 1998).


• There are real and quaternionic K-theory groups
which are 8-periodic.


Cobordism Theory 


The geometric objects representing an element in the
oriented cobordism group $\Omega_n^{SO}(X)$ are pairs (M, f)
where M is a smooth, orientable n-dimensional
manifold and $f: M \to X$ is a continuous map. Two
pairs (M, f) and (M', f') represent the same cobordism
class if there exists a pair (W, F) where W is an
(n + 1)-dimensional, smooth, oriented manifold
with boundary $\partial W = M \sqcup -M'$ such that $F: W \to X$
restricts to f and f' on the boundary $\partial W$. Disjoint
union and Cartesian product of manifolds define an
addition and multiplication so that $\Omega_*^{SO}(X)$ is a
graded, commutative ring.


• Similarly, unoriented, complex, or spin cobordism
groups can be defined.


Elliptic Cohomology 


Quillen proved that complex cobordism theory is
universal for all complex oriented cohomology
theories, that is, those cohomology theories that
allow a theory of Chern classes. In a complex
oriented theory, the first Chern class of the tensor
product of two line bundles can be expressed in
terms of the first Chern class of each of them via a
two-variable power series: $c_1(E \otimes E') = F(c_1(E), c_1(E'))$.
F defines a formal group law and Quillen's
theorem asserts that the one arising from complex
cobordism theory is the universal one.

Vice versa, given a formal group law, one may try to
construct a complex oriented cohomology theory from
it. In particular, an elliptic curve gives rise to a formal
group law and an elliptic cohomology theory. Hopkins
et al. have described and studied an inverse limit of
these elliptic theories, which they call the theory of
topological modular forms, tmf, as the theory is closely
related to modular forms. In particular, there is a
natural map from the groups $\mathrm{tmf}_{2n}(\mathrm{pt})$ to the group of
modular forms of weight n over $\mathbb{Z}$. After inverting a
certain element (related to the discriminant), the
theory becomes periodic with period $24^2 = 576$.

Witten (1998) showed that the purely theoreti- 
cally constructed elliptic cohomology theories 
should play an important role in string theory: the 
index of the Dirac operator on the free loop space of 
certain manifolds should be interpreted as an 
element of it. But unlike for ordinary cohomology, 
K-theory, and cobordism theory we do not (yet) 
know a good geometric object representing elements 
in this theory without which its use for geometry 
and analysis remains limited. Segal speculated some 
20 years ago that conformal field theories should 
define such geometric objects. Though progress has 
been made, the search for a good geometric 
interpretation of elliptic cohomology (and tmf) 
remains an active and important research area. 


Infinite Loop Spaces 


Brown's representability theorem implies that for
each (reduced) generalized cohomology theory $h^*$ we
can find a sequence of spaces $E_n$ such that $h^n(X)$ is
the set of homotopy classes $[X, E_n]$ from the space X
to $E_n$ for all n. Recall that the Mayer-Vietoris
sequence implies that $h^n(X) \cong h^{n+1}(\Sigma X)$. The suspension
functor $\Sigma$ is adjoint to the based loop space
functor $\Omega$ which takes a space X to the space of
based maps from the circle to X. Hence,

$h^n(X) = [X, E_n] = [\Sigma X, E_{n+1}] = [X, \Omega E_{n+1}]$

and it follows that every generalized cohomology
theory is represented by an infinite loop space

$E_0 \simeq \Omega E_1 \simeq \cdots \simeq \Omega^n E_n \simeq \cdots$

Vice versa, any such infinite loop space gives rise to
a generalized cohomology theory.

One may think of infinite loop spaces as the
abelian groups up to homotopy in the strongest
sense. Indeed, ordinary cohomology with integer
coefficients is represented by

$\mathbb{Z} \simeq \Omega S^1 \simeq \Omega^2 \mathbb{CP}^\infty \simeq \cdots \simeq \Omega^n K(n, \mathbb{Z}) \simeq \cdots$

where by definition the Eilenberg-MacLane space
$K(n, \mathbb{Z})$ has trivial homotopy groups for all dimensions
not equal to n and $\pi_n K(n, \mathbb{Z}) = \mathbb{Z}$. Complex
K-theory is represented by

$\mathbb{Z} \times BU \simeq \Omega(U) \simeq \Omega^2(BU) \simeq \Omega^3(U) \simeq \cdots$

This is Bott's celebrated “periodicity theorem.”
Finally, oriented cobordism theory is represented by

$\Omega^\infty MSO := \lim_{n \to \infty} \Omega^n \mathrm{Th}(\gamma_n)$

where $\gamma_n \to BSO_n$ is the universal n-dimensional
vector bundle over the Grassmannian manifold of
oriented n-planes in $\mathbb{R}^\infty$, and $\mathrm{Th}(\gamma_n)$ denotes its
Thom space.

A good source of infinite loop spaces are
symmetric monoidal categories. Indeed every infinite
loop space can be constructed from such a category:
the symmetric monoidal structure gives the corresponding
homotopy abelian group structure. For
example, the category of finite-dimensional,
complex vector spaces and their isomorphisms
gives rise to $\mathbb{Z} \times BU$. To give another example, in
quantum field theory, one considers the (d + 1)-dimensional
cobordism category with objects the
compact, oriented d-dimensional manifolds, and
their (d + 1)-dimensional cobordisms as morphisms.
Disjoint union of manifolds makes this category
into a symmetric monoidal category. The associated
infinite loop space and hence generalized cohomology
theory has recently been identified as a (d + 1)-dimensional
slice of oriented cobordism theory
(Galatius et al. preprint 2005).


See also: Characteristic Classes; Equivariant
Cohomology and the Cartan Model; Functional Equations
and Integrable Systems; Index Theorems; Intersection
Theory; K-Theory; Moduli Spaces: An Introduction;
Riemann Surfaces; Spectral Sequences.


Introduction


Combinatorics is a vast field which enters particularly
in a crucial way in statistical physics. There, it is
particularly the enumerative problems that are of
importance. Therefore, in this article, we shall mainly
concentrate on the enumerative aspects of combinatorics.
We first recall the basic terminology, in
particular the basic combinatorial objects and numbers,
together with the simplest facts about them. We
then provide introductions into the most important
techniques of enumeration: the generating function
technique, Redfield-Pólya theory, methods of solving
functional equations of combinatorial origin, methods
of asymptotic enumeration, the theory of heaps,
and the transfer matrix method. The subsequent
sections then discuss specific problem circles with
relation to statistical physics more closely. We discuss
lattice path problems, explain Kasteleyn's method of
enumerating perfect matchings and tilings, present
the fundamental theorems on nonintersecting paths,
and provide an introduction into the research field
involving vicious walkers, plane partitions, rhombus
tilings, alternating sign matrices, six-vertex configurations,
and fully packed loop configurations.
Finally, we explain how one should treat binomial
and hypergeometric series, which frequently arise in
enumeration problems.


Basic Combinatorial Terminology


In this section we review basic combinatorial
notions and facts. The reader can find a more
detailed treatment and further results, for example,
in chapter 1 of Stanley (1986).

The basic combinatorial choice problems and
their solutions are: there are $2^n$ subsets of an
n-element set. There are $\binom{n}{k}$ k-element subsets of an
n-element set. Given an alphabet $A = \{a_1, a_2, \dots\}$, a
word is a (finite or infinite) sequence of elements of
A. Usually, a finite word is written in the form
$a_1 a_2 \cdots a_n$ (with $a_i \in A$). Out of the letters
{1, 2, ..., k}, one can build $k^n$ words of length n.
Out of the letters {1, 2, ..., k}, one can build $\binom{n+k-1}{n}$
weakly increasing sequences of length n. The number of
permutations of an n-element set is n!. The set of
permutations of {1, 2, ..., n} is denoted by $\mathfrak{S}_n$. The
number of permutations of an n-element set with
exactly k cycles is the Stirling number of the first
kind, s(n, k). These numbers are given as the
expansion coefficients of falling factorials,

$x(x - 1) \cdots (x - n + 1) = \sum_{k=0}^{n} (-1)^{n-k} s(n, k) x^k$

or in form of the double (formal) power series

$\sum_{n,k \geq 0} (-1)^{n-k} s(n, k) x^k \frac{y^n}{n!} = (1 + y)^x$
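
The falling-factorial expansion can be checked for small n. The following is a minimal sketch, assuming s(n, k) counts permutations by number of cycles as stated and is computed by the standard insertion recurrence; the function names are illustrative.

```python
from math import factorial


def stirling_first(n_max):
    """Table s[n][k]: permutations of n elements with exactly k cycles."""
    s = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    s[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            # element n forms its own cycle, or is inserted after
            # one of the other n - 1 elements in an existing cycle
            s[n][k] = s[n - 1][k - 1] + (n - 1) * s[n - 1][k]
    return s


def falling_factorial_coeffs(n):
    """Coefficients of x(x-1)...(x-n+1), indexed by the power of x."""
    coeffs = [1]
    for j in range(n):
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] += c      # contribution of x * c x^i
            new[i] -= j * c      # contribution of -j * c x^i
        coeffs = new
    return coeffs
```

Comparing `falling_factorial_coeffs(n)[k]` with `(-1) ** (n - k) * s[n][k]` tests the displayed expansion; summing a row of the table should give n!.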


A partition of a set is a collection of pairwise
disjoint subsets the union of which is the complete
set. The subsets in the collection are called the
blocks of the partition. The total number of
partitions of an n-element set is the Bell number
$B_n$. These numbers are given by

$\sum_{n \geq 0} B_n \frac{y^n}{n!} = \exp(e^y - 1)$

The number of partitions of an n-element set into
exactly k blocks is the Stirling number of the second
kind, S(n, k). These numbers are given by

$\sum_{n,k \geq 0} S(n, k) x^k \frac{y^n}{n!} = \exp(x(e^y - 1))$

or, explicitly, by

$S(n, k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^n$

A composition of a positive integer n is a representation
of n as a sum $n = s_1 + s_2 + \cdots + s_k$ of other
positive integers $s_i$, where the order of the summands
matters. The total number of compositions of
n is $2^{n-1}$. The number of compositions of n with
exactly k summands is $\binom{n-1}{k-1}$. A partition of a
positive integer n is a representation of n as a sum
$n = \lambda_1 + \lambda_2 + \cdots + \lambda_\ell$ of other positive integers $\lambda_i$,
where the order of the summands does not matter. Thus,
we may assume that the summands are ordered,
$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_\ell > 0$. This is the motivation
to write partitions most often in the form of
tuples $(\lambda_1, \lambda_2, \dots, \lambda_\ell)$ the entries of which are
weakly decreasing. The summands of a partition
are called the parts of the partition. Let p(n) denote
the number of partitions of n. These numbers are
given by

$\sum_{n \geq 0} p(n) x^n = \prod_{i \geq 1} \frac{1}{1 - x^i}$

If p(n, k) denotes the number of partitions of n into
at most k parts, then we have

$\sum_{n \geq 0} p(n, k) x^n = \frac{1}{(1 - x)(1 - x^2) \cdots (1 - x^k)}$

Finally, if p(n, k, m) denotes the number of partitions
of n into at most k parts, all of which are at
most m, then

$\sum_{n \geq 0} p(n, k, m) x^n = \frac{(1 - x^{m+k})(1 - x^{m+k-1}) \cdots (1 - x^{m+1})}{(1 - x^k)(1 - x^{k-1}) \cdots (1 - x)}$

The expression on the right-hand side is called a
q-binomial coefficient, and is denoted by $\begin{bmatrix} m+k \\ k \end{bmatrix}_x$.
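
The last identity can be verified for small k and m by expanding the quotient as a polynomial and counting partitions directly. The following is a minimal sketch; the function names and the list-of-coefficients representation of polynomials are illustrative choices.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def p(n, k, m):
    """Partitions of n into at most k parts, each part at most m."""
    if n == 0:
        return 1
    if k == 0 or m == 0:
        return 0
    # choose the largest part a; the rest has at most k - 1 parts, each <= a
    return sum(p(n - a, k - 1, a) for a in range(1, min(m, n) + 1))


def mul_one_minus(poly, e):
    """Multiply a polynomial (list of coefficients) by (1 - x^e)."""
    out = poly + [0] * e
    for i, c in enumerate(poly):
        out[i + e] -= c
    return out


def div_one_minus(poly, e):
    """Divide a polynomial exactly by (1 - x^e)."""
    q = [0] * (len(poly) - e)
    for i in range(len(q)):
        q[i] = poly[i] + (q[i - e] if i >= e else 0)
    return q


def q_binomial(m, k):
    """Coefficients of (1-x^{m+k})...(1-x^{m+1}) / ((1-x^k)...(1-x))."""
    poly = [1]
    for j in range(m + 1, m + k + 1):
        poly = mul_one_minus(poly, j)
    for j in range(1, k + 1):
        poly = div_one_minus(poly, j)
    return poly
```

For example, `q_binomial(2, 2)` is `[1, 1, 2, 1, 1]`, matching the partitions that fit in a 2-by-2 box: one each of 0, 1, 3, and 4, and two of 2.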

Partitions are frequently encoded in terms of their
Ferrers diagrams. The Ferrers diagram of a partition
$\lambda = (\lambda_1, \lambda_2, \dots, \lambda_\ell)$ is an array of cells with $\ell$ left-justified
rows and $\lambda_i$ cells in row i. For example, the
diagram in Figure 1 is the Ferrers diagram of the
partition (3, 3, 2).

A lattice path P in $\mathbb{Z}^d$ (where $\mathbb{Z}$ denotes the set of
integers) is a path in the d-dimensional integer
lattice $\mathbb{Z}^d$ which uses only points of the lattice, that
is, it is a sequence $(P_0, P_1, \dots, P_l)$, where $P_i \in \mathbb{Z}^d$ for
all i. The vectors $\overrightarrow{P_0 P_1}, \overrightarrow{P_1 P_2}, \dots, \overrightarrow{P_{l-1} P_l}$ are called
the steps of P. The number of steps, l, is called the
length of P. Figure 2 shows a lattice path in $\mathbb{Z}^2$ of
length 11.


Figure 1 A Ferrers diagram.


Figure 2 A Motzkin path.


A Dyck path is a lattice path in the integer
plane $\mathbb{Z}^2$ consisting of up-steps (1, 1) and down-steps
(1, -1), which starts at the origin, never passes below
the x-axis, and ends on the x-axis. See Figure 3 for an
example.

The number of Dyck paths of length 2n is the
Catalan number

$$C_n = \frac{1}{n+1} \binom{2n}{n}$$


The generating function (see the next section for an 
introduction to the theory of generating functions) 
for these numbers is 


$$\sum_{n \ge 0} C_n x^n = \frac{1 - \sqrt{1 - 4x}}{2x} \qquad [1]$$


The reader is referred to exercise 6.19 in Stanley 
(1999) for countless occurrences of the Catalan 
numbers. 
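The Catalan formula can be confirmed on small cases by counting Dyck paths directly. This is only an illustrative sketch: the dynamic programme below (a name and approach chosen here, not taken from the article) tracks how many partial paths currently sit at each height.

```python
from math import comb


def count_dyck_paths(n):
    """Count Dyck paths with 2n steps by tracking the current height."""
    heights = {0: 1}                      # height -> number of partial paths
    for _ in range(2 * n):
        nxt = {}
        for h, ways in heights.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + ways       # up-step (1, 1)
            if h > 0:                                   # down-step (1, -1)
                nxt[h - 1] = nxt.get(h - 1, 0) + ways
        heights = nxt
    return heights.get(0, 0)


def catalan(n):
    """Closed form C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


if __name__ == "__main__":
    print([count_dyck_paths(n) for n in range(8)])
    print([catalan(n) for n in range(8)])
```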

A Motzkin path is a lattice path in the integer
plane $\mathbb{Z}^2$ consisting of up-steps (1, 1), level steps
(1, 0), and down-steps (1, -1), which starts at the
origin, never passes below the x-axis, and ends on
the x-axis. The path in Figure 2 is in fact a Motzkin
path. The number of Motzkin paths of length n is
the Motzkin number

$$M_n = \sum_{k=0}^{\lfloor n/2 \rfloor} \frac{1}{k+1} \binom{2k}{k} \binom{n}{2k}$$

The generating function for these numbers is

$$\sum_{n \ge 0} M_n x^n = \frac{1 - x - \sqrt{1 - 2x - 3x^2}}{2x^2} \qquad [2]$$

The reader is referred to exercise 6.38 in Stanley (1999)
for numerous occurrences of the Motzkin numbers.
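A small check of the Motzkin numbers, in the same spirit as for Dyck paths: the sketch below counts paths step by step and compares with the sum $\sum_k \frac{1}{k+1}\binom{2k}{k}\binom{n}{2k}$ (choose the 2k positions of the non-level steps, then arrange them as a Dyck path). Function names are illustrative assumptions.

```python
from math import comb


def count_motzkin_paths(n):
    """Count Motzkin paths with n steps by tracking the current height."""
    heights = {0: 1}
    for _ in range(n):
        nxt = {}
        for h, ways in heights.items():
            nxt[h] = nxt.get(h, 0) + ways               # level step (1, 0)
            nxt[h + 1] = nxt.get(h + 1, 0) + ways       # up-step (1, 1)
            if h > 0:                                   # down-step (1, -1)
                nxt[h - 1] = nxt.get(h - 1, 0) + ways
        heights = nxt
    return heights.get(0, 0)


def motzkin(n):
    """Sum over k of (Catalan number C_k) * binom(n, 2k)."""
    return sum(comb(2 * k, k) // (k + 1) * comb(n, 2 * k)
               for k in range(n // 2 + 1))


if __name__ == "__main__":
    print([count_motzkin_paths(n) for n in range(8)])
```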

A Schröder path is a lattice path in the integer
plane $\mathbb{Z}^2$ consisting of horizontal steps (1, 0),
vertical steps (0, 1), and diagonal steps (1, 1), which starts
at the origin, never passes below the diagonal x = y, and ends
on the diagonal x = y. See Figure 4 for an example.

Figure 3 A Dyck path.

Figure 4 A Schröder path.

The number of Schröder paths of length n (that is, ending
at the point (n, n)) is the (large) Schröder number

$$S_n = \sum_{k \ge 0} \frac{1}{k+1} \binom{2k}{k} \binom{n+k}{2k}$$

The generating function for these numbers is

$$\sum_{n \ge 0} S_n x^n = \frac{1 - x - \sqrt{1 - 6x + x^2}}{2x} \qquad [3]$$

The reader is referred to exercise 6.39 in Stanley
(1999) for numerous occurrences of the Schröder
numbers.
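The following sketch counts Schröder paths directly, under the usual convention (an assumption made explicit here) that the allowed steps are (1, 0), (0, 1) and (1, 1), that the path stays in the region y >= x, and that it ends at (n, n). It compares the count with the standard sum $\sum_k \frac{1}{k+1}\binom{2k}{k}\binom{n+k}{2k}$ for the large Schröder numbers.

```python
from functools import lru_cache
from math import comb


def count_schroeder_paths(n):
    """Count paths from (0, 0) to (n, n) with steps (1,0), (0,1), (1,1), staying in y >= x."""
    @lru_cache(maxsize=None)
    def ways(x, y):
        if x < 0 or y < x:          # outside the allowed region
            return 0
        if x == 0 and y == 0:
            return 1
        # The last step was horizontal, vertical, or diagonal.
        return ways(x - 1, y) + ways(x, y - 1) + ways(x - 1, y - 1)
    return ways(n, n)


def large_schroeder(n):
    """Sum over k of (Catalan number C_k) * binom(n + k, 2k)."""
    return sum(comb(2 * k, k) // (k + 1) * comb(n + k, 2 * k)
               for k in range(n + 1))


if __name__ == "__main__":
    print([count_schroeder_paths(n) for n in range(7)])
```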

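There is another famous sequence of numbers
which we have not touched upon yet, the Fibonacci numbers
$F_n$. They are given by

$$F_n = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^{n+1} - \left( \frac{1 - \sqrt{5}}{2} \right)^{n+1} \right)$$

with generating function

$$\sum_{n \ge 0} F_n x^n = \frac{1}{1 - x - x^2} \qquad [4]$$

They also occur in numerous places. For example,
the number $F_n$ counts all paths on the integers $\mathbb{Z}$
from 0 to n with steps (1,0) and (2,0).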
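The path interpretation fixes the indexing convention $F_0 = F_1 = 1$. A minimal sketch (function names are chosen here for illustration) that counts the paths from 0 to n with steps of size 1 and 2 and compares with the closed form with exponent n + 1:

```python
from math import sqrt


def count_step_paths(n):
    """Number of ways to walk from 0 to n on the integers with steps +1 and +2."""
    ways = [1] + [0] * n
    for pos in range(1, n + 1):
        ways[pos] = ways[pos - 1] + (ways[pos - 2] if pos >= 2 else 0)
    return ways[n]


def fibonacci_closed_form(n):
    """Closed form with exponent n + 1, rounded to the nearest integer."""
    phi = (1 + sqrt(5)) / 2
    psi = (1 - sqrt(5)) / 2
    return round((phi ** (n + 1) - psi ** (n + 1)) / sqrt(5))


if __name__ == "__main__":
    print([count_step_paths(n) for n in range(10)])
```

The floating-point closed form is only reliable for moderate n; the recurrence is exact.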

An undirected graph G consists of vertices and
edges. An edge is a two-element subset of the
vertices, which, however, is thought of as a line or
curve connecting the two vertices. See Figure 5a
for an example. The usual notation for a graph G
is G = (V, E), where V is the set of vertices and E
is the set of edges of G. A graph is planar if it is
embedded in the plane (sphere) in such a way that
the curves which mark the edges do not intersect
in their interiors. There can be several different
ways to embed the same graph in the plane (or in
another surface). When we speak of a planar
graph, we assume the graph already to be
embedded in a given way. For example, the graph
in Figure 5a is not a planar graph, by its drawing.
However, there are different embeddings which are
planar (namely, all embeddings which put the
vertex $v_3$ above the vertex $v_5$ and leave the other
vertices as they are). A tree is a graph without any
cycles.

Figure 5 (a) An undirected graph. (b) A directed graph.

A directed graph (or digraph) G consists of 
vertices and arcs (which are sometimes also called 
directed edges). An arc is a pair of vertices, which, 
however, is thought of as an arrow pointing from the
first vertex of the pair to the second. See Figure 5b 
for an example. The usual notation for a directed 
graph G is again G=(V,E), where V is the set of 
vertices and E is the set of arcs of G. All other 
notions explained for undirected graphs have analo- 
gous meanings for directed graphs. 

Graphs can be labeled, in which case each vertex 
is assigned a label, or unlabeled. The (undirected) 
graph in Figure 5a is labeled, whereas the (directed) 
graph in Figure 5b is unlabeled. 


Generating Functions 


Generating functions are the very basic tools of 
enumeration. For introductions to this technique, 
from different points of view, the reader is referred 
to Bergeron et al. (1998), Flajolet and Sedgewick 
(chapter 1 in the reference listed in “Further read- 
ing” section), and Stanley (1998, chapter 1; 1999, 
chapter 4). 

Let A be a set of (unlabeled) objects. Each object
a in A has a certain size, |a|, which is a non-negative
integer. Let us also assume that there is only a finite
number of objects from A of a given size. Let $a_n$ be
the number of objects from A of size n. The
(ordinary) generating function for A is the formal
power series

$$F_A(x) = \sum_{a \in A} x^{|a|} = \sum_{n \ge 0} a_n x^n$$

("formal" means that x is just an indeterminate, not
a real or complex number. One can compute with
formal power series in the same way as with analytic
series, except that convergence issues do not arise,
or rather that "convergence" has a different
meaning; cf. Stanley (1998, section 1.1)). Typical
examples are Sets (the collection containing all
"unlabeled sets," that is, all objects of the form
{•, •, ..., •}, including the empty set, where the size
of {•, •, ..., •} is the number of •'s), Sequences
(the collection containing all "unlabeled sequences,"
that is, all objects of the form (•, •, ..., •), including
the empty sequence), Cycles (unlabeled cycles),
with respective generating functions

$$F_{Sets}(x) = F_{Sequences}(x) = \frac{1}{1 - x}, \qquad F_{Cycles}(x) = \frac{x}{1 - x} \qquad [5]$$

or Trees (unlabeled trees).

If A and B are two sets of objects, one can define 
several other sets of objects using them. The union 
of A and B, written AUB, has as a groundset the 
disjoint union of A and B, and the size of an element 
from A is its size in A, while the size of an element 
from B is its size in B. We have 


$$F_{A \cup B}(x) = F_A(x) + F_B(x) \qquad [6]$$


The product of A and B, written A x B, has as a 
groundset the set of pairs A x B, and the size of an 
element (a,b) from A x B is the sum of the sizes of a 
(in A) and of b (in B). We have 


$$F_{A \times B}(x) = F_A(x) \cdot F_B(x) \qquad [7]$$


The substitution of two sets A and B of objects
can only be defined in certain circumstances, and
only in certain more restrictive circumstances can the
generating function for the substitution be
computed by substituting the generating functions
for A and B. Let us assume that any object a from
A of size n, by its structure, has n atoms (nodes). For
example, if A is a certain set of trees, where the size
of a tree is the number of leaves in the tree, then we
may take, as the atoms, the leaves of the tree. In this
situation, the substitution of B in A, denoted by
A(B), is the set of objects which arises by replacing
the atoms of objects from A by objects from B in all
possible ways. The size of an object from A(B) is the
sum of the sizes of the objects from B that it
contains. In order that A(B) contains only a finite
number of objects of a given size, we must assume
that B contains no elements of size 0. If, in addition,
the atoms of any element a from A inherit an order
(e.g., if A is a set of binary trees, then the leaves of a
binary tree are ordered in a natural way from "left"
to "right"), then we have

$$F_{A(B)}(x) = F_A(F_B(x)) \qquad [8]$$


However, this equation is not true in general. The 
general formula comes out of Redfield-Pólya theory 
(see [21] and [24]) and requires the notion of cycle 
index series. For example, if B is the set of connected 
(unlabeled) graphs, A is Sets, so that A(B) is the 
set of all (connected and disconnected) graphs, then 
[8] is not true, but what is true is 


$$F_{Sets(B)}(x) = \exp\left( F_B(x) + \tfrac{1}{2} F_B(x^2) + \tfrac{1}{3} F_B(x^3) + \cdots \right) \qquad [9]$$


This holds, in fact, for any set B of unlabeled objects. 
(This is seen by combining [24], [17], and [21].) 
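Formula [9] can be tested on a simple family. In the sketch below (names and the choice of example are assumptions made here), B contains exactly one object of each positive size, so that $F_B(x) = x/(1-x)$ and Sets(B) is the set of partitions; the coefficients of the exponential should then be the partition numbers p(n). Exact rational arithmetic is used for the power series exponential.

```python
from fractions import Fraction


def series_exp(a, N):
    """exp of a power series a with a[0] == 0, truncated after x^N.

    Uses E' = a' E, i.e. n * e_n = sum_{k=1..n} k * a_k * e_{n-k}.
    """
    e = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        e[n] = sum(k * a[k] * e[n - k] for k in range(1, n + 1)) / n
    return e


def multisets(b, N):
    """Coefficients of exp(F_B(x) + F_B(x^2)/2 + F_B(x^3)/3 + ...) up to x^N,
    where b[n] is the number of unlabeled B-objects of size n (b[0] == 0)."""
    a = [Fraction(0)] * (N + 1)
    for j in range(1, N + 1):               # contribution of F_B(x^j) / j
        for n in range(1, N // j + 1):
            a[j * n] += Fraction(b[n], j)
    return [int(c) for c in series_exp(a, N)]


if __name__ == "__main__":
    one_object_per_size = [0] + [1] * 10
    print(multisets(one_object_per_size, 10))
```

Taking instead two objects of size 1 and nothing else gives the coefficients of $1/(1-x)^2$, that is, n + 1 multisets of size n on two kinds of atoms.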

Next we deal with the enumeration of labeled
objects. Let A be a set of labeled objects, again, each
object a with a certain size |a| which is a non-negative
integer. "Labeled" means that each object
of size n, by its structure, comes with n atoms
(nodes) which are labeled 1, 2, ..., n. For example,
A may be the set of all labeled graphs, where the
size of a graph is the number of its vertices, and
where the vertices are labeled 1, 2, ..., n. Again, we
assume that there is only a finite number of objects
from A of a given size. Let $a_n$ be the number of
objects from A of size n. The exponential generating
function for A is the formal power series

$$E_A(x) = \sum_{a \in A} \frac{x^{|a|}}{|a|!} = \sum_{n \ge 0} a_n \frac{x^n}{n!}$$

Typical examples are Sets (the collection containing
all "labeled sets," that is, all objects of the form
{1, 2, ..., n}, including the empty set), Permutations,
Cycles (labeled cycles), with respective
generating functions

$$E_{Sets}(x) = \exp(x) \qquad [10]$$

$$E_{Permutations}(x) = \frac{1}{1 - x} \qquad [11]$$

$$E_{Cycles}(x) = \log \frac{1}{1 - x} \qquad [12]$$

or Trees (labeled trees). The explicit form of the
generating function for Trees is discussed in the
section "Solving equations for generating functions:
the Lagrange inversion formula and the kernel
method."

If A and B are two sets of objects, one defines
again several other sets of objects using them. The
union of A and B, written A ∪ B, has as a groundset
the disjoint union of A and B, and the size of an
element from A is its size in A, while the size of an
element from B is its size in B. We have

$$E_{A \cup B}(x) = E_A(x) + E_B(x) \qquad [13]$$

To define the product of A and B, written A × B,
we cannot simply take A × B as a groundset; we
must also say something about the labeling of the
objects. So, as a groundset we take all pairs (a, b)
with a ∈ A and b ∈ B, but labeled in all possible
ways by 1, 2, ..., |a| + |b| such that the order of
labels assigned to a respects the original order of
labels of a, and the same for b. The size of such an
element (a, b) is again the sum of the sizes of a (in A)
and of b (in B). We have

$$E_{A \times B}(x) = E_A(x) \cdot E_B(x) \qquad [14]$$

Since, in the labeled world, objects come automatically
with atoms, the substitution of two sets A and
B of objects can now always be defined. The
substitution of B in A, denoted by A(B), is the set
of objects which arises by replacing the atoms of
objects from A by objects from B in all possible
ways, and labeling the substituted objects in all
possible ways by $1, 2, \ldots, \sum_b |b|$ (the sum being
over the objects b from B which were put in the places
of the atoms) that are consistent with the original
labelings of the objects from B. The size of an object
from A(B) is the sum of the sizes of the objects from
B that it contains. In order that A(B) contains only a
finite number of objects of a given size, we must
assume that B contains no elements of size 0. Then
we have

$$E_{A(B)}(x) = E_A(E_B(x)) \qquad [15]$$
An example of a composition is

Permutations = Sets(Cycles)

Thus, from [15] we have

$$E_{Permutations}(x) = E_{Sets}(E_{Cycles}(x))$$

corresponding to the identity

$$\frac{1}{1 - x} = \exp\left( \log \frac{1}{1 - x} \right)$$
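The identity Permutations = Sets(Cycles) can be verified coefficient by coefficient. The sketch below is an illustration with names chosen here: it takes the counts of labeled B-objects, forms $\exp(E_B(x))$ with exact rational arithmetic, and returns the counts of labeled Sets(B)-objects. With (n - 1)! labeled cycles on n atoms, the output should be n!.

```python
from fractions import Fraction
from math import factorial


def labeled_sets_of(counts, N):
    """counts[n] = number of labeled B-objects of size n (counts[0] == 0).

    Returns the numbers of labeled Sets(B)-objects of sizes 0..N,
    read off from exp(E_B(x)).
    """
    a = [Fraction(counts[n], factorial(n)) for n in range(N + 1)]
    e = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        # E = exp(a) satisfies E' = a' E, which gives this recurrence.
        e[n] = sum(k * a[k] * e[n - k] for k in range(1, n + 1)) / n
    return [int(e[n] * factorial(n)) for n in range(N + 1)]


if __name__ == "__main__":
    cycles = [0] + [factorial(n - 1) for n in range(1, 9)]
    print(labeled_sets_of(cycles, 8))
```

Feeding in one nonempty set per size (counts 0, 1, 1, 1, ...) yields the numbers of set partitions, since $\exp(e^x - 1)$ is their exponential generating function.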


Another manifestation of the composition rule is, for 
example, the fact (which is sometimes called the 
“exponential principle”) that, if one takes the log of 
the partition function for some maps, the result is 
the partition function for the connected maps among 
them.

All of the above can be generalized to a weighted
setting. Namely, if A is a set of objects (labeled or
unlabeled), and if w: A → R is a weight function
from A into some ring R, then all of the above
remains true, if we replace the definitions of $F_A(x)$
and $E_A(x)$ above by the weighted sums

$$F_A(x) = \sum_{a \in A} w(a)\, x^{|a|}$$

and

$$E_A(x) = \sum_{a \in A} w(a)\, \frac{x^{|a|}}{|a|!}$$

respectively, if in the definition of the union of A
and B we define the weight of an object to be its
weight in A or in B, respectively, if in the definition
of the product of A and B we define the weight of an
object (a, b) to be the product of the weights of a
and b, and if in the definition of the substitution we
define the weight of an object in A(B) as the product
of the weights of the objects from B that were put in
place of the atoms.


Redfield-Pólya Theory of Colored 
Enumeration 


The natural and uniform environment for the 
separate treatment of generating functions for 
unlabeled and labeled objects in the last section is 
the theory for counting colored objects founded by 
Redfield and Pólya, in the modern treatment 
through cycle index series due to Joyal. We refer 
the reader to Bergeron et al. (1998, appendix 1), 
de Bruijn (1981), and Stanley (1999, chapter 7) for 
further reading. 

Let A be a set of labeled objects with the
constraint that there is only a finite number of
objects of a given size. The cycle index series for A is
the formal multivariable series

$$Z_A(x_1, x_2, x_3, \ldots) = \sum_{n \ge 0} \frac{1}{n!} \sum_{\sigma \in S_n} \mathrm{fix}_\sigma(A)\, x_1^{c_1(\sigma)} x_2^{c_2(\sigma)} x_3^{c_3(\sigma)} \cdots \qquad [16]$$

where $\mathrm{fix}_\sigma(A)$ is the number of objects a from A that
remain invariant when the labels are permuted
according to the permutation σ (in particular, if
σ ∈ $S_n$, the size of a must be n in order that σ can be
applied to the labels), and where $c_i(\sigma)$ denotes the
number of cycles of length i of σ.

In most cases, it is difficult to obtain compact
expressions for the cycle index series. However, for
our familiar families of objects, compact expressions
are available:

$$Z_{Sets}(x_1, x_2, \ldots) = \exp\left( x_1 + \frac{x_2}{2} + \frac{x_3}{3} + \cdots \right) \qquad [17]$$

$$Z_{Permutations}(x_1, x_2, \ldots) = \prod_{i \ge 1} \frac{1}{1 - x_i} \qquad [18]$$

$$Z_{Cycles}(x_1, x_2, \ldots) = \sum_{i \ge 1} \frac{\varphi(i)}{i} \log \frac{1}{1 - x_i} \qquad [19]$$

where $\varphi(i)$ is the Euler totient function (the number
of positive integers j ≤ i relatively prime to i).
What makes the cycle index series so fundamental
is the fact that the generating functions from the last
section are specializations of it. Namely, the
exponential generating function for A is equal to

$$E_A(x) = Z_A(x, 0, 0, \ldots) \qquad [20]$$

If, given the set of labeled objects A, we produce a
set of unlabeled objects $\tilde{A}$ by taking all the objects
from A but forgetting the labels, then the ordinary
generating function for $\tilde{A}$ is another specialization
of the cycle index series,

$$F_{\tilde{A}}(x) = Z_A(x, x^2, x^3, \ldots) \qquad [21]$$

The cycle index series satisfies the following
properties with respect to union, product and
composition of sets of objects:

$$Z_{A \cup B}(x_1, x_2, \ldots) = Z_A(x_1, x_2, \ldots) + Z_B(x_1, x_2, \ldots) \qquad [22]$$

$$Z_{A \times B}(x_1, x_2, \ldots) = Z_A(x_1, x_2, \ldots) \cdot Z_B(x_1, x_2, \ldots) \qquad [23]$$

$$Z_{A(B)}(x_1, x_2, \ldots) = Z_A\big( Z_B(x_1, x_2, x_3, \ldots),\, Z_B(x_2, x_4, x_6, \ldots),\, Z_B(x_3, x_6, x_9, \ldots),\, \ldots \big) \qquad [24]$$

Similar to the theory of generating functions
surveyed in the last section, one can also develop a
weighted version of the cycle index series. Given a set
of labeled objects A, where each object a is assigned a
weight w(a), one changes the definition [16] insofar as
$\mathrm{fix}_\sigma(A)$ gets replaced by the weighted sum
$\sum_{a:\, \sigma(a) = a} w(a)$, where σ(a) means the object arising
from a by permuting the labels according to σ. Then all
the above formulas remain true in this weighted setting.

Cycle index series are instrumental in the enumeration
of colored objects. The basic situation is
that we are given a set $\tilde{A}$ of unlabeled objects such
that every object of size n comes with n atoms
(nodes). For example, we may think of $\tilde{A}$ as the set
of cycles. We are now going to color each atom by a
color from the set of colors C. The question that we
pose is: how many different colored objects of a
given size are there? In our example, if C consists of
the two colors "black" and "white," then we are
asking the question of how many necklaces one can
make out of n pearls that can be black or white. In
terms of generating functions, we want to compute

$$\Gamma_{\tilde{A}}(x) = \sum_{c} x^{|c|}$$

where the sum is over all colored objects c that one
can obtain by coloring the objects from $\tilde{A}$.

The central result of Redfield-Pólya theory is that,
if A is the set of labeled objects that one obtains
from $\tilde{A}$ by labeling the objects of $\tilde{A}$ in all possible
ways, then

$$\Gamma_{\tilde{A}}(x) = Z_A\big( |C|\,x,\ |C|\,x^2,\ |C|\,x^3,\ \ldots \big)$$
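For the necklace example, substituting $x_i = |C|\,x^i$ into the cycle index series of Cycles and extracting the coefficient of $x^n$ gives $\frac{1}{n} \sum_{d \mid n} \varphi(d)\, |C|^{n/d}$. The sketch below compares this with a brute-force count. It assumes, as a reading of the example, that necklaces are colorings of a cycle considered up to rotation only (no reflections); the function names are illustrative.

```python
from itertools import product
from math import gcd


def necklaces_brute_force(n, colors):
    """Count colorings of n beads on a cycle, up to rotation, by enumeration."""
    seen = set()
    for word in product(range(colors), repeat=n):
        # Use the lexicographically smallest rotation as canonical representative.
        canonical = min(word[i:] + word[:i] for i in range(n))
        seen.add(canonical)
    return len(seen)


def totient(i):
    """Euler totient: number of 1 <= j <= i with gcd(i, j) == 1."""
    return sum(1 for j in range(1, i + 1) if gcd(i, j) == 1)


def necklaces_formula(n, colors):
    """Coefficient of x^n in Z_Cycles(|C| x, |C| x^2, ...)."""
    total = sum(totient(d) * colors ** (n // d)
                for d in range(1, n + 1) if n % d == 0)
    return total // n


if __name__ == "__main__":
    print([necklaces_formula(n, 2) for n in range(1, 9)])
```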


There is again a weighted version. One allows the
objects a from $\tilde{A}$ to have weight w(a) ∈ R. Moreover,
one assumes a weight function f: C → R on
the colors with values in the ring R. One defines the
weight of a colored object obtained by coloring
the atoms of a to be w(a) multiplied by the product
of all f(γ), where γ ranges over all the colors of the
atoms (including repetitions of colors). Let $\Gamma_{\tilde{A}}(w, f)$
denote the sum of all the weights of all colored
objects obtained from $\tilde{A}$. Then

$$\Gamma_{\tilde{A}}(w, f) = Z_A\Big( \sum_{c \in C} f(c),\ \sum_{c \in C} f(c)^2,\ \sum_{c \in C} f(c)^3,\ \ldots \Big)$$


We remark that these results cover also the case of 
enumeration of objects under a group action. This 
includes the enumeration of objects on which we 
impose certain symmetries. See Bergeron et al. 
(1998, appendix 1), de Bruijn (1981), and Stanley 
(1999, chapter 7) for more details. The enumeration 
of asymmetric objects is the subject of an ongoing 
research program (cf. Labelle and Lamathe (2004)). 


Solving Equations for Generating 
Functions: The Lagrange Inversion 
Formula and the Kernel Method 


In this section, we describe two methods to solve
functional equations for generating functions.
Lagrange inversion makes it possible (in some situations)
to find explicit expressions for the coefficients of
an implicitly given series. The kernel method (and its
extensions), on the other hand, is a powerful method
to obtain an explicit expression for an implicitly given
function. We refer the reader to Flajolet and
Sedgewick (section VII.5 of the reference in the "Further
reading" section) for further reading.

In many situations it will happen that, when we
apply the methods from the last section, we end up
with a functional equation for the generating function
$f(x) = \sum_{n \ge 0} f_n x^n$ that we wanted to compute. For
example, if $t_n$ denotes the number of labeled rooted
trees with n nodes, and if we write
$T(x) = \sum_{n \ge 0} t_n x^n / n!$, then, by applying a straightforward
decomposition of a tree into its root and its set of
subtrees attached to the root, we obtain the equation

$$T(z) = z \exp(T(z)) \qquad [25]$$

How does one solve such an equation? As a matter
of fact, for T(z), there is no expression in terms of
known functions. However, the Lagrange inversion
formula enables one to find the coefficients $t_n / n!$ of
T(z) explicitly. The theorem reads as follows.


Theorem Let g(x) be a formal Laurent series
containing only a finite number of negative powers
of x, and let f(x) be a formal power series without
constant term. If we expand g(x) in powers of f(x),

$$g(x) = \sum_{k} c_k f^k(x) \qquad [26]$$

then the coefficients $c_n$ are given by

$$c_n = \frac{1}{n} [x^{-1}]\, g'(x)\, f^{-n}(x) \quad \text{for } n \ne 0 \qquad [27]$$

or, alternatively, by

$$c_n = [x^{-1}]\, g(x)\, f'(x)\, f^{-n-1}(x) \qquad [28]$$

Here, $[x^n] h(x)$ denotes the coefficient of $x^n$ in the
power series h(x).


With this theorem in hand, eqn [25] is easy to
solve. We write it in the form

$$T(x) \exp(-T(x)) = x \qquad [29]$$

We want to know the coefficients in the expansion
$T(x) = \sum_{n \ge 0} t_n x^n / n!$. Since, by [29], T(x) is the
compositional inverse of x exp(-x), substitution of
x exp(-x) instead of x gives

$$x = \sum_{n \ge 0} \frac{t_n}{n!} \big( x \exp(-x) \big)^n$$

This equation is in the form [26] with f(x) =
x exp(-x) and g(x) = x. Hence, by [27], we obtain

$$\frac{t_n}{n!} = \frac{1}{n} [x^{-1}]\, x^{-n} \exp(nx) = \frac{1}{n} \cdot \frac{n^{n-1}}{(n-1)!} = \frac{n^{n-1}}{n!}$$

and, thus, $t_n = n^{n-1}$.
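The result $t_n = n^{n-1}$ can be cross-checked without Lagrange inversion by solving $T = x \exp(T)$ coefficient by coefficient. The following sketch is an illustration only (names chosen here): it iterates the map $T \mapsto x \exp(T)$ on truncated power series with exact rational coefficients; each round fixes at least one more coefficient.

```python
from fractions import Fraction
from math import factorial


def series_exp(a, N):
    """exp of a power series a with a[0] == 0, truncated after x^N."""
    e = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        e[n] = sum(k * a[k] * e[n - k] for k in range(1, n + 1)) / n
    return e


def rooted_tree_counts(N):
    """Return [t_0, ..., t_N] where sum t_n x^n / n! solves T = x * exp(T)."""
    T = [Fraction(0)] * (N + 1)
    for _ in range(N):
        E = series_exp(T, N)
        T = [Fraction(0)] + E[:N]        # multiply exp(T) by x and truncate
    return [int(T[n] * factorial(n)) for n in range(N + 1)]


if __name__ == "__main__":
    print(rooted_tree_counts(6))
```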
The second method to solve functional equations
which we explain in this section is the kernel
method. We illustrate the method by an example.
Let us consider the problem of counting Dyck paths
of length 2n (see the section "Basic combinatorial
terminology"). Rather than attempting to arrive at a
solution of the problem directly, we consider the
more general problem of counting the number $a_{n,k}$
of paths consisting of steps (1, 1) and (1, -1), which
start at the origin, never drop below y = 0, have
length n, and end at height k. We then form the
bivariate generating function
$F(u, x) = \sum_{n, k \ge 0} a_{n,k}\, x^n u^k$. We then have the
functional equation

$$F(u, x) = 1 + xu\,F(u, x) + \frac{x}{u} \big( F(u, x) - F(0, x) \big) \qquad [30]$$

since a path can be empty (this explains the term 1),
it can end by a step (1, 1) (this explains the term
xuF(u, x)), or it can end by a step (1, -1). The latter
can only happen if the path before that last step did
not end at height 0. The generating function for
these paths is F(u, x) - F(0, x), and this explains the
third term in eqn [30]. In fact, we may replace
[30] by

$$F(u, x) = 1 + xu\,F(u, x) + \frac{x}{u} \big( F(u, x) - F_1(x) \big) \qquad [31]$$

because [31] implies that $F_1(x) = F(0, x)$.

The idea of the kernel method is to get rid of the
unknown series F(u, x). This is possible because F(u, x)
occurs linearly in [31], which can be rewritten as

$$F(u, x) \left( 1 - xu - \frac{x}{u} \right) = 1 - \frac{x}{u} F_1(x) \qquad [32]$$

We simply equate the coefficient of F(u, x) in this
equation to zero,

$$1 - xu - \frac{x}{u} = 0$$

solve this for u,

$$u = \frac{1 - \sqrt{1 - 4x^2}}{2x}$$

(the other solution for u makes no sense in [31]),
and substitute this back in [32], to obtain

$$F_1(x) = \frac{1 - \sqrt{1 - 4x^2}}{2x^2}$$

the familiar generating function [1] for the Catalan
numbers, with x replaced by $x^2$. Now, by substituting
this result in [31], we can even compute the full
series F(u, x).
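The outcome of the kernel computation can be checked against a direct count. Setting the kernel $1 - xu - x/u$ to zero is equivalent to $u = x(1 + u^2)$, which has a unique power series solution u(x) without constant term, and then $F_1(x) = u(x)/x$. The sketch below (an illustration with names chosen here) computes u(x) by fixed-point iteration on truncated series and compares $u(x)/x$ with the numbers $a_{n,0}$ of paths ending at height 0.

```python
def paths_ending_at_zero(N):
    """Return [a_{0,0}, ..., a_{N,0}]: paths with steps (1,1), (1,-1), never below 0."""
    heights = {0: 1}
    result = [1]
    for _ in range(N):
        nxt = {}
        for h, ways in heights.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + ways
            if h > 0:
                nxt[h - 1] = nxt.get(h - 1, 0) + ways
        heights = nxt
        result.append(heights.get(0, 0))
    return result


def kernel_root(N):
    """Coefficients of the power series u(x) with u = x * (1 + u^2), up to x^N."""
    u = [0] * (N + 1)
    for _ in range(N):
        # square of u, truncated after x^N
        sq = [sum(u[i] * u[n - i] for i in range(n + 1)) for n in range(N + 1)]
        sq[0] += 1                       # 1 + u^2
        u = [0] + sq[:N]                 # multiply by x and truncate
    return u


if __name__ == "__main__":
    N = 10
    print(kernel_root(N + 1)[1:])        # coefficients of F_1(x) = u(x) / x
    print(paths_ending_at_zero(N))
```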

While this was certainly a complicated, and
unusual, way to compute the Catalan numbers,
this approach generalizes when one considers
paths with different step sets (see section VII.5 of
the Flajolet and Sedgewick reference in the "Further
reading" section). In a more general situation, one
has a functional equation

$$P\big( F(u, x), F_1(x), \ldots, F_k(x), x, u \big) = 0 \qquad [33]$$

where F(u, x) appears linearly, as well as the
unknown series $F_1(x), \ldots, F_k(x)$, whereas x and u
appear rationally. It is clear that one can apply the
same technique, namely collecting all the terms
involving F(u, x), equating the coefficient of F(u, x)
to zero, solving for u and substituting back in [33]. If
there is more than one function $F_i(x)$, then this will
only give one equation for the $F_i(x)$. However, since
equating the coefficient of F(u, x) to zero yields a
polynomial equation in u, there can be several solutions.
(That was actually also the case in our example,
although only one solution could be used.) All these
solutions can be substituted in [33] to give many
more equations for the $F_i(x)$. The kernel method will
work if we have enough equations to determine the
unknown functions $F_i(x)$ (see the Flajolet and
Sedgewick reference, section VII.5 for further details).
In the variant of the "obstinate kernel method,"
more equations are produced in more sophisticated
ways. The method has been largely extended by
Bousquet-Mélou and co-workers to cover equations
of the form [33], where P is a polynomial such that
eqn [33] determines all involved series uniquely. This
extension covers in particular the so-called quadratic
method due to Brown, which is of great significance
in the work of Tutte on the enumeration of maps.
We refer the reader to Bousquet-Mélou and Jehanne
(2005) and the references given there for these
extensions.


Extracting Asymptotic Information 
from Generating Functions 


There is powerful machinery available to extract the 
asymptotic behavior of the coefficients of a power 
series out of analytic properties of the power series. 
We describe the corresponding methods, singularity 
analysis and the saddle point method in this section. 
The survey by Odlyzko (1995) and the Flajolet and 
Sedgewick reference in “Further reading” are excel- 
lent sources for further reading, which, in particular, 
contain several other methods which we cannot 
cover here for reasons of limited space. 

Let us suppose that we are interested in the
asymptotic behavior of the sequence $(f_n)_{n \ge 0}$ of real
(or complex) numbers as n tends to infinity. Let us
suppose that the power series $f(z) = \sum_{n \ge 0} f_n z^n$
converges in some neighborhood of the origin. (If
this series converges only at z = 0, then either one
has to try to scale, that is, for example, look at the
power series $\sum_{n \ge 0} f_n z^n / n!$ instead, or one
must apply methods other than singularity analysis
or the saddle point method. In the latter case,
depending on the nature of the coefficients $f_n$, this
may be the Euler-Maclaurin or the Poisson summation
formulas, the Mellin transform technique, or
other direct methods. The reader is referred to
Odlyzko (1995) and the Flajolet and Sedgewick
reference.) The idea is then to consider f(z) as a
complex function in z (and extend the range of f
beyond the disk of convergence about the origin),
and to study the singularities of f(z). (The point at
infinity can also be a singularity.) The key fact is that
the singularities of f(z) with smallest modulus
dictate the asymptotic behavior of the coefficients
$f_n$. These singularities of smallest modulus are called
the dominant singularities.

If there is an infinite number of dominant 
singularities, then one has to try the circle method. 
We refer the reader to Andrews (1976) and Ayoub 
(1963) for details of this method. 

If there is a finite number of dominant singularities,
then there can be again two different situations,
depending on whether these are "small" or
"large" singularities. Roughly speaking, a singularity
is small if the function f(z) grows at most
polynomially when z approaches the singularity,
otherwise it is "large." A typical example of a small
singularity is z = 1/4 in $(1 - 4z)^{-1/2}$, whereas a
typical example of a large singularity is z = ∞ in
exp(z) or z = 1 in exp(1/(1 - z)).

The method to apply for small singularities is the 
method of singularity analysis as developed by 
Flajolet and Odlyzko. (Singularity analysis implies 
Darboux's method, which occurs frequently in the 
literature, and, thus, supersedes it.) For the sake of 
simplicity, we consider first only the case of a 
unique dominant singularity. We shall address the 
issue of several dominant singularities shortly. 
Furthermore, we assume the singularity to be
z = 1, again for the sake of simplicity of presentation.
The general result can then be obtained by
rescaling z.

The basic idea is the transfer principle:

$$\text{If } f(z) = \sigma(z) + O(\tau(z)) \text{ then } f_n = \sigma_n + O(\tau_n) \qquad [34]$$

where $\sigma(z) = \sum_{n \ge 0} \sigma_n z^n$ is a linear combination of
standard functions of the form $(1 - z)^{-\alpha}$, or logarithmic
variants, and $\tau(z) = \sum_{n \ge 0} \tau_n z^n$ also lies in
the scale (see sections VI.3,4 of the Flajolet and
Sedgewick reference for the exact statement). The
expansion for f(z) in [34] is called the singular
expansion of f(z). For the above-mentioned standard
functions, we have

$$[z^n]\, (1 - z)^{-\alpha} \left( \frac{1}{z} \log \frac{1}{1 - z} \right)^{\beta} = \frac{n^{\alpha - 1}}{\Gamma(\alpha)} (\log n)^{\beta} \left( 1 + \frac{C_1}{1!} \frac{\beta}{\log n} + \frac{C_2}{2!} \frac{\beta(\beta - 1)}{(\log n)^2} + \cdots \right) \qquad [35]$$

where $[z^n] g(z)$ denotes the coefficient of $z^n$ in g(z),
and where

$$C_k = \Gamma(\alpha) \left. \frac{d^k}{ds^k} \frac{1}{\Gamma(s)} \right|_{s = \alpha}$$

If α is a nonpositive integer, then this expansion has
to be taken with care (cf. section VI.2 of the Flajolet
and Sedgewick reference).


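To see how this works, consider the example
$f_n = \sum_{m=0}^{n} \binom{2m}{m}$. We have

$$\sum_{n \ge 0} f_n z^n = \frac{1}{(1 - z) \sqrt{1 - 4z}}$$

The function on the right-hand side is analytic in
$\mathbb{C}$ (where $\mathbb{C}$ denotes the complex numbers) cut along
the ray from z = 1/4 to infinity, except for a pole at
z = 1; its singularities are thus z = 1 and z = 1/4. The
dominant singularity is z = 1/4. We determine the
singular expansion of f(z) about z = 1/4,

$$f(z) = \frac{4}{3} (1 - 4z)^{-1/2} - \frac{4}{9} (1 - 4z)^{1/2} + \frac{4}{27} (1 - 4z)^{3/2} + O\big( (1 - 4z)^{5/2} \big)$$

(We stopped the expansion after three terms. The
farther we go, the more terms we can compute
of the asymptotic expansion for $f_n$.) Hence, we
obtain

$$f_n = 4^n \left( \frac{4}{3} \frac{n^{-1/2}}{\Gamma(1/2)} \left( 1 - \frac{1}{8n} + \frac{1}{128 n^2} \right) - \frac{4}{9} \frac{n^{-3/2}}{\Gamma(-1/2)} \left( 1 + \frac{3}{8n} \right) + \frac{4}{27} \frac{n^{-5/2}}{\Gamma(-3/2)} + O\big( n^{-7/2} \big) \right)$$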
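A quick numerical check of this example is possible. The sketch below (names are illustrative) evaluates $f_n = \sum_{m \le n} \binom{2m}{m}$ exactly and compares it with the three-term approximation obtained by transferring the terms $\frac{4}{3}(1-4z)^{-1/2}$, $-\frac{4}{9}(1-4z)^{1/2}$ and $\frac{4}{27}(1-4z)^{3/2}$ of the singular expansion; the correction factors $1 - \frac{1}{8n} + \frac{1}{128n^2}$ and $1 + \frac{3}{8n}$ are the standard ones for $[z^n](1-z)^{-\alpha}$ with $\alpha = 1/2$ and $\alpha = -1/2$.

```python
from math import comb, gamma


def f_exact(n):
    """f_n = sum_{m=0..n} binom(2m, m), computed exactly."""
    return sum(comb(2 * m, m) for m in range(n + 1))


def f_asymptotic(n):
    """Three-term approximation from the singular expansion at z = 1/4."""
    t1 = (4 / 3) * n ** -0.5 / gamma(0.5) * (1 - 1 / (8 * n) + 1 / (128 * n ** 2))
    t2 = -(4 / 9) * n ** -1.5 / gamma(-0.5) * (1 + 3 / (8 * n))
    t3 = (4 / 27) * n ** -2.5 / gamma(-1.5)
    return 4.0 ** n * (t1 + t2 + t3)


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, f_exact(n) / f_asymptotic(n))
```

The ratio should approach 1 with a relative error of order $n^{-3}$, in line with the $O(n^{-7/2})$ remainder.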
If there are several small dominant singularities
(but only a finite number of them), then one simply
applies the above procedure for all of them and, to
obtain the desired asymptotic expansion, one adds
up the corresponding contributions.

The method to apply for large singularities is the
saddle point method. For the following considerations,
we assume that f(z) is analytic in |z| < R ≤ ∞.
At the heart of the saddle point method lies
Cauchy's formula

$$f_n = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z^{n+1}}\, dz \qquad [36]$$

for writing the nth coefficient in the power series
expansion of f(z). Here, C is some simple closed
contour around the origin that stays in the range
|z| < R. The idea is to exploit the fact that we are
free to deform the contour. The aim is to choose a
contour such that the main contribution to the
integral in [36] comes from a very tiny part of the
contour, whereas the contribution of the rest is
negligible. This will be possible if we put the
contour through a saddle point of the integrand
$f(z)/z^{n+1}$. Under suitable conditions, the main
contribution will then come from the small passage
of the path through the saddle point, and the
contribution of the rest will be negligible.

In practice, the saddle point method is not always
straightforward to apply, but has to be adapted to the
specific properties of the function f(z) that we are
encountering. We refer the reader to the corresponding
chapters in the Flajolet and Sedgewick reference
and Odlyzko (1995) for more details. There is one
important exception though, namely the Hayman
admissible functions. We will not reproduce the
definition of Hayman admissibility because it is
cumbersome (cf. section VIII.5 in the Flajolet and
Sedgewick reference and definition 12.4 of Odlyzko
(1995)). However, in many applications, it is not
even necessary to go back to it because of the closure
properties of Hayman admissible functions. Namely,
it is known (cf. Odlyzko (1995), theorem 12.8) that
exp(p(z)) is Hayman admissible in |z| < ∞ for any
polynomial p(z) with real coefficients as long as the
coefficients $a_n$ of the Taylor series of exp(p(z)) are
positive for all sufficiently large n (thus, e.g., exp(z)
is Hayman admissible), and it is known that, if f(z)
and g(z) are Hayman admissible in |z| < R ≤ ∞, then
exp(f(z)) and f(z)g(z) are also (thus, e.g.,
exp(exp(z) - 1) is Hayman admissible).

The central result of Hayman's theory is the
following: if $f(z) = \sum_{n \ge 0} f_n z^n$ is Hayman admissible
in |z| < R, then

$$f_n \sim \frac{f(r_n)}{r_n^{\,n} \sqrt{2\pi\, b(r_n)}} \quad \text{as } n \to \infty \qquad [37]$$

where $r_n$ is the unique solution for large n of the
equation a(r) = n in $(R_0, R)$, with a(r) = r f'(r)/f(r),
b(r) = r a'(r), and a suitably chosen constant $R_0 > 0$.
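The simplest illustration of Hayman's formula is f(z) = exp(z), for which $f_n = 1/n!$. Here a(r) = r and b(r) = r, so $r_n = n$ and the estimate becomes $e^n / (n^n \sqrt{2\pi n})$, which is Stirling's formula. A minimal numerical sketch (the function name is chosen here for illustration):

```python
from math import exp, factorial, pi, sqrt


def hayman_estimate_exp(n):
    """Hayman's estimate for [z^n] exp(z).

    For f(z) = exp(z): a(r) = r f'(r) / f(r) = r and b(r) = r a'(r) = r,
    so the saddle point equation a(r) = n gives r_n = n.
    """
    r = n
    return exp(r) / (r ** n * sqrt(2 * pi * r))


if __name__ == "__main__":
    for n in (5, 20, 50):
        exact = 1 / factorial(n)
        print(n, exact / hayman_estimate_exp(n))
```

The printed ratios tend to 1 as n grows; the deviation is of order 1/n, since only the first term of the asymptotic expansion is used.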


This result covers only the first term in the 
asymptotic expansion. There is an even more 
sophisticated theory due to Harris and Schoenfeld, 
which allows one to also find a complete asymptotic 
expansion. We refer the reader to section VIII.5 of 
the Flajolet and Sedgewick reference and Odlyzko 
(1995) for more details. 

Methods for the asymptotic analysis of multivariable
generating functions are also available
(see the corresponding chapters in Flajolet and
Sedgewick, Odlyzko (1995) and the recent important
development surveyed in the Pemantle and
Wilson reference listed in "Further reading"). We
add that both the method of singularity analysis and
Hayman's theory of admissible functions have been
made largely automatic, and that this has been
implemented in the Maple program gdev (see
"Further reading").


The Theory of Heaps 


The theory of heaps, developed by Viennot, is a 
geometric rendering of the theory of the partial 
commutation monoid of Cartier and Foata, which 
is now most often called the Cartier-Foata monoid. 
Its importance stems from the fact that several 
objects which appear in statistical physics, such as 
Motzkin paths, animals, respectively polyominoes, 
or Lorentzian triangulations (see the Viennot and 
James reference in "Further reading" and the 
references therein), are in bijection with heaps. 

Informally, a heap is what we would imagine. We 
take a collection of “pieces,” say B1, B2,..., and put 
them one upon the other, sometimes also sideways, 
to form a “heap,” see Figure 6. 

There, we imagine that pieces can only move 
vertically, so that the heap in Figure 6 would indeed 
form a stable arrangement. Note that we allow 
several copies of a piece to appear in a heap. (This 
means that they differ only by a vertical translation.) 
For example, in Figure 6 there appear two copies of 
B2. Under these assumptions, there are pieces which 
can move past each other, and others which cannot. 
For example, in Figure 6, we can move the piece Bg 
higher up, thus moving it higher than B, if we wish. 
However, we cannot move B; higher than Beg, 


Figure 6 A heap of pieces. 


because Bg blocks the way. On the other hand, we 
can move B; past B, (thus taking Bg with us). Thus, 
a rigorous way to introduce heaps is by beginning 
with a set B of pieces (in our example, B =
{B1, B2, ..., B7}), and we declare which pieces can
be moved past another and which cannot. We
indicate this by a symmetric relation R: we write
aRb to indicate that a cannot move past b (and vice
versa). When we consider a word a1 a2 ... an of
pieces, ai ∈ B, we think of it as putting first a1, then
putting a2 on top of it (and, possibly, moving it past
a1), then putting a3 on top of what we already have,
etc. We declare two words to be equivalent if one 
arises from the other by commuting adjacent letters 
which are not in relation. A heap is then an 
equivalence class of words under this equivalence 
relation. What we have described just now is indeed 
the original definition of Cartier and Foata. 

The class of heaps which occurs most frequently 
in applications is the class of heaps of monomers 
and dimers, which we now introduce. Let B = M ∪ D,
where $M = \{m_0, m_1, \dots\}$ is the set of monomers and
$D = \{d_1, d_2, \dots\}$ is the set of dimers. We think of a
monomer $m_i$ as a point, symbolized by a circle,
with x-coordinate i, see Figure 7. We think
of a dimer $d_i$ as two points, symbolized by circles,
with x-coordinates i − 1 and i which are connected
by an edge, see Figure 7. We impose the relations
$m_i R m_i$, $m_i R d_i$, $m_i R d_{i+1}$, $i = 0, 1, \dots$, and
$d_i R d_j$, $i - 1 \le j \le i$, and extend R to a symmetric
relation. Figure 8 shows two heaps of monomers and dimers.

For example, Motzkin paths are in bijection with 
heaps of monomers and dimers. To see this, given a 
Motzkin path, we read the steps of the path from 


Figure 7 Monomers and dimers. 


Figure 8 Two heaps of monomers and dimers.

Figure 9 Bijection between Motzkin paths and heaps of
monomers and dimers.


the beginning to the end. Whenever we read a
level-step at height h, we make it into a monomer with
x-coordinate h, whenever we read a down-step from
height h to height h − 1, we make it into a dimer
whose endpoints have x-coordinates h − 1 and h.
Up-steps are ignored. Figure 9 shows an example. In 
the figure, the heap is not in “standard” fashion, in 
the sense that the x-axis is not shown as a horizontal 
line but as a vertical line (cf. Figure 7). But it could 
be easily transformed into “standard” fashion by a 
simple reflection with respect to a line of slope 1. 
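
The reading procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the encoding of a path as a string over 'U', 'L', 'D' and the piece labels ("m", h) and ("d", h) are hypothetical choices made here, not notation from the text. The function returns the word of pieces in the order in which they are put on the heap.

```python
def motzkin_to_heap_word(steps):
    """Map a Motzkin path (string over 'U', 'L', 'D') to a word of pieces.

    ("m", h) stands for the monomer with x-coordinate h,
    ("d", h) for the dimer with endpoints h - 1 and h.
    """
    height = 0
    word = []
    for step in steps:
        if step == "U":
            height += 1                      # up-steps are ignored
        elif step == "L":
            word.append(("m", height))       # level-step at height h
        elif step == "D":
            if height == 0:
                raise ValueError("path passes below the x-axis")
            word.append(("d", height))       # down-step from h to h - 1
            height -= 1
        else:
            raise ValueError("unknown step: " + repr(step))
    if height != 0:
        raise ValueError("path must end on the x-axis")
    return word


print(motzkin_to_heap_word("ULUDDL"))
```

For the path ULUDDL the word consists of the monomer at 1, the dimers at 2 and 1, and finally the monomer at 0.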

Lattice animals on the triangular lattice and on the 
quadratic lattice are also in bijection with heaps, this 
time with heaps consisting entirely of dimers.
Given an animal, one simply replaces each vertex of 
the animal by a dimer, see Figures 10 and 11. While 
in the case of animals on the triangular lattice this 
gives a constraintless bijection (see Figure 10), in the 
case of the quadratic lattice this sets up a bijection 
with heaps of dimers in which two dimers of the 
same type can never be placed directly one over the 
other (see Figure 11). For example, two dimers ds, 
one placed directly over the other (as they occur in 
Figure 10), are forbidden under this rule. 

Next we make heaps into a monoid by introducing
a composition of heaps. (A monoid is a set with
an associative binary operation and a unit
element.) Intuitively, given two heaps H1 and H2,
the composition of H1 and H2, the heap H1 ∘ H2,
is the heap which results


Figure 10 Bijection between animals and heaps of dimers.

Figure 11 Bijection between animals and heaps of dimers.

Figure 12 The composition of the heaps in Figure 8.


by putting H2 on top of H1. In terms of words, the
composition of two heaps is the equivalence class of
the concatenation uw, where u is a word from the
equivalence class of H1, and w is a word from the
equivalence class of H2.

The composition of the two heaps in Figure 8 is 
shown in Figure 12. 

Given pieces B with relation R, let H(B, R) be the 
set of all heaps consisting of pieces from B, 
including the empty heap, the latter denoted by ∅.
It is easy to see that the composition makes
(H(B,R), ∘) into a monoid with unit ∅.

For the statement of the main theorem in the 
theory of heaps, we need two more terms. A trivial 
heap is a heap consisting of pieces all of which are 
pairwise unrelated. Figure 13a shows a trivial heap 
consisting of monomers and dimers. A pyramid is a
heap with exactly one maximal (= topmost) element.
Figure 13b shows a pyramid consisting of
monomers and dimers. Finally, if H is a heap, then
we write |H| for the number of pieces in H. 

In applications, heaps will have weights, which are 
defined by introducing a weight w(B) for each piece B 
in B, and by extending the weight w to all heaps H by 
letting w(H) denote the product of all weights of the 
pieces in H (multiplicities of pieces included). 

Let M be a subset of the pieces B. Then, the
generating function for all heaps with maximal
pieces contained in M is given by

\[
\sum_{\substack{H \in H(B,R) \\ \text{maximal pieces} \subseteq M}} w(H)
= \frac{\sum_{T \in T(B \setminus M, R)} (-1)^{|T|} w(T)}
       {\sum_{T \in T(B,R)} (-1)^{|T|} w(T)} \qquad [38]
\]

where T(B,R) denotes the set of all trivial heaps
with pieces from B. In particular, the generating
function for all heaps is given by

\[
\sum_{H \in H(B,R)} w(H)
= \frac{1}{\sum_{T \in T(B,R)} (-1)^{|T|} w(T)} \qquad [39]
\]


Figure 13 (a) A trivial heap. (b) A pyramid.


Furthermore, if P(B,R) denotes the set of all
pyramids with pieces from B, then

\[
\sum_{P \in P(B,R)} \frac{w(P)}{|P|}
= \log \Bigl( \sum_{H \in H(B,R)} w(H) \Bigr) \qquad [40]
\]

where |P| is the number of pieces of P. (As the
reader will have guessed, this is a consequence of the
“exponential principle” mentioned in the section
“generating functions.”)
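
As a small sanity check of [39], the following Python sketch counts heaps of three dimers by brute force and compares the counts with the coefficients of the reciprocal of the alternating generating function of trivial heaps. Everything in it is an illustrative assumption: the pieces are the dimers d1, d2, d3 (represented by the integers 1, 2, 3), every piece has weight x, and a heap is encoded by the set of (piece, level) pairs obtained when each piece is dropped onto the pieces it cannot pass.

```python
from itertools import combinations, product


def related(a, b):
    # Dimers d_a and d_b cannot move past each other when their columns overlap.
    return abs(a - b) <= 1


def heap_of_word(word):
    """Canonical form of the heap of a word: the set of (piece, level) pairs."""
    placed = []
    for piece in word:
        level = 1 + max((lv for q, lv in placed if related(piece, q)), default=0)
        placed.append((piece, level))
    return frozenset(placed)


def count_heaps(pieces, size):
    """Number of heaps with `size` pieces (equivalence classes of words)."""
    return len({heap_of_word(w) for w in product(pieces, repeat=size)})


def trivial_heap_polynomial(pieces):
    """Coefficient k is (-1)^k times the number of trivial heaps with k pieces."""
    coeffs = [0] * (len(pieces) + 1)
    for k in range(len(pieces) + 1):
        for subset in combinations(pieces, k):
            if all(not related(a, b) for a, b in combinations(subset, 2)):
                coeffs[k] += (-1) ** k
    return coeffs


def reciprocal_series(poly, terms):
    """First `terms` coefficients of 1/poly(x), assuming poly[0] == 1."""
    out = []
    for n in range(terms):
        value = 1 if n == 0 else 0
        for k in range(1, min(n, len(poly) - 1) + 1):
            value -= poly[k] * out[n - k]
        out.append(value)
    return out


pieces = [1, 2, 3]
poly = trivial_heap_polynomial(pieces)          # 1 - 3x + x^2
print([count_heaps(pieces, k) for k in range(5)])
print(reciprocal_series(poly, 5))
```

Both printed lists should agree; for these three dimers the alternating polynomial is 1 − 3x + x².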


The Transfer Matrix Method 


The transfer matrix method (cf. Stanley (1986), 
chapter 4 for further reading) applies whenever we 
are able to build the combinatorial objects that we 
are interested in by moving on a finite number of 
states in a step-by-step fashion, where the current 
step does not depend on the previous ones. (In 
statistical language, we are considering a finite-state 
Markov chain.) For example, Motzkin paths which 
are constrained to stay between two parallel lines, 
say between y=0 and y=K, can be described in 
such a way: the states are the heights 0,1,...,K, 
and, if we are in state h, then in the next step we are
allowed to move to states h + 1, h, or h − 1, except
that from state 0 we cannot move to −1 (there is no
state —1), and when we are in state K we cannot 
move to K + 1 (there is no state K + 1). 

For describing the general situation, let G = (V, E)
be a directed graph with vertex set V and edge set E. Let
$w_n(u,v)$ denote the number of walks with n steps from
vertex u to vertex v along edges of G. To compute these
numbers, we consider the adjacency matrix of G, A(G). By
definition, using our notation, $A(G) = (w_1(u,v))_{u,v \in V}$.
Obviously, $(w_n(u,v))_{u,v \in V} = (A(G))^n$. Thus,

\[
\Bigl( \sum_{n=0}^{\infty} w_n(u,v) x^n \Bigr)_{u,v \in V}
= \sum_{n=0}^{\infty} A(G)^n x^n
= (I - A(G)x)^{-1}
\]

where $I$ is the $|V| \times |V|$ identity matrix. In other words,
the generating functions $\sum_{n=0}^{\infty} w_n(u,v) x^n$ for the
walk numbers between u and v form the entries of a
matrix which is the inverse matrix of $I - A(G)x$. By
elementary linear algebra,

\[
\sum_{n=0}^{\infty} w_n(u,v) x^n
= \frac{(-1)^{\#u + \#v} \det (I - A(G)x)_{v,u}}{\det (I - A(G)x)} \qquad [41]
\]

where $\det (I - A(G)x)_{v,u}$ is the minor of $I - A(G)x$
with the row indexed by v and the column indexed
by u omitted, and where #u denotes the row
number of u and similarly for #v. A weighted
version could also be developed in the same way,
where we put a weight w(e) on each edge, and the
weight of a walk is the product of the weights of all
its edges.

In particular, the expression [41] is a rational
function in x. Then, by the basic theorem on
rational generating functions (cf. Stanley (1986),
section 4.1), the number $w_n(u,v)$ can be expressed as
a sum $\sum_i P_i(n) \gamma_i^n$, where the $\gamma_i$'s are the different
roots of the polynomial $\det (xI - A(G))$, and $P_i(n)$
is a polynomial in n of degree less than the multiplicity
of the root $\gamma_i$. (The $P_i(n)$'s depend on u and v,
whereas the $\gamma_i$'s do not.) If there exists a unique root
$\gamma_1$ with maximal modulus, then this implies that,
asymptotically as $n \to \infty$, $w_n(u,v) \sim P_1(n) \gamma_1^n$.
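
For the bounded Motzkin paths used as the example at the start of this section, the transfer matrix method can be sketched as follows. The sketch is only illustrative: the function names are hypothetical, and plain integer lists are used instead of a matrix library. The power A(G)^n is computed by repeated multiplication and compared with a direct enumeration of all step sequences.

```python
from itertools import product


def transfer_matrix(K):
    """Adjacency matrix on the states 0..K with moves h -> h - 1, h, h + 1."""
    return [[1 if abs(h - g) <= 1 else 0 for g in range(K + 1)]
            for h in range(K + 1)]


def mat_mul(A, B):
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def walk_counts(A, n):
    """Return A^n; its (u, v) entry is the number of n-step walks from u to v."""
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


def brute_force(K, n, start, end):
    """Count the same walks by enumerating all sequences of steps."""
    count = 0
    for steps in product((-1, 0, 1), repeat=n):
        h = start
        for s in steps:
            h += s
            if h < 0 or h > K:
                break
        else:
            count += (h == end)
    return count


print(walk_counts(transfer_matrix(2), 6)[0][0], brute_force(2, 6, 0, 0))
```

With K = 2 and n = 6 both counts equal 50: of the 51 Motzkin paths of length 6, only the one that climbs to height 3 is excluded.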


Lattice Paths 


Recall from the section on basic combinatorial
terminology that a lattice path P in $\mathbb{Z}^d$ is a path in
the d-dimensional integer lattice $\mathbb{Z}^d$ which uses only
points of the lattice, that is, it is a sequence
$(P_0, P_1, \dots, P_l)$, where $P_i \in \mathbb{Z}^d$ for all i. The vectors
$\overrightarrow{P_0P_1}, \overrightarrow{P_1P_2}, \dots, \overrightarrow{P_{l-1}P_l}$
are called the steps of P. The
number of steps, $l$, is called the length of P.

The enumeration of lattice paths has always 
been an intensively studied topic in statistics, 
because of their importance in the study of 
random walks, of rank order statistics for non- 
parametric testing, and of queueing processes. The 
reader is referred to Feller (1957) and particularly 
Mohanty's (1979) book, which is a rich source for 
enumerative results on lattice paths, albeit in a 
statistical language. We review the most important 
results in this section. Most of these concern
two-dimensional lattice paths, that is, the case d = 2.

To begin with, we consider paths in the integer
plane $\mathbb{Z}^2$ consisting of horizontal and vertical unit
steps in the positive direction. Clearly, the number
of all (unrestricted) paths from the origin to (n, m) is
the binomial coefficient $\binom{n+m}{n}$. By the reflection
principle, which is commonly attributed to D. André
(see, e.g., Comtet (1974), p. 22), it follows that the
number of paths from the origin to (n, m) which do
not pass above the line y = x + t, where $m \le n + t$, is
given by

\[
\binom{n+m}{n} - \binom{n+m}{n+t+1} \qquad [42]
\]

Roughly, the reflection principle sets up a bijection
between the paths from the origin to (n, m)
which do pass above the line y = x + t and all paths
from (−t − 1, t + 1) to (n, m), by reflecting the path
portion between the origin and the last touching
point on y = x + t + 1 in this latter line. Thus, the
result of the enumeration problem is the number of
all paths from (0, 0) to (n, m), which is given by the
binomial coefficient $\binom{n+m}{n}$, minus the number of all
paths from (−t − 1, t + 1) to (n, m), which is given
by the binomial coefficient $\binom{n+m}{n+t+1}$, whence the
formula [42].
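
The count obtained from the reflection principle, namely the binomial coefficient for all paths minus the one for the reflected paths, is easy to test numerically. The following Python sketch (function names are hypothetical; t is assumed to be a nonnegative integer) counts the paths by dynamic programming and compares the result with the difference of binomial coefficients.

```python
from math import comb


def count_paths_below(n, m, t):
    """Paths with unit east/north steps from (0, 0) to (n, m), y <= x + t throughout."""
    ways = {(0, 0): 1}
    for x in range(n + 1):
        for y in range(m + 1):
            if (x, y) == (0, 0) or y > x + t:
                continue  # forbidden points stay absent and count as 0
            ways[(x, y)] = ways.get((x - 1, y), 0) + ways.get((x, y - 1), 0)
    return ways.get((n, m), 0)


def reflection_formula(n, m, t):
    # math.comb returns 0 when the lower index exceeds the upper one
    return comb(n + m, n) - comb(n + m, n + t + 1)


print(count_paths_below(3, 3, 0), reflection_formula(3, 3, 0))
```

For n = m = 3 and t = 0 both values are the Catalan number 5.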

If one considers more generally paths bounded by
the line my = nx + t, no compact formula is known.
It seems that the most conceptual way to approach 
this problem is through the so-called kernel method 
(see the section on solving equations for generating 
functions), which, in combination with the saddle 
point method, allows one also to obtain strong 
asymptotic results. There is one special instance, 
however, which has a “nice” formula. The number
of all lattice paths from the origin to (n, m) which
never pass above x = μy, where μ is a positive
integer, is given by


\[
\frac{n - \mu m + 1}{n + m + 1} \binom{n+m+1}{m} \qquad [43]
\]

The most elegant way to prove this formula is by 
means of the cycle lemma of Dvoretzky and 
Motzkin (see Mohanty (1979), p. 9 where the cycle 
lemma occurs under the name of “penetrating 
analysis"). 

Iteration of the reflection principle shows that the
number of paths from the origin to (n, m) which stay
between the lines y = x + t and y = x + s (being
allowed to touch them), where $t \ge 0 \ge s$ and
$n + t \ge m \ge n + s$, is given by the finite (!) sum (see, e.g.,
Mohanty (1979), p. 6)

\[
\sum_{k \in \mathbb{Z}} \left( \binom{n+m}{n - k(t-s+2)}
- \binom{n+m}{n - k(t-s+2) + t + 1} \right) \qquad [44]
\]
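
The alternating sum produced by iterating the reflection principle can also be checked by machine. The sketch below is an illustration only: the function names are hypothetical, and the summand is taken to be the difference of the binomial coefficients of n + m over n − k(t − s + 2) and over n − k(t − s + 2) + t + 1, with binomial coefficients read as 0 outside the usual range. It is compared with a dynamic-programming count of the paths in the strip.

```python
from math import comb


def binom(n, k):
    """Binomial coefficient, read as 0 outside the range 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def count_paths_in_strip(n, m, t, s):
    """Unit-step paths from (0, 0) to (n, m) with x + s <= y <= x + t throughout."""
    ways = {(0, 0): 1}
    for x in range(n + 1):
        for y in range(m + 1):
            if (x, y) == (0, 0) or not (x + s <= y <= x + t):
                continue
            ways[(x, y)] = ways.get((x - 1, y), 0) + ways.get((x, y - 1), 0)
    return ways.get((n, m), 0)


def strip_formula(n, m, t, s):
    period = t - s + 2
    total = 0
    # only finitely many k give nonzero terms; |k| <= n + m is a safe range
    for k in range(-(n + m), n + m + 1):
        total += binom(n + m, n - k * period)
        total -= binom(n + m, n - k * period + t + 1)
    return total


print(count_paths_in_strip(2, 2, 1, -1), strip_formula(2, 2, 1, -1))
```

For n = m = 2, t = 1, s = −1 both values are 4: the paths EENN and NNEE leave the strip, the other four do not.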


The enumeration of lattice paths restricted to 
regions bounded by hyperplanes has also been 
considered for other regions, such as quadrants, 
octants, and rectangles, as well as in higher dimen- 
sions. A general result due to Gessel and Zeilberger, 
and Biane, independently, on the number of lattice 
paths in a chamber (alcove) of an (affine) reflection 
group (see Krattenthaler (2003) for the correspond- 
ing references and pointers to further results) shows 
how far one can go when one uses the reflection 
principle. In particular, this result covers [42] and 
[44], the enumeration of lattice paths in quadrants, 
octants, rectangles, and many other results that have 
appeared (before and after) in the literature. We
present a particularly elegant (and frequently occurring)
special case. (In reflection group language, it
corresponds to the reflection group of “type $A_{d-1}$.”
See Humphreys (1990) for terminology and information
on reflection groups.)

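Let $A = (a_1, a_2, \dots, a_d)$ and $E = (e_1, e_2, \dots, e_d)$ be
points in $\mathbb{Z}^d$ with $a_1 \ge a_2 \ge \dots \ge a_d$ and
$e_1 \ge e_2 \ge \dots \ge e_d$. The number of all paths from A to E in
the integer lattice $\mathbb{Z}^d$, which consist of positive unit
steps and which stay in the region $x_1 \ge x_2 \ge \dots \ge x_d$,
equals

\[
\Bigl( \sum_{i=1}^{d} (e_i - a_i) \Bigr)! \;
\det_{1 \le i,j \le d} \left( \frac{1}{(e_i - a_j - i + j)!} \right) \qquad [45]
\]
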

The counting problem of the theorem is equivalent
to numerous other counting problems. It was
originally formulated as an n-candidate ballot
problem, but it is equally equivalent to counting the
number of standard Young tableaux of a given
shape. In the case that all $a_i$'s are equal, the
determinant does in fact evaluate into a closed-form
product. In Young tableaux theory, a particular
way to write the result is known as the
hook-length formula (see, e.g., Stanley (1999),
corollary 7.21.6).
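
The determinantal count of paths in the region x1 ≥ x2 ≥ ... ≥ xd can be tested for small cases. The Python sketch below is an illustration under stated assumptions: it takes the formula to be the factorial of the total number of steps times the determinant of the matrix with entries 1/(e_i − a_j − i + j)!, reads 1/k! as 0 for negative k, and uses a Leibniz expansion for the determinant, which is only practical for small d. All function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import permutations
from math import factorial


def inv_factorial(k):
    return Fraction(1, factorial(k)) if k >= 0 else Fraction(0)


def determinant(matrix):
    """Leibniz expansion; adequate for the small matrices used here."""
    size = len(matrix)
    total = Fraction(0)
    for perm in permutations(range(size)):
        inversions = sum(perm[i] > perm[j]
                         for i in range(size) for j in range(i + 1, size))
        term = Fraction((-1) ** inversions)
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total


def chamber_formula(A, E):
    d = len(A)
    steps = sum(e - a for a, e in zip(A, E))
    M = [[inv_factorial(E[i] - A[j] - i + j) for j in range(d)] for i in range(d)]
    return factorial(steps) * determinant(M)


def count_chamber_paths(A, E):
    """Count the paths directly, one positive unit step at a time."""
    E = tuple(E)

    @lru_cache(maxsize=None)
    def ways(point):
        if point == E:
            return 1
        total = 0
        for i in range(len(point)):
            nxt = point[:i] + (point[i] + 1,) + point[i + 1:]
            inside = all(nxt[k] >= nxt[k + 1] for k in range(len(nxt) - 1))
            if nxt[i] <= E[i] and inside:
                total += ways(nxt)
        return total

    return ways(tuple(A))


print(chamber_formula((0, 0, 0), (2, 2, 2)), count_chamber_paths((0, 0, 0), (2, 2, 2)))
```

For A = (0, 0, 0) and E = (2, 2, 2) both values are 5, the number of standard Young tableaux of shape (2, 2, 2).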

We return to lattice paths in the plane, mentioning
some more closely related results. The first is a
result of Mohanty (1979, section 4.2), which
expresses the number of all lattice paths from the
origin to (n, m) which touch the line y = x + t
exactly r times, never crossing it, as the difference

\[
\binom{n+m-r}{n+t-1} - \binom{n+m-r}{n+t} \qquad [46]
\]

Not forbidding that the paths cross the bounding
line, we arrive at the problem of counting the lattice
paths from the origin to (n, m), which cross the main
diagonal y = x exactly r times, the answer being

\[
\begin{cases}
\dfrac{m-n+2r+1}{m+n+1} \dbinom{m+n+1}{n-r} & \text{if } m > n \\[3mm]
\dfrac{2r+2}{n} \dbinom{2n}{n-r-1} & \text{if } m = n
\end{cases} \qquad [47]
\]

Next, we give the number of lattice paths from the 
origin to (n, n) which have 2r steps on one side of 


the line y =x, as 
(7) ie 一 48] 
r n—r 


a result due to Sparre Andersen. We refer the reader 
to Mohanty (1979, chapter 3) for further results in 
this direction. 
Enumerating lattice paths with a fixed number
of maximal straight pieces (which correspond to
runs), is intimately connected to another basic
enumeration problem concerning lattice paths: the
enumeration of lattice paths having a fixed number
of turns. An effective way to attack the latter problem
is by means of two-rowed arrays (see the survey
article by Krattenthaler (1997)), where in particular
analogs of the reflection principle for two-rowed
arrays are developed. These imply formulas for the
number of lattice paths with fixed starting points and 
endpoints and a fixed number of north-east (respec- 
tively east-north) turns, for unrestricted paths, as 
well as for paths bounded by lines. (A north—east turn 
in a lattice path is a point where the direction changes 
from “north” to “east.” An east-north turn is defined 
analogously.) In particular, analogs of [42]-[44] are 
known when the number of north-east (respectively 
east-north) turns is fixed. 

These formulas imply for example (see again 
Krattenthaler (1997, section 3.5)) that the number 
of lattice paths from the origin to (n,n) which 
never pass above the line y = x + t and have
exactly 2r maximal straight pieces is given by 


LU Ed ep po 
aee deca ds. M 


with a similar result for the case of 2r + 1 maximal
straight pieces. (If t = 0, the numbers in [49] become

\[
\frac{1}{n} \binom{n}{r} \binom{n}{r-1}
\]

and they are known as the Narayana numbers.)
Furthermore, they imply that the number of lattice
paths from the origin to (n, n) which never pass
above the line y = x + t and never below the line
y = x − t and have exactly 2r maximal straight
pieces is given by
Y tj erat J Par ` 

r+k-1 r—k—1 


k——oo 
P n—2kt -t — 1X /n--2kt —t—1 
r+k—2 r—k 
n—2kt--t — 1X (a+2kt-—t-—1 
-( r--E—1 X r—h—1 )} po 
with a similar result for the case of 2r+ 1 maximal 
straight pieces. 
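
The special case t = 0 mentioned above, in which the counts are the Narayana numbers, can be confirmed by direct enumeration. The following Python sketch is illustrative only: paths are encoded as strings over 'E' and 'N' (an assumption made here), and the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def paths_not_above_diagonal(n):
    """East/north step strings from (0, 0) to (n, n) never passing above y = x."""
    for north_positions in combinations(range(2 * n), n):
        steps = ["E"] * (2 * n)
        for pos in north_positions:
            steps[pos] = "N"
        x = y = 0
        for s in steps:
            if s == "E":
                x += 1
            else:
                y += 1
            if y > x:
                break
        else:
            yield "".join(steps)


def straight_pieces(path):
    """Number of maximal runs of equal steps."""
    return 1 + sum(a != b for a, b in zip(path, path[1:]))


def narayana(n, r):
    return comb(n, r) * comb(n, r - 1) // n


n = 4
for r in range(1, n + 1):
    found = sum(1 for p in paths_not_above_diagonal(n) if straight_pieces(p) == 2 * r)
    print(r, found, narayana(n, r))
```

Each printed line shows r followed by the enumerated count and the Narayana number, which should coincide.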


The most general boundary for lattice paths that
one can imagine is the restriction that it stays
between two given (fixed) paths. Let us assume that
the horizontal steps of the upper (fixed) path are at
heights $a_1 \le a_2 \le \dots \le a_n$, whereas the horizontal
steps of the lower (fixed) path are at heights
$b_1 \le b_2 \le \dots \le b_n$, $a_i \ge b_i$, $i = 1, 2, \dots, n$. Then the
number of all paths from $(0, b_1)$ to $(n, a_n)$ satisfying the
property that for all $i = 1, 2, \dots, n$ the height of the
ith horizontal step is between $b_i$ and $a_i$ is given by
the determinant

\[
\det_{1 \le i,j \le n} \left( \binom{a_i - b_j + 1}{j - i + 1} \right) \qquad [51]
\]


In the statistical literature, this formula is often 
known as “Steck’s formula," but it is actually a 
special case of a much more general theorem due 
to Kreweras. A generalization of [51] to higher- 
dimensional paths was given by Handa and 
Mohanty (see Mohanty (1979, section 2.4)). 

Next, we consider three-step lattice paths in the
integer plane $\mathbb{Z}^2$, that is, paths consisting of up-steps
(1, 1), level-steps (1, 0), and down-steps (1, −1). The
particular problem that we are interested in is to
count such three-step paths starting at (0, r) and
ending at $(\ell, s)$, which do not pass below the x-axis
and do not pass above the horizontal line y = K.
Furthermore, we assign the weight 1 to an up-step,
the weight $b_h$ to a level-step at height h, and the
weight $\lambda_h$ to a down-step from height h to h − 1.
The weight w(P) of a path P is defined as the
product of the weights of all its steps. Then we have
the following result, which can be obtained by the
transfer matrix method described in the last section.

Define the sequence $(p_n(x))_{n \ge 0}$ of polynomials by

\[
x p_n(x) = p_{n+1}(x) + b_n p_n(x) + \lambda_n p_{n-1}(x),
\quad \text{for } n \ge 1 \qquad [52]
\]

with initial conditions $p_0(x) = 1$ and $p_1(x) = x - b_0$.
Furthermore, define $(S p_n(x))_{n \ge 0}$ to be the sequence of
polynomials which arises from the sequence $(p_n(x))$
by replacing $\lambda_i$ by $\lambda_{i+1}$ and $b_i$ by $b_{i+1}$, $i = 0, 1, 2, \dots$,
everywhere in the three-term recurrence [52] and in
the initial conditions. Finally, given a polynomial p(x)
of degree n, we denote the corresponding reciprocal
polynomial $x^n p(1/x)$ by $p^*(x)$.

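With the weight w defined as before, the generating
function $\sum_P w(P) x^{\ell(P)}$, where $\ell(P)$ denotes the
length of P and the sum is over all
three-step paths which start at (0, r), terminate at
height s, do not pass below the x-axis, and do not
pass above the line y = K, is given by

\[
\begin{cases}
\dfrac{x^{s-r}\, p_r^*(x)\, S^{s+1} p_{K-s}^*(x)}{p_{K+1}^*(x)} & \text{if } r \le s \\[3mm]
\lambda_r \cdots \lambda_{s+1}\,
\dfrac{x^{r-s}\, p_s^*(x)\, S^{r+1} p_{K-r}^*(x)}{p_{K+1}^*(x)} & \text{if } r \ge s
\end{cases} \qquad [53]
\]
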
The sequence of polynomials $(p_n(x))_{n \ge 0}$ is in fact a
sequence of orthogonal polynomials (cf. Koekoek
and Swarttouw (1998) and Szegő (1959)).

We remark that in the case that r=s=0 there is 
also an elegant expression for the generating func- 
tion due to Flajolet (see section V.2 of the Flajolet 
and Sedgewick reference in “Further reading") in 
terms of a continued fraction. 

In order to solve our problem, we just have to
extract the coefficient of $x^{\ell}$ in [53]. By a partial
fraction expansion, a formula of the type

\[
\sum_m c_m \xi_m^{\ell} \qquad [54]
\]

results, where the $\xi_m$'s are the zeroes of $p_{K+1}(x)$, and
the $c_m$'s are some coefficients, only a finite number
of them being nonzero.

It should be noted that, because of the many
available parameters (the $b_h$'s and $\lambda_h$'s), by appropriate
specializations one can also obtain numerous
results about enumerating three-step paths according
to various statistics, such as the number of
touchings on the bounding lines, etc.

There are two important special cases in which a 
completely explicit solution in terms of elementary 
functions can be given. 

The first case occurs for $b_i = 0$ and $\lambda_i = 1$ for all i.
In this case, the polynomials $p_n(x)$ defined by
the three-term recurrence [52] are Chebyshev polynomials
of the second kind, $p_n(x) = U_n(x/2)$.
(The Chebyshev polynomial of the second kind
$U_n(x)$ is defined by $U_n(\cos t) = \sin((n+1)t)/\sin t$
(see Koekoek and Swarttouw (1998) for almost
exhaustive information on these polynomials and,
more generally, on hypergeometric orthogonal
polynomials).) The result which is then obtained from the
general theorem (clearly, the zeros of $U_n(x)$ are
$x = \cos(k\pi/(n+1))$, $k = 1, 2, \dots, n$, and therefore
the partial fraction expansion of [53] is easily
determined) is that the number of lattice paths from
(0, r) to $(\ell, s)$ with only up- and down-steps, which
always stay between the x-axis and the line y = K, is
given by (see also Feller (1957, chapter XIV, eqn [5.7]))

\[
\frac{2}{K+2} \sum_{k=1}^{K+1} \left( 2 \cos \frac{\pi k}{K+2} \right)^{\ell}
\sin \frac{\pi k (r+1)}{K+2} \, \sin \frac{\pi k (s+1)}{K+2} \qquad [55]
\]


a formula which goes back to Lagrange. 

The second case occurs for $b_i = 1$ and $\lambda_i = 1$ for
all i. In this case, the polynomials $p_n(x)$ defined
by the three-term recurrence [52] are again
Chebyshev polynomials of the second kind,
$p_n(x) = U_n((x - 1)/2)$. The result which is then
obtained from the general theorem is that the
number of three-step lattice paths from (0, r) to
$(\ell, s)$, which always stay between the x-axis and the
line y = K, is given by

\[
\frac{2}{K+2} \sum_{k=1}^{K+1} \left( 2 \cos \frac{\pi k}{K+2} + 1 \right)^{\ell}
\sin \frac{\pi k (r+1)}{K+2} \, \sin \frac{\pi k (s+1)}{K+2} \qquad [56]
\]

Perfect Matchings and Tilings 


In this section we consider the problem of counting
the perfect matchings of a graph. For an introduction
to the problem, and to methods to solve it,
as well as for a report on recent developments, we
refer the reader to Propp (1999).

Let G = (V, E) be a finite loopless graph with
vertex set V and edge set E. A matching (also called
1-factor in graph theory) is a subset of the edges
with the property that no two edges share a vertex.
A matching is perfect if it covers all the vertices.
Let M(G) denote the number of perfect matchings of
the graph G. More generally, we could assign a
weight w(e) to each edge e of the graph and define the
weight of a matching to be the product of
the weights of all its edges. Let $M_w(G)$ denote
the sum of all weights of all perfect matchings of the
graph G.

Kasteleyn's method for determining M(G), respectively
$M_w(G)$, makes use of determinants and
Pfaffians. Recall that the Pfaffian Pf(A) of a
triangular array $A = (a_{i,j})_{1 \le i < j \le 2n}$ is defined by

\[
\operatorname{Pf}(A) = \sum_{\pi} (\operatorname{sgn} \pi)
\prod_{\{i,j\} \in \pi} a_{i,j} \qquad [57]
\]

where the sum is over all perfect matchings $\pi$ of the
complete graph on vertices {1, 2, ..., 2n}, and where
the product is over all edges {i, j}, i < j, of $\pi$. The
sign $\operatorname{sgn} \pi$ of $\pi$ is $(-1)^{\#\text{crossings of } \pi}$,
where a crossing is a pair ({i, j}, {k, l}) of edges such
that i < k < j < l.
Usually, one extends the triangular array A to a
matrix by setting $a_{j,i} = -a_{i,j}$, i < j, and $a_{i,i} = 0$ for
all i. Then, abusing notation, we identify the
triangular array with the skew-symmetric matrix
$A = (a_{i,j})_{1 \le i,j \le 2n}$. The Pfaffian satisfies the following
useful properties:


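\[
\operatorname{Pf}(B^t A B) = \det(B) \operatorname{Pf}(A)
\]

and

\[
\operatorname{Pf}(A)^2 = \det(A) \qquad [58]
\]

The latter equality shows in particular that Pfaffians
are very close to determinants. They do, in fact,
generalize determinants since

\[
\operatorname{Pf} \begin{pmatrix} 0 & B \\ -B^t & 0 \end{pmatrix}
= (-1)^{\binom{n}{2}} \det B \qquad [59]
\]

for any $n \times n$ matrix B.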
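
The definition of the Pfaffian as a signed sum over perfect matchings translates directly into code. The Python sketch below is an illustration with hypothetical function names, suitable for small matrices only: it enumerates the perfect matchings of {0, 1, ..., 2n − 1} (indices are shifted to start at 0), determines the sign from the number of crossings, and checks that the square of the Pfaffian equals the determinant for a 4 × 4 skew-symmetric example.

```python
from itertools import permutations


def perfect_matchings(vertices):
    """Yield all perfect matchings of a sorted list of vertices as lists of pairs."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def crossings(matching):
    """Number of pairs of edges {i, j}, {k, l} with i < k < j < l."""
    return sum(1 for (i, j) in matching for (k, l) in matching if i < k < j < l)


def pfaffian(A):
    total = 0
    for matching in perfect_matchings(list(range(len(A)))):
        term = (-1) ** crossings(matching)
        for i, j in matching:
            term *= A[i][j]
        total += term
    return total


def determinant(A):
    """Leibniz expansion, fine for small integer matrices."""
    size = len(A)
    total = 0
    for perm in permutations(range(size)):
        inversions = sum(perm[i] > perm[j]
                         for i in range(size) for j in range(i + 1, size))
        term = (-1) ** inversions
        for row, col in enumerate(perm):
            term *= A[row][col]
        total += term
    return total


A = [[0, 1, 2, 3],
     [-1, 0, 4, 5],
     [-2, -4, 0, 6],
     [-3, -5, -6, 0]]
print(pfaffian(A), determinant(A))
```

Here the three matchings contribute 1·6 − 2·5 + 3·4 = 8, and the determinant is 64.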

Thus, given a graph with vertices $v_1, v_2, \dots, v_{2n}$,
specializing $a_{i,j}$ to the weight of the edge between $v_i$
and $v_j$, if it exists, and setting $a_{i,j} = 0$ otherwise in
the definition of the Pfaffian, we obtain almost
$M_w(G)$; the only difference is that there could be
signs in front of the individual terms of the sum,
whereas in $M_w(G)$ the sign in front of each term
must be +. (The object obtained by omitting the sign
in [57] is called the Hafnian. Unfortunately, in contrast
to the Pfaffian, it does not have any nice properties
and it is therefore extremely difficult to compute.)
Kasteleyn's idea is to circumvent this problem by
orienting the edges of the graph, defining signed
weights of the edges, in such a way that the Pfaffian
of the array with signed weights produces exactly
$M_w(G)$.

More precisely, given a (weighted) graph G with
vertices $v_1, v_2, \dots, v_{2n}$, we make it into an oriented
(weighted) graph $\vec{G}$. That is, if there is an edge
between $v_i$ and $v_j$, $e_{i,j}$ say, we orient it either from $v_i$
to $v_j$ or the other way. Now we define the signed
adjacency matrix $A(\vec{G})$ of $\vec{G}$ by letting its (i, j)-entry
be $+w(e_{i,j})$ if there is an edge from $v_i$ to $v_j$
oriented that way, $-w(e_{i,j})$ if there is an edge from
$v_j$ to $v_i$ oriented that way, and 0 if there is no edge
between $v_i$ and $v_j$. Such an orientation is called
Pfaffian if

\[
\operatorname{Pf}(A(\vec{G})) = \pm M_w(G)
\]


Clearly, the question remains whether a Pfaffian 
orientation can be found for a given graph. In 
general, this is an open question. However, Kaste- 
leyn shows that for planar graphs such a Pfaffian 
orientation can always be found. Moreover, he 
shows that any orientation of a planar graph 
which has the property that around any face 
bounded by 4k edges an odd number of edges is 
oriented in either direction and that around any face 
bounded by 4k + 2 edges an even number of edges is 
oriented in either direction is Pfaffian. 

For bipartite graphs (i.e., for graphs in which the set
of vertices can be split into two disjoint sets such that
all the edges connect a vertex of one of these sets to a
vertex of the other), the situation is even nicer. This is
because for a bipartite graph G in which both parts of
the bipartition of the vertices are of the same size
(otherwise, there is no perfect matching), any signed
adjacency matrix $A(\vec{G})$ has the block form of the
matrix on the left-hand side of [59] and, hence, the
Pfaffian reduces to a determinant. More precisely, let
G be a bipartite graph with vertex set V = U ∪ W,
$U = \{u_1, u_2, \dots, u_n\}$ and $W = \{w_1, w_2, \dots, w_n\}$, with
edges connecting some $u_i$ to some $w_j$. Given a
Pfaffian orientation $\vec{G}$, we build the signed bipartite
adjacency matrix $B(\vec{G}) = (b_{i,j})_{1 \le i,j \le n}$ of $\vec{G}$ by setting
$b_{i,j} = +w(e_{i,j})$ if there is an edge from $u_i$ to $w_j$ oriented
that way, $-w(e_{i,j})$ if there is an edge from $w_j$ to $u_i$
oriented that way, and 0 if there is no edge between $u_i$
and $w_j$. Then we have

\[
\det(B(\vec{G})) = \pm M_w(G)
\]


In particular, this holds for any bipartite planar 
graph. See Robertson et al. (1999) for a structural 
description about which (not necessarily planar) 
bipartite graphs admit a Pfaffian orientation. 

Kasteleyn's construction in the planar case has 
been generalized to graphs on surfaces of any genus 
g in Dolbilin et al. (1996), Galluccio and Loebl 
(1999), and Tesler (2000), independently. As predicted
by Kasteleyn, the solution is in terms of a
linear combination of $4^g$ Pfaffians.

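With the help of his method, Kasteleyn computed
the number of dimer coverings of an m × n
rectangle. (A dimer is a 2 × 1 rectangle. Thus, this
is equivalent to counting the number of perfect
matchings on the m × n grid graph. The formula
was independently found by Temperley and Fisher.)
The result is

\[
\prod_{i=1}^{m} \prod_{j=1}^{n}
\left| 2 \cos \frac{\pi i}{m+1} + 2 \sqrt{-1} \cos \frac{\pi j}{n+1} \right|^{1/2}
\]

For even m and n, the formula can be rewritten as

\[
2^{mn/2} \prod_{i=1}^{m/2} \prod_{j=1}^{n/2}
\left( \cos^2 \frac{\pi i}{m+1} + \cos^2 \frac{\pi j}{n+1} \right)
\]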

There is a similar rewriting if one of m or n is odd. 
(If both m and n are odd, there is no dimer 
covering.) 

For further reading and references see Dimer 
Problems and Kuperberg (1998). 
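
The product formula of Kasteleyn and of Temperley and Fisher can be compared with a direct count for small rectangles. The Python sketch below is an illustration with hypothetical function names. It assumes the product is taken in the form in which each factor is the fourth root of 4cos²(πi/(m + 1)) + 4cos²(πj/(n + 1)), and it rounds the floating-point product before comparing it with a backtracking count.

```python
from math import cos, pi


def count_domino_tilings(m, n):
    """Count dimer coverings of an m x n rectangle by backtracking."""
    covered = [[False] * n for _ in range(m)]

    def fill():
        for row in range(m):
            for col in range(n):
                if covered[row][col]:
                    continue
                # (row, col) is the first free cell: cover it horizontally or vertically
                total = 0
                covered[row][col] = True
                if col + 1 < n and not covered[row][col + 1]:
                    covered[row][col + 1] = True
                    total += fill()
                    covered[row][col + 1] = False
                if row + 1 < m and not covered[row + 1][col]:
                    covered[row + 1][col] = True
                    total += fill()
                    covered[row + 1][col] = False
                covered[row][col] = False
                return total
        return 1  # no free cell left: one complete covering

    return fill()


def product_formula(m, n):
    value = 1.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            value *= (4 * cos(pi * i / (m + 1)) ** 2
                      + 4 * cos(pi * j / (n + 1)) ** 2) ** 0.25
    return value


for m, n in [(2, 2), (2, 3), (3, 3), (4, 4)]:
    print(m, n, count_domino_tilings(m, n), round(product_formula(m, n)))
```

For the 4 × 4 square both methods give 36; for the 3 × 3 square there is no covering and the product vanishes up to rounding error.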


Nonintersecting Paths 


Let G = (V, E) be a directed acyclic graph with
vertices V and directed edges E. Furthermore, we are
given a function w which assigns a weight w(x) to
every vertex or edge x. Let us define the weight w(P)
of a walk P in the graph by $\prod_e w(e) \prod_v w(v)$, where
the first product is over all edges e of the walk P and
the second product is over all vertices v of P. We
denote the set of all walks in G from u to v by
$P(u \to v)$, and the set of all families $(P_1, P_2, \dots, P_n)$
of walks, where $P_i$ runs from $u_i$ to $v_i$, $i = 1, 2, \dots, n$,
by $P(\mathbf{u} \to \mathbf{v})$ with $\mathbf{u} = (u_1, u_2, \dots, u_n)$ and
$\mathbf{v} = (v_1, v_2, \dots, v_n)$. The symbol $P^*(\mathbf{u} \to \mathbf{v})$ stands for the set
of all families $(P_1, P_2, \dots, P_n)$ in $P(\mathbf{u} \to \mathbf{v})$ with the
additional property that no two walks share a
vertex. We call such families of walk(er)s “vicious
walkers” or, alternatively, “nonintersecting paths.”
The weight w(P) of a family $P = (P_1, P_2, \dots, P_n)$ of
walks is defined as the product $\prod_{i=1}^{n} w(P_i)$ of all the
weights of the walks in the family. Finally, given a
set M with weight function w, we write GF(M; w)
for the generating function $\sum_{x \in M} w(x)$.

We need two further notations before we are able
to state the Lindström-Gessel-Viennot theorem.
(For references and historical remarks, we refer the
reader to footnote 5 in Krattenthaler (2005a).) As
earlier, the symbol $\mathfrak{S}_n$ denotes the symmetric group
of order n. Given a permutation $\sigma \in \mathfrak{S}_n$, we write $\mathbf{u}_\sigma$
for $(u_{\sigma(1)}, u_{\sigma(2)}, \dots, u_{\sigma(n)})$. Then

\[
\sum_{\sigma \in \mathfrak{S}_n} (\operatorname{sgn} \sigma) \cdot
GF(P^*(\mathbf{u}_\sigma \to \mathbf{v}); w)
= \det_{1 \le i,j \le n} \bigl( GF(P(u_j \to v_i); w) \bigr) \qquad [60]
\]


Most often, this theorem is applied in the case
where the only permutation $\sigma$ for which vicious
walks exist is the identity permutation, so that the
sum on the left-hand side reduces to a single term
that counts all families $(P_1, P_2, \dots, P_n)$ of vicious
walks, the ith walk $P_i$ running from $u_i$ to
$v_i$, $i = 1, 2, \dots, n$. This case occurs, for example, if
for any pair of walks (P, Q) with P running from $u_a$
to $v_d$ and Q running from $u_b$ to $v_c$, a < b and c < d,
it is true that P and Q must have a common vertex.
Explicitly, in that case we have

\[
GF(P^*(\mathbf{u} \to \mathbf{v}); w)
= \det_{1 \le i,j \le n} \bigl( GF(P(u_j \to v_i); w) \bigr) \qquad [61]
\]
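
The determinant for nonintersecting paths can be illustrated on the smallest nontrivial case, two paths with unit east and north steps in the plane, all weights being 1. The Python sketch below makes several assumptions that are not from the text: the particular starting points and endpoints, the function names, and the restriction to n = 2. It compares a brute-force count of vertex-disjoint pairs of paths with the 2 × 2 determinant of path numbers.

```python
from itertools import product
from math import comb


def lattice_paths(start, end):
    """All east/north unit-step paths from start to end, each as a tuple of points."""
    (x0, y0), (x1, y1) = start, end
    if x0 > x1 or y0 > y1:
        return []
    if start == end:
        return [(start,)]
    paths = []
    for nxt in ((x0 + 1, y0), (x0, y0 + 1)):
        paths += [(start,) + tail for tail in lattice_paths(nxt, end)]
    return paths


def count_nonintersecting(starts, ends):
    """Pairs (P1, P2), P_i from starts[i] to ends[i], without a common vertex."""
    first = lattice_paths(starts[0], ends[0])
    second = lattice_paths(starts[1], ends[1])
    return sum(1 for p, q in product(first, second) if not set(p) & set(q))


def path_determinant(starts, ends):
    def number(u, v):
        dx, dy = v[0] - u[0], v[1] - u[1]
        return comb(dx + dy, dx) if dx >= 0 and dy >= 0 else 0

    # 2 x 2 determinant of the numbers of paths between starting points and endpoints
    return (number(starts[0], ends[0]) * number(starts[1], ends[1])
            - number(starts[1], ends[0]) * number(starts[0], ends[1]))


starts = [(0, 1), (1, 0)]
ends = [(2, 3), (3, 2)]
print(count_nonintersecting(starts, ends), path_determinant(starts, ends))
```

In this configuration any path from (0, 1) to (3, 2) must meet any path from (1, 0) to (2, 3), so only the identity permutation contributes; the determinant is 6·6 − 4·4 = 20.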


If the starting points or/and the endpoints are not
fixed, then the corresponding number is given by a
Pfaffian, a result obtained by Okada and Stembridge
(see Bressoud (1999) for references). For a set $\mathcal{A}$ of
starting points, let $P^*(\mathcal{A} \to \mathbf{v})$ denote the set of all
families $(P_1, P_2, \dots, P_{2n})$ of nonintersecting lattice
paths, where $P_i$ runs from some point of $\mathcal{A}$ to
$v_i$, $i = 1, 2, \dots, 2n$. Furthermore, let us suppose that
the elements of $\mathcal{A} = \{u_1, u_2, \dots\}$ are ordered in such a
way that for any pair of walks (P, Q) with P running
from $u_a$ to $v_d$ and Q running from $u_b$ to $v_c$, a < b and
c < d, it is true that P and Q must have a common
vertex. (This is the same condition as the one which
makes [61] valid, with the only difference that, here,
the number of $u_i$'s could be larger than the number of
$v_j$'s.) Then,

\[
GF(P^*(\mathcal{A} \to \mathbf{v}); w)
= \operatorname{Pf}_{1 \le i < j \le 2n} \Bigl( \sum_{a < b}
\bigl( GF(P(u_a \to v_i); w) \, GF(P(u_b \to v_j); w)
- GF(P(u_b \to v_i); w) \, GF(P(u_a \to v_j); w) \bigr) \Bigr) \qquad [62]
\]


If the number of paths is odd, then one can use the 
same formula by adding an artificial point to the 
endpoints and to the set of starting points A. There 
is also a theorem by Okada and Stembridge which 
covers the case that starting points and endpoints 
vary. Refinements when the number of turns is fixed 
can be found in Krattenthaler (1997). 


Vicious Walkers, Plane Partitions, 
Rhombus Tilings, and Fully Packed 
Loop Configurations 


In this section we describe the interrelations between 
four frequently appearing objects in statistical 
mechanics and combinatorics: vicious walkers, 
plane partitions, rhombus tilings, and fully packed 
loop configurations. 

Given a lattice, vicious walkers, as introduced by 
Fisher (1984), are particles which move on lattice 
sites in such a way that two particles never occupy 
the same lattice site. Models of vicious walkers have 
been the object of numerous studies from various 
points of view. Rather than accomplishing the 
impossible task of providing a complete overview 
of references, the reader is referred to the basic 
reference Fisher (1984) and to Krattenthaler (2005a)
for further pointers to the literature. 

Most of the known results apply for vicious 
walkers on the line. There are in fact two different 
models: in the random turns vicious walker model, n 
walkers move on the integral points of the real line 
in such a way that at each tick of the clock exactly 
one walker moves to the right or to the left, whereas 
in the lock step vicious walker model n walkers 
move on the integral points of the real line in such a 
way that at each tick of the clock each walker moves 
to the right or to the left. 

The first model is equivalent to a model of one
walker in $\mathbb{Z}^n$ ($\mathbb{Z}$ denoting the set of integers) which
at each tick of the clock moves a positive or negative
unit step in the direction of one of the coordinate
axes, always staying in the wedge $x_1 > x_2 > \cdots > x_n$.
This point of view was already put forward by
Fisher (1984). However, this problem belongs to the
problem of counting paths in chambers of reflection
groups discussed in the section “Lattice paths.”


The second model could also be realized as a 
single walker model (cf. Krattenthaler (2003)). 
However, most often it is realized as a model of n 
paths in the plane consisting of steps (1,1) and 
(1, −1) with the property that no two paths have a
point in common. In this picture, the x-axis becomes
the time line, the kth path doing an up-step (1, 1)
from $(t - 1, y)$ to $(t, y + 1)$ meaning that the kth
particle moves to the left at time $t$, whereas the kth
path doing a down-step (1, −1) from $(t - 1, y)$ to
$(t, y - 1)$ meaning that the kth particle moves to the
right at time $t$.

The reader should consult Figure 14a for an
example. (The labelings should be ignored at this
point.) Clearly, what we encounter here is a
particular instance of the nonintersecting paths of
the last section. Therefore, for fixed starting points
and endpoints, formula [61] applies, whereas if the
starting points vary and the endpoints are fixed, it is
formula [62] that applies.

At this point, the links to the other objects,
semistandard tableaux and plane partitions
(cf. Bressoud (1999)), emerge. A filling of the cells
of the Ferrers diagram of $\lambda$ with elements of the set
{1, 2, ...}, which is weakly increasing along rows
and strictly increasing along columns is called a
(semistandard) tableau of shape $\lambda$. Figure 14b shows
such a semistandard tableau of shape (4, 3, 2). In
fact, vicious walkers and semistandard tableaux are
equivalent objects. To see this, first label down-steps
by the x-coordinate of their endpoint, so that a step
from $(a - 1, b)$ to $(a, b - 1)$ is labeled by $a$, see
Figure 14a. Then, out of the labels of the jth path,
form the jth column of the corresponding tableau,
see Figure 14b. The resulting array of numbers is
indeed a semistandard tableau. This can be readily
seen, since the entries are trivially strictly increasing
along columns, and they are weakly increasing along
rows because the paths do not touch each other.
Thus, problems of enumerating vicious walkers can
be translated into tableau enumeration problems,
and vice versa.

Figure 14 (a) Vicious walkers. (b) A tableau.

The significance of semistandard tableaux lies 
particularly in the representation theory for classical 
groups, see Classical Groups and Homogeneous
Spaces and Compact Groups and Their Representations.
Namely, the irreducible characters for
$GL(n, \mathbb{C})$ and $SL(n, \mathbb{C})$, the Schur functions, are
generating functions for semistandard tableaux of
a given shape. If the entries of the ith row of
a semistandard tableau are required to be at least
$2i - 1$, then one speaks of symplectic tableaux, and
the irreducible characters for $Sp(2n, \mathbb{C})$ are generating
functions for symplectic tableaux of a given
shape. We refer the reader to Krattenthaler et al. 
(2000) for more information on these topics. 

Objects which are very close to semistandard 
tableaux are plane partitions. According to MacMahon,
a plane partition of shape $\lambda$ is a filling of the
Ferrers diagram of $\lambda$ with non-negative integers which
is weakly decreasing along rows and columns. See
Figure 15b for an example of a plane partition of shape
(3, 3, 3). In particular, semistandard tableaux and
plane partitions of rectangular shape are actually
equivalent. For, let T be a semistandard tableau of
rectangular shape. Then, from each element of the ith
row we subtract i. Finally, the obtained array is rotated 
by 180°. As a result, we obtain a plane partition. See 
Figure 15 for a semistandard tableau and a plane 
partition which correspond to each other under these 
transformations. 
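The two transformations just described (subtract i from every entry of row i, then rotate by 180°) can be sketched in a few lines of Python. This is only an illustration: the function names and the 3 x 3 example tableau are our own choices, and the example is not the tableau of Figure 15.

```python
def tableau_to_plane_partition(tableau):
    """Map a rectangular semistandard tableau to a plane partition.

    Rows are numbered from 1; i is subtracted from every entry of row i,
    and the resulting array is rotated by 180 degrees.
    """
    shifted = [[entry - i for entry in row]
               for i, row in enumerate(tableau, start=1)]
    # Rotating by 180 degrees reverses the order of the rows and of each row.
    return [row[::-1] for row in reversed(shifted)]


def is_plane_partition(array):
    """Check non-negativity and weak decrease along rows and columns."""
    non_negative = all(entry >= 0 for row in array for entry in row)
    rows_ok = all(row[j] >= row[j + 1]
                  for row in array for j in range(len(row) - 1))
    cols_ok = all(array[i][j] >= array[i + 1][j]
                  for i in range(len(array) - 1)
                  for j in range(len(array[i + 1])))
    return non_negative and rows_ok and cols_ok


# Illustrative 3 x 3 tableau (hypothetical, not the one shown in Figure 15).
example_tableau = [[1, 1, 2],
                   [2, 3, 4],
                   [4, 5, 6]]
example_partition = tableau_to_plane_partition(example_tableau)
print(example_partition)  # [[3, 2, 1], [2, 1, 0], [1, 0, 0]]
```

The subtraction turns strict increase along columns into weak increase, and the rotation turns weak increase into weak decrease, which is why the result passes `is_plane_partition`.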

On the other hand, plane partitions can also be 
realized as three-dimensional objects, by interpreting 
each entry in the array as a pile of unit cubes of the 
size of the entry. For example, the plane partition in 
Figure 15 corresponds to the pile of cubes in 
Figure 16a. But then, forgetting the three-dimensional 
view, by embedding the picture in a minimally 
bounding hexagon, and by filling the emerging empty 
regions by rhombi of unit length in the unique way this 
is possible, we obtain a rhombus tiling of a hexagon in
which opposite sides have the same length, see
Figure 16b.

Figure 15 (a) A semistandard tableau. (b) A plane partition.

Figure 16 (a) A plane partition; three-dimensional view.
(b) A rhombus tiling.

From the rhombus tiling, there is then again an 
elegant way to go to nonintersecting paths: we mark 
the mid-points of the edges along two opposite sides, 
see Figure 17a. Now we draw lattice paths which 
connect points on different sides, by “following” 
along the other lozenges, as indicated in Figure 17a 
by the dashed lines. Clearly, the resulting paths are 
nonintersecting, that is, no two paths have a 
common vertex. If we slightly distort the underlying 
lattice, we get orthogonal paths with horizontal and 
vertical steps in the positive direction, see 
Figure 17b. 

Rhombus tilings, on their part, are equivalent to 
perfect matchings of hexagonal graphs. To see this, 
one places the tiling on the underlying triangular 
grid, see Figure 18a. Then one places a bond into 
each rhombus, so that it connects the mid-points of 
the two triangles out of which the rhombus is 
composed, see Figure 18b. Finally, one forgets the 
contour of the tiling, but instead one introduces all 
the other edges which connect mid-points of 
adjacent triangles of the triangular grid, see 
Figure 18c. Thus, one arrives at a perfect matching 
of the hexagonal graph consisting of the edges 
connecting mid-points of triangles. 

Because of these various connections, enumeration
problems for vicious walkers, plane partitions,
tableaux, and rhombus tilings can be approached by
the different methods which are available for the
various objects: the determinant theorem from
the section “Nonintersecting paths,” together
with determinant evaluation techniques (cf. the
survey Krattenthaler (2005b)), apply, as well as the
“Kasteleyn method” from the section “Perfect
matchings and tilings,” and also methods from
character theory for the classical groups. All
of these methods have been applied extensively (see
the surveys by Kenyon (2003), Propp (1999), and
Krattenthaler et al. (2000)), the first and third more
frequently for exact enumeration, while the second
particularly for asymptotic studies. It should be
noted that methods from random matrix theory also
apply in certain situations, see Johansson (2002). See
Growth Processes in Random Matrix Theory and
Random Matrix Theory in Physics.

Figure 17 (a) A rhombus tiling. (b) A family of nonintersecting
paths.

Figure 18 (a) A rhombus tiling. (b) Bonds in rhombi.
(c) A perfect matching of a hexagonal graph.

In fact, we missed mentioning a further object, from 
statistical physics, which in some cases is equivalent to 
vicious walkers, etc.: fully packed loop configurations. 
(Fully packed loop configurations are in bijection with 
six-vertex configurations, see the next section.) If one 
imposes certain “connectivity constraints” on fully
packed loop configurations, then one can construct 
bijections with rhombus tilings and, hence, with 
nonintersecting paths and with the other objects 
discussed in this section. The reader is referred to 
Di Francesco et al. (2004) and references therein. 

Having explained the various connections, we cite 
some fundamental results in the area. (We refer the 
reader to Bressoud (1999) and Stanley (1999, 
chapter 7).) MacMahon proved that the number of 
all plane partitions contained in an a x b x c box 
(when viewed in three dimensions) is equal to 


$$
\prod_{i=1}^{a} \prod_{j=1}^{b} \prod_{k=1}^{c} \frac{i + j + k - 1}{i + j + k - 2} \tag*{[63]}
$$
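As a sanity check, MacMahon's box formula, the triple product over 1 ≤ i ≤ a, 1 ≤ j ≤ b, 1 ≤ k ≤ c of (i + j + k − 1)/(i + j + k − 2), can be compared with a direct enumeration for small boxes. The following Python sketch does this; the function names are our own, and the brute-force count is only feasible for very small a, b, c.

```python
from fractions import Fraction
from itertools import product


def macmahon_box_count(a, b, c):
    """Evaluate the triple product (i+j+k-1)/(i+j+k-2) exactly."""
    result = Fraction(1)
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            for k in range(1, c + 1):
                result *= Fraction(i + j + k - 1, i + j + k - 2)
    assert result.denominator == 1  # the product is always an integer
    return result.numerator


def brute_force_box_count(a, b, c):
    """Count a x b arrays with entries in 0..c that are weakly
    decreasing along rows and columns (plane partitions in the box)."""
    count = 0
    for entries in product(range(c + 1), repeat=a * b):
        rows = [entries[r * b:(r + 1) * b] for r in range(a)]
        rows_ok = all(rows[r][s] >= rows[r][s + 1]
                      for r in range(a) for s in range(b - 1))
        cols_ok = all(rows[r][s] >= rows[r + 1][s]
                      for r in range(a - 1) for s in range(b))
        if rows_ok and cols_ok:
            count += 1
    return count


print(macmahon_box_count(2, 2, 2), brute_force_box_count(2, 2, 2))  # 20 20
```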


Thus, the number of rhombus tilings of a hexagon
with side lengths a, b, c, a, b, c is given by the same
number, as well as the number of all vicious walkers
$(P_1, P_2, \dots, P_a)$, where $P_i$ runs from $(0, 2i)$ to
$(b + c, b - c + 2i)$, $i = 1, 2, \dots, a$. More generally, the number
of semistandard tableaux of shape $\lambda$ with entries
at most m is given by the hook-content formula


$$
\prod_{u \in \lambda} \frac{c(u) + m}{h(u)} \tag*{[64]}
$$


where u ranges over all the cells of the Ferrers
diagram of $\lambda$, with c(u) being the content of u,
defined as the difference of the column number and
the row number of u, and with h(u) being the hook
length of u, defined as the number of cells in the
hook of u, the latter consisting of the cells to the
right of u in the same row and below u in the
same column, including u. Thus, this also gives a
formula for the number of all vicious walkers
$(P_1, P_2, \dots, P_n)$, where $P_i$ runs from $(0, 2i)$ to
$(N, h_i)$. See Krattenthaler et al. (2000, section 2)
for details. There it is also explained that a Schur 
function summation formula, together with an 
analog of the hook-content formula for special 
orthogonal characters, proves that the number of 
all vicious walkers $(P_1, P_2, \dots, P_n)$, where $P_i$ runs
from $(0, 2i)$ for $N$ steps, is given by

$$
\prod_{1 \le i \le j \le N} \frac{n + i + j - 1}{i + j - 1} \tag*{[65]}
$$
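The count of lock step vicious walkers with free endpoints lends itself to a brute-force check. The sketch below assumes that the product formula reads ∏ over 1 ≤ i ≤ j ≤ N of (n + i + j − 1)/(i + j − 1), with n the number of walkers, and that the walkers start at heights 2, 4, ..., 2n; function names are our own. Since all walkers share the same parity at every time, two walkers cannot cross without occupying the same site, so checking strict order after every step is enough.

```python
from fractions import Fraction
from itertools import product


def count_lock_step_walkers(n, steps):
    """Count n lock step vicious walkers started at 2, 4, ..., 2n
    that survive `steps` ticks of the clock, endpoints being free."""
    states = {tuple(2 * i for i in range(1, n + 1)): 1}
    for _ in range(steps):
        next_states = {}
        for positions, ways in states.items():
            # Every walker moves by +1 or -1 at each tick.
            for moves in product((-1, 1), repeat=n):
                new = tuple(p + m for p, m in zip(positions, moves))
                if all(new[i] < new[i + 1] for i in range(n - 1)):
                    next_states[new] = next_states.get(new, 0) + ways
        states = next_states
    return sum(states.values())


def product_formula(n, steps):
    """Evaluate the product over 1 <= i <= j <= steps exactly."""
    result = Fraction(1)
    for i in range(1, steps + 1):
        for j in range(i, steps + 1):
            result *= Fraction(n + i + j - 1, i + j - 1)
    return result


print(count_lock_step_walkers(2, 2), product_formula(2, 2))  # 10 10
```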


The reader is referred to the references given in 
this section for many more results, in particular, on 
the enumeration of plane partitions with symmetry, 
the enumeration of rhombus tilings of regions other 
than hexagons, and the enumeration of vicious 
walkers with various starting points and endpoints, 
under various constraints. 
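The hook-content formula of this section, the product over all cells u of (c(u) + m)/h(u), is easy to evaluate mechanically. The following Python sketch computes it for a partition given as a list of row lengths and compares it with a direct enumeration of semistandard tableaux; the function names and the 0-based cell coordinates are our own conventions, and the brute-force count is meant for small shapes only.

```python
from fractions import Fraction
from itertools import product


def hook_content_count(shape, m):
    """Product over cells u of (c(u) + m) / h(u) for the given shape."""
    # Column lengths of the Ferrers diagram (the conjugate partition).
    columns = [sum(1 for part in shape if part > col)
               for col in range(shape[0])] if shape else []
    result = Fraction(1)
    for row, part in enumerate(shape):
        for col in range(part):
            content = col - row                        # column minus row
            arm = part - col - 1                       # cells to the right
            leg = columns[col] - row - 1               # cells below
            result *= Fraction(content + m, arm + leg + 1)
    assert result.denominator == 1
    return result.numerator


def brute_force_tableaux_count(shape, m):
    """Count fillings with entries 1..m, weakly increasing along rows
    and strictly increasing along columns."""
    cells = [(r, c) for r, part in enumerate(shape) for c in range(part)]
    count = 0
    for values in product(range(1, m + 1), repeat=len(cells)):
        t = dict(zip(cells, values))
        if all((c == 0 or t[(r, c - 1)] <= t[(r, c)])
               and (r == 0 or t[(r - 1, c)] < t[(r, c)])
               for (r, c) in cells):
            count += 1
    return count


print(hook_content_count([2, 1], 3), brute_force_tableaux_count([2, 1], 3))  # 8 8
```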


Six-Vertex Model and Alternating-Sign 
Matrices 


An alternating-sign matrix is a square matrix of 0's,
1's and −1's for which the sum of entries in each
row and in each column is 1 and the nonzero entries 
of each row and of each column alternate in sign. 
For instance, 


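$$
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & -1 & 0 & 1 & 0 \\
0 & 1 & 0 & -1 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix}
$$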


is a 5 × 5 alternating-sign matrix. Zeilberger proved
that the number of n × n alternating-sign matrices is
given by

$$
\prod_{i=0}^{n-1} \frac{(3i + 1)!}{(n + i)!} \tag*{[66]}
$$

and he went on to prove the finer version that the
number of n × n alternating-sign matrices with the
(unique) 1 in the first row in position j is given by

$$
\binom{n + j - 2}{j - 1} \frac{(2n - j - 1)!}{(n - j)!} \prod_{i=0}^{n-2} \frac{(3i + 1)!}{(n + i)!} \tag*{[67]}
$$
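Both counting formulas can be checked against a direct enumeration for small n. The Python sketch below assumes the product formula ∏_{i=0}^{n−1} (3i + 1)!/(n + i)! for all n × n alternating-sign matrices and, for the refined count, the expression binom(n + j − 2, j − 1) (2n − j − 1)!/(n − j)! ∏_{i=0}^{n−2} (3i + 1)!/(n + i)!. Function names are our own. The enumeration uses the fact that a row or column satisfies the alternating condition exactly when all its partial sums lie in {0, 1} and the total is 1.

```python
from fractions import Fraction
from math import comb, factorial


def asm_count(n):
    """Number of n x n alternating-sign matrices (product formula)."""
    value = Fraction(1)
    for i in range(n):
        value *= Fraction(factorial(3 * i + 1), factorial(n + i))
    return value.numerator


def refined_asm_count(n, j):
    """Number of n x n ASMs whose first-row 1 is in position j (1-based)."""
    value = Fraction(comb(n + j - 2, j - 1) * factorial(2 * n - j - 1),
                     factorial(n - j))
    for i in range(n - 1):
        value *= Fraction(factorial(3 * i + 1), factorial(n + i))
    return value.numerator


def alternating_rows(n):
    """All rows of length n whose partial sums stay in {0, 1} and end at 1."""
    rows = []

    def extend(prefix, partial_sum):
        if len(prefix) == n:
            if partial_sum == 1:
                rows.append(tuple(prefix))
            return
        extend(prefix + [0], partial_sum)
        if partial_sum == 0:
            extend(prefix + [1], 1)
        else:
            extend(prefix + [-1], 0)

    extend([], 0)
    return rows


def brute_force_asm_count(n):
    """Build matrices row by row, keeping column partial sums in {0, 1}."""
    rows = alternating_rows(n)

    def place(row_index, column_sums):
        if row_index == n:
            return 1 if all(s == 1 for s in column_sums) else 0
        total = 0
        for row in rows:
            new = tuple(s + e for s, e in zip(column_sums, row))
            if all(s in (0, 1) for s in new):
                total += place(row_index + 1, new)
        return total

    return place(0, (0,) * n)


print([asm_count(n) for n in range(1, 6)])  # [1, 2, 7, 42, 429]
```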


The first number is also equal to the number of
totally symmetric self-complementary plane partitions
contained in the (2n) × (2n) × (2n) box, but
there is no intrinsic explanation why this is so. We
refer the reader to Bressoud (1999) for an exposition
of these results, and for pointers to the
literature containing further unexplained connections
between alternating-sign matrices and plane
partitions.

While the first result was achieved by a brute-force 
constant-term approach, the second result is based on 
the observation that alternating-sign matrices are in 
bijection with configurations in the six-vertex model 
on the square grid under domain-wall boundary 
conditions. This then allowed one to use a formula 
due to Izergin for the partition function for these six- 
vertex configurations. Similar formulas for variations 
of the model have been found by Kuperberg, and by 
Razumov and Stroganov (see Razumov and Stroga- 
nov (2005) and references therein). 

A configuration in the six-vertex model is an 
orientation of edges of a 4-regular graph (i.e., at 
each vertex there meet exactly four edges) such that 
at each vertex two edges are oriented towards the 
vertex and two are oriented away from the vertex. 
Thus, there are six possible vertex configurations, 
giving the name of the model, see Figure 19. To go 
from one object to the other, one uses the transla- 
tion between local configurations at a vertex and 
entries in alternating-sign matrices indicated in the 
figure. An example of the correspondence can be 
found in Figure 20. 

Another manifestation of alternating-sign matrices
and six-vertex configurations is provided by fully packed
loop configurations. A fully packed loop configuration on a
graph is a collection of edges such that each vertex is
incident to exactly two edges. One obtains a fully
packed loop configuration out of a six-vertex configuration
by dividing the square lattice into its even and
odd sublattice, denoted by A and B, respectively.
Instead of arrows, only those edges are drawn that,
on sublattice A, point inward and, on sublattice B,
point outward. The reader is referred to de Gier
(2005) and Di Francesco et al. (2004) for further
reading.

Figure 19 The six vertex configurations.

Figure 20 (a) An alternating-sign matrix. (b) A six-vertex
configuration.

The story of alternating-sign matrices and their 
connection to the six-vertex model is given a vivid 
account in Bressoud (1999), with further important 
results by Kuperberg, Okada, Razumov and 
Stroganov, referenced in Razumov and Stroganov 
(2005). 

Fully packed loop configurations seem to play an 
important role in the explicit form of the ground- 
state vectors of certain Hamiltonians in the dense 
O(1) loop model. The corresponding conjectures are 
surveyed in de Gier (2005). There is important 
progress on these conjectures by Di Francesco and 
Zinn-Justin (2005, and references therein).


Binomial Sums and Hypergeometric Series 


When dealing with enumerative problems, it is 
inevitable to deal with binomial sums, that is, sums 
in which the summands are products/quotients of 
binomial coefficients and factorials, such as, for 


example,

$$
\sum_{k} \binom{\,\cdot\,}{k} \binom{\,\cdot\,}{n - k}.
$$

In most cases, the right environment in which one
should work is the theory of (generalized) hypergeometric
series. These are defined as follows:


$$
{}_rF_s\!\left[ \begin{matrix} a_1, \dots, a_r \\ b_1, \dots, b_s \end{matrix} ; z \right]
= \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_r)_k}{k!\, (b_1)_k \cdots (b_s)_k}\, z^k,
$$

where $(\alpha)_k = \alpha(\alpha + 1)(\alpha + 2) \cdots (\alpha + k - 1)$ for $k > 0$,
and $(\alpha)_0 = 1$. The symbol $(\alpha)_k$ is called the
Pochhammer symbol or shifted factorial. For in-depth
treatments of the subject, we refer the reader
to Andrews et al. (1999), Gasper and Rahman
(2004), and Slater (1966).

Hypergeometric series can be characterized as
series in which the quotient of the (k + 1)st by the
kth summand is a rational function in k. This is also
the way to convert binomial sums into their
hypergeometric form (respectively to see if this is
possible; in most cases it is): form the quotient of the
(k + 1)st by the kth summand and read off the
parameters $a_1, \dots, a_r, b_1, \dots, b_s$ and the argument z
from the factorization of the numerator and the
denominator polynomials of the rational function,
out of these form the corresponding hypergeometric
series, and multiply the series by the summand for
k = 0. This is, in fact, a completely routine task, and,
indeed, computer algebra programs such as Maple
and Mathematica do this automatically.
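To illustrate the conversion on an example of our own choosing, take the sum of binom(n, k)^2 over k. The quotient of the (k + 1)st by the kth summand is (k − n)(k − n)/((k + 1)(k + 1)), so the upper parameters are −n, −n, the lower parameter is 1, the argument is 1, and the summand for k = 0 equals 1. The Python sketch below (hypothetical function names, exact rational arithmetic) evaluates a truncated hypergeometric series and confirms that this hypergeometric form reproduces the binomial sum.

```python
from fractions import Fraction
from math import comb


def pochhammer(a, k):
    """Shifted factorial a (a + 1) ... (a + k - 1), with value 1 for k = 0."""
    result = Fraction(1)
    for i in range(k):
        result *= a + i
    return result


def hypergeometric(upper, lower, z, terms):
    """Sum of the first `terms` summands of the hypergeometric series
    with the given upper and lower parameters and argument z."""
    total = Fraction(0)
    for k in range(terms):
        numerator = Fraction(1)
        for a in upper:
            numerator *= pochhammer(a, k)
        denominator = pochhammer(1, k)  # this is k!
        for b in lower:
            denominator *= pochhammer(b, k)
        total += numerator / denominator * Fraction(z) ** k
    return total


n = 5
binomial_sum = sum(comb(n, k) ** 2 for k in range(n + 1))
# The series terminates after n + 1 summands because of the parameter -n.
series_value = hypergeometric([-n, -n], [1], 1, n + 1)
print(binomial_sum, series_value)  # 252 252
```

The point of the exercise is that the same hypergeometric form is reached no matter how the summand was originally written in terms of binomial coefficients and factorials.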

The reason why hypergeometric series are much 
more fundamental than the binomial sums them- 
selves is that there are hundreds of ways to write the 
same sum using binomial coefficients and factorials, 
whereas there is just one hypergeometric form, that 
is, hypergeometric series are a kind of normal form 
for binomial sums. In particular, given a specific 
binomial sum, it is a hopeless enterprise to scan 
through all the identities available in the literature 
for this sum. There may be an identity for it, but 
perhaps written differently. On the contrary, given a 
specific hypergeometric series, the list of available 
identities which apply to this series is usually not 
large, and tables of such identities can be set up in 
a systematic way. This has been done (cf. Slater 
(1966); the most comprehensive table available to 
this date is contained in the manual of 
the Mathematica package HYP - see “Further 
reading”), and scanning through these tables is 
largely facilitated by the use of the Mathematica 
package HYP. 

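We give here some of the most important
identities for hypergeometric series. Aside from the
binomial theorem, the most important summation
formulas are: the Gauß ${}_2F_1$-summation formula

$$
{}_2F_1\!\left[ \begin{matrix} a, b \\ c \end{matrix} ; 1 \right]
= \frac{\Gamma(c)\, \Gamma(c - a - b)}{\Gamma(c - a)\, \Gamma(c - b)},
$$

provided $\Re(c - a - b) > 0$,
the Pfaff-Saalschütz summation formula

$$
{}_3F_2\!\left[ \begin{matrix} a, b, -n \\ c, 1 + a + b - c - n \end{matrix} ; 1 \right]
= \frac{(c - a)_n\, (c - b)_n}{(c)_n\, (c - a - b)_n},
$$

provided n is a non-negative integer, and
the Dougall summation formula

$$
{}_7F_6\!\left[ \begin{matrix}
a, \frac{a}{2} + 1, b, c, d, 1 + 2a - b - c - d + n, -n \\
\frac{a}{2}, 1 + a - b, 1 + a - c, 1 + a - d, -a + b + c + d - n, a + 1 + n
\end{matrix} ; 1 \right]
= \frac{(1 + a)_n\, (1 + a - b - c)_n\, (1 + a - b - d)_n\, (1 + a - c - d)_n}
{(1 + a - b)_n\, (1 + a - c)_n\, (1 + a - d)_n\, (1 + a - b - c - d)_n},
$$

provided n is a non-negative integer.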
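Terminating summation formulas of this kind can be tested with exact rational arithmetic. The following Python sketch checks the Pfaff-Saalschütz formula, in the form 3F2[a, b, −n; c, 1 + a + b − c − n; 1] = (c − a)_n (c − b)_n / ((c)_n (c − a − b)_n), for one arbitrarily chosen set of rational parameters. The function names and parameter values are our own; a check of this kind is evidence, not a proof.

```python
from fractions import Fraction


def pochhammer(a, k):
    """Shifted factorial a (a + 1) ... (a + k - 1), with value 1 for k = 0."""
    result = Fraction(1)
    for i in range(k):
        result *= a + i
    return result


def terminating_series_at_one(upper, lower, n):
    """Sum of the summands k = 0, ..., n of the series with argument 1."""
    total = Fraction(0)
    for k in range(n + 1):
        term = Fraction(1)
        for a in upper:
            term *= pochhammer(a, k)
        for b in lower:
            term /= pochhammer(b, k)
        total += term / pochhammer(1, k)  # division by k!
    return total


def pfaff_saalschutz_sides(a, b, c, n):
    """Return both sides of the Pfaff-Saalschutz summation formula."""
    lhs = terminating_series_at_one([a, b, -n], [c, 1 + a + b - c - n], n)
    rhs = (pochhammer(c - a, n) * pochhammer(c - b, n)
           / (pochhammer(c, n) * pochhammer(c - a - b, n)))
    return lhs, rhs


# Arbitrary non-integer parameters, chosen so that no denominator vanishes.
a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(5, 7)
for n in range(6):
    lhs, rhs = pfaff_saalschutz_sides(a, b, c, n)
    print(n, lhs == rhs)
```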

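Some of the most important transformation
formulas are
the Euler transformation formula

$$
{}_2F_1\!\left[ \begin{matrix} a, b \\ c \end{matrix} ; z \right]
= (1 - z)^{c - a - b}\, {}_2F_1\!\left[ \begin{matrix} c - a, c - b \\ c \end{matrix} ; z \right],
$$

provided $|z| < 1$,
the Kummer transformation formula

$$
{}_3F_2\!\left[ \begin{matrix} a, b, c \\ d, e \end{matrix} ; 1 \right]
= \frac{\Gamma(e)\, \Gamma(d + e - a - b - c)}{\Gamma(e - a)\, \Gamma(d + e - b - c)}\,
{}_3F_2\!\left[ \begin{matrix} a, d - b, d - c \\ d, d + e - b - c \end{matrix} ; 1 \right],
$$

provided both series converge,
and the Whipple transformation formulas

$$
{}_4F_3\!\left[ \begin{matrix} a, b, c, -n \\ e, f, 1 + a + b + c - e - f - n \end{matrix} ; 1 \right]
= \frac{(e - a)_n\, (f - a)_n}{(e)_n\, (f)_n}\,
{}_4F_3\!\left[ \begin{matrix}
-n, a, 1 + a + c - e - f - n, 1 + a + b - e - f - n \\
1 + a + b + c - e - f - n, 1 + a - e - n, 1 + a - f - n
\end{matrix} ; 1 \right], \tag*{[68]}
$$

where n is a non-negative integer, and

$$
{}_7F_6\!\left[ \begin{matrix}
a, 1 + \frac{a}{2}, b, c, d, e, -n \\
\frac{a}{2}, 1 + a - b, 1 + a - c, 1 + a - d, 1 + a - e, 1 + a + n
\end{matrix} ; 1 \right]
= \frac{(1 + a)_n\, (1 + a - d - e)_n}{(1 + a - d)_n\, (1 + a - e)_n}\,
{}_4F_3\!\left[ \begin{matrix}
1 + a - b - c, d, e, -n \\
1 + a - b, 1 + a - c, -a + d + e - n
\end{matrix} ; 1 \right], \tag*{[69]}
$$

provided n is a non-negative integer.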


Since about 1990, automatic tools have been
available for the verification of binomial and
hypergeometric series identities. The book by
Petkovšek et al. (1996) is an excellent introduction
to these aspects. The philosophy is as follows.
Suppose we are given a binomial or hypergeometric
series $S(n) = \sum_k F(n, k)$. The
Gosper-Zeilberger algorithm (see “Further reading”)
(cf. Petkovšek et al. (1996); a simplified
version was presented in the reference Zeilberger in
“Further reading”) will find a linear recurrence

$$
A_0(n) S(n) + A_1(n) S(n + 1) + \cdots + A_d(n) S(n + d) = C(n) \tag*{[70]}
$$

for some d, where the coefficients $A_i(n)$ are
polynomials in n, and where $C(n)$ is a certain
function in n, with proof!

If, for example, we suspected that S(n) = RHS(n),
where RHS(n) is some closed-form expression, then
we just have to verify that RHS(n) satisfies the
recurrence [70] and check S(n) = RHS(n) for sufficiently
many initial values of n to have a proof for
the identity S(n) = RHS(n) for all n. On the other
hand, if RHS(n) was a different sum, then we would
apply the algorithm to find a recurrence for RHS(n).
If it turns out to be the same recurrence then, again,
a check of S(n) = RHS(n) for a few initial values will
provide a full proof of S(n) = RHS(n) for all n.

Even in the case that we do not have a conjectured
expression RHS(n), this is not the end of the story.
Given a recurrence of the type [70], the Petkovšek
algorithm (see “Further reading”) (cf. Petkovšek et al.
(1996)) is able to find a closed-form solution (where
“closed form” has a precise meaning), respectively tell
that there is no closed-form solution.

The fascinating point about both algorithms is
that neither do we have to know what the algorithm
does internally nor do we have to check that. For
the Petkovšek algorithm, this is obvious anyway
because, once the computer says that a certain
expression is a solution of [70], it is a routine matter
to check that. This is less obvious for the
Gosper-Zeilberger algorithm. However, what the
Gosper-Zeilberger algorithm does is, for a given sum
$S(n) = \sum_k F(n, k)$, it finds polynomials $A_0(n)$,
$A_1(n), \dots, A_d(n)$ and an expression $G(n, k)$ (which
is, in fact, a rational multiple of $F(n, k)$), such that

$$
A_0(n) F(n, k) + A_1(n) F(n + 1, k) + \cdots + A_d(n) F(n + d, k)
= G(n, k + 1) - G(n, k) \tag*{[71]}
$$

for some d. Because of the properties of F(n, k) and
G(n, k), which are part of the theory, this is an
identity which can be directly verified by clearing all
common factors and checking the remaining identity
between rational functions in n and k. We
may then sum both sides of [71] over k to obtain a
recurrence of the form [70].
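A toy instance, chosen by us for illustration, shows how such a certificate works. For F(n, k) = binom(n, k) one may take d = 1, A_0(n) = 2, A_1(n) = −1 and G(n, k) = binom(n, k − 1), which equals F(n, k) times the rational function k/(n − k + 1). The Python sketch below checks the term-wise identity 2 F(n, k) − F(n + 1, k) = G(n, k + 1) − G(n, k) numerically on a grid of values; this is a spot check rather than the symbolic verification the algorithm's output allows. Summing over k, the right-hand side telescopes to zero, leaving 2 S(n) − S(n + 1) = 0.

```python
from math import comb


def binom(n, k):
    """Binomial coefficient, extended by zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def F(n, k):
    """Summand of S(n) = sum over k of binom(n, k)."""
    return binom(n, k)


def G(n, k):
    """Hand-picked certificate: a rational multiple of F(n, k)."""
    return binom(n, k - 1)


def certificate_holds(n, k):
    """Term-wise identity with A_0(n) = 2 and A_1(n) = -1."""
    return 2 * F(n, k) - F(n + 1, k) == G(n, k + 1) - G(n, k)


def S(n):
    return sum(F(n, k) for k in range(n + 1))


print(all(certificate_holds(n, k)
          for n in range(10) for k in range(-2, n + 4)))  # True
print([S(n) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
```

Together with the initial value S(0) = 1, the recurrence 2 S(n) − S(n + 1) = 0 proves S(n) = 2^n, in the manner described above.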

Algorithms for multiple sums are also available
(see “Further reading”). They follow ideas by Wilf
and Zeilberger (1992) (of which a simplified
version is presented in a Mohammed and Zeilberger
preprint (see “Further reading”)); however, they
run into capacity problems more quickly. Schneider
(2005) is currently developing a very promising
new algorithmic approach to the automatic treatment
of multisums. See q-Special Functions and
Statistical Mechanics and Combinatorial Problems.


See also: Classical Groups and Homogeneous Spaces; 
Compact Groups and Their Representations; Dimer 
Problems; Growth Processes in Random Matrix Theory; 
Ordinary Special Functions; q-Special Functions; Saddle
Point Problems; Statistical Mechanics and Combinatorial 
Problems. 


Further Reading 


http://algo.inria.fr This site includes, among its libraries, the 
Maple program gdev. 

Andrews GE (1976) The Theory of Partitions, Encyclopedia of 
Mathematics and Its Applications, vol. 2. (reprinted by Cambridge 
University Press, Cambridge, 1998). Reading: Addison-Wesley. 

Andrews GE, Askey RA, and Roy R (1999) In: Rota GC (ed.) 
Special Functions, Encyclopedia of Mathematics and Its 
Applications, vol. 71. Cambridge: Cambridge University Press. 

Ayoub R (1963) An Introduction to the Analytic Theory of 
Numbers. Mathematical Surveys, vol. 10, Providence, RI: 
American Mathematical Society. 

Bergeron F, Labelle G, and Leroux P (1998) Combinatorial Species 
and Tree-Like Structures. Cambridge: Cambridge University Press. 

Bousquet-Mélou M and Jehanne A (2005), Polynomial equations 
with one catalytic variable, algebraic series, and map 
enumeration. Preprint, arXiv:math.CO/0504018.

Bressoud DM (1999) Proofs and Confirmations — The Story of 
the Alternating Sign Matrix Conjecture. Cambridge: Cam- 
bridge University Press. 

de Bruijn NG (1964) Pólya's theory of counting. In: Beckenbach
EF (ed.) Applied Combinatorial Mathematics, New York: 
Wiley, (reprinted by Krieger, Malabar, Florida, 1981). 

Comtet L (1974) Advanced Combinatorics. Dordrecht: Reidel. 

Dolbilin NP, Mishchenko AS, Shtan’ko MA, Shtogrin MI, and 
Zinoviev YuM (1996) Homological properties of dimer 
configurations for lattices on surfaces. Functional Analysis 
and its Application 30: 163-173. 

Feller W (1957) An Introduction to Probability Theory and Its 
Applications, vol. 1, 2nd edn. New York: Wiley. 

Fisher ME (1984) Walks, walls, wetting and melting. Journal of 
Statistical Physics 34: 667-729. 

Flajolet P and Sedgewick R, Analytic Combinatorics, book 
project, available at http://algo.inria.fr. 

Di Francesco P, Zinn-Justin P and Zuber J.-B. (2004), Determinant
formulae for some tiling problems and application to
fully packed loops. Preprint, arXiv:math-ph/0410002.

Di Francesco P and Zinn-Justin P (2005), Quantum Knizhnik-
Zamolodchikov equation, generalized Razumov-Stroganov
sum rules and extended Joseph polynomials. Preprint,
arXiv:math-ph/0508059.




Galluccio A and Loebl M (1999) On the theory of Pfaffian 
orientations I. Perfect matchings and permanents. Electronic 
Journal of Combinatorics 6: Article #R6, 18 pp. 

http://www.fmf.uni-lj.si — website of Faculty of Mathematics of 
University of Ljubljana. A Mathematica implementation by 
Marko Petkovsek is available here. 

Gasper G and Rahman M (2004) Basic Hypergeometric Series, 
2nd edn. Encyclopedia of Mathematics and Its Applications, 
vol. 96. Cambridge: Cambridge University Press. 

de Gier J (2005) Loops, matchings and alternating-sign matrices.
Discrete Mathematics 298: 365-388.

Humphreys JE (1990) Reflection Groups and Coxeter Groups. 
Cambridge: Cambridge University Press. 

Johansson K (2002) Non-intersecting paths, random tilings and 
random matrices. Probability Theory and Related Fields 
123: 225-280. 

Kenyon R (2003) An Introduction to the Dimer Model, Lecture Notes
for a Short Course at the ICTP, 2002; arXiv:math.CO/0310326.

Koekoek R and Swarttouw RF, The Askey-scheme of hypergeometric
orthogonal polynomials and its q-analogue, TU Delft,
The Netherlands, 1998; on the www: http://aw.twi.tudelft.nl.

Krattenthaler C (1997) The enumeration of lattice paths with
respect to their number of turns. In: Balakrishnan N (ed.)
Advances in Combinatorial Methods and Applications to
Probability and Statistics, pp. 29-58. Boston: Birkhäuser.

Krattenthaler C (2003), Asymptotics for random walks in alcoves
of affine Weyl groups. Preprint, arXiv:math.CO/0301203.

Krattenthaler C (2005a), Watermelon configurations with wall
interaction: exact and asymptotic results. Preprint,
arXiv:math.CO/0506323.

Krattenthaler C (2005b) Advanced determinant calculus: a 
complement. Linear Algebra Applications 411: 68-166. 

Krattenthaler C, Guttmann AJ, and Viennot XG (2000) Vicious 
walkers, friendly walkers and Young tableaux II: with a wall. 
Journal of Physics A: Mathematical and General 33: 8835-8866. 

Kuperberg G (1998) An exploration of the permanent-determi- 
nant method. Electronic Journal of Combinatorics 5: Article 
#R46, 34 pp. 

Labelle G and Lamathe C (2004) A shifted asymmetry index 
series. Advances in Applied Mathematics 32: 576-608. 

Mohammed M and Zeilberger D (2005) Multi-variable Zeilberger 
and Almkvist-Zeilberger algorithms and the sharpening of 
Wilf-Zeilberger theory. Advanced Applications in Mathe- 
matics (to appear). 

Mohanty SG (1979) Lattice Path Counting and Applications. 
New York: Academic Press. 

Odlyzko AM (1995) Asymptotic enumeration methods. In: 
Graham RL, Grötschel M, and Lovász L (eds.) Handbook of
Combinatorics, pp. 1063-1229. Amsterdam: Elsevier. 


Pemantle R and Wilson MC, Twenty combinatorial examples of
asymptotics derived from multivariate generating functions.
Preprint, available at http://www.cs.auckland.ac.nz.

Petkovšek M, Wilf H, and Zeilberger D (1996) A = B. Wellesley:
A K Peters.

http://www.mat.univie.ac.at - Website of Faculty of Mathematics,
University of Vienna. It provides the manual of the Mathematica
package HYP.

Propp J (1999) Enumeration of matchings: problems and progress.
In: Billera L, Björner A, Greene C, Simion R, and Stanley RP
(eds.) New Perspectives in Algebraic Combinatorics, Mathematical
Sciences Research Institute Publications, vol. 38,
pp. 255-291. Cambridge: Cambridge University Press.

Razumov AV and Stroganov YG (2005) Enumeration of quarter-turn
symmetric alternating-sign matrices of odd order.
Preprint, arXiv:math-ph/0507003.

Robertson N, Seymour PD, and Thomas R (1999) Permanents,
Pfaffian orientations, and even directed circuits. Annals of
Mathematics 150(2): 929-975.

Schneider C (2005) A new Sigma approach to multi-summation.
Advances in Applied Mathematics 34(4): 740-767.

Slater LJ (1966) Generalized Hypergeometric Functions.
Cambridge: Cambridge University Press.

Stanley RP (1986) Enumerative Combinatorics, Pacific Grove,
CA: Wadsworth & Brooks/Cole, (reprinted by Cambridge
University Press, Cambridge, 1998).

Stanley RP (1999) Enumerative Combinatorics, vol. 2. Cambridge:
Cambridge University Press.

Szegő G (1959) Orthogonal Polynomials, American Mathematical
Society Colloquium Publications, vol. 23. New York. Providence
RI: American Mathematical Society.

Tesler G (2000) Matchings in graphs on non-oriented surfaces.
Journal of Combinatorial Theory Series B 78: 198-231.

http://www.risc.uni.linz.ac.at - website of RISC (Research Institute
for Symbolic Computation). Mathematica implementations
written by Peter Paule and Markus Schorn, and Axel
Riese and Kurt Wegschaider are available here.

http://www.math.rutgers.edu - website of Department of Mathematics,
Rutgers University. Computer implementations written
by D Zeilberger are available here.

Viennot X and James W, Heaps of segments, q-Bessel functions in
square lattice enumeration and applications in quantum
gravity. Preprint.

Wilf HS and Zeilberger D (1992) An algorithmic proof theory for
hypergeometric (ordinary and “q”) multisum/integral identities.
Inventiones Mathematicae 108: 575-633.

Zeilberger D (2005) Deconstructing the Zeilberger algorithm.
Journal of Difference Equations and Applications 11: 851-856.


Compact Groups and Their Representations

A Kirillov, University of Pennsylvania, Philadelphia, PA, USA
A Kirillov, Jr., Stony Brook University, Stony Brook, NY, USA

© 2006 Elsevier Ltd. All rights reserved.


In this article, we describe the structure and
representation theory of compact Lie groups.
Throughout the article, G is a compact real Lie
group with Lie algebra $\mathfrak{g}$. Unless otherwise stated,
G is assumed to be connected. The word “group”
will always mean a “Lie group” and the word
“subgroup” will mean a closed Lie subgroup. The
notation Lie(H) stands for the Lie algebra of a Lie
group H. We assume that the reader is familiar
with the basic facts of the theory of Lie groups and
Lie algebras, which can be found in Lie Groups:
General Theory, or in the books listed in the
bibliography.


Examples of Compact Lie Groups 
Examples of compact groups include


• finite groups,

• quotient groups $T^n = \mathbb{R}^n/\mathbb{Z}^n$, or more generally,
$V/L$, where V is a finite-dimensional real vector
space and L is a lattice in V, that is, a discrete
subgroup generated by some basis in V; groups
of this type are called “tori”, and it is known that
every commutative connected compact group is a
torus;

• unitary groups $U(n)$ and special unitary groups
$SU(n)$, $n \ge 2$;

• orthogonal groups $O(n)$ and $SO(n)$, $n \ge 3$; and

• the groups $U(n, \mathbb{H})$, $n \ge 1$, of unitary quaternionic
transformations, which are isomorphic to $Sp(n) :=
Sp(n, \mathbb{C}) \cap SU(2n)$.


The groups $O(n)$ have two connected components,
one of which is $SO(n)$. The groups $SU(n)$ and $Sp(n)$
are connected and simply connected.

The groups $SO(n)$ are connected but not simply
connected: for $n \ge 3$, the fundamental group of
$SO(n)$ is $\mathbb{Z}_2$. The universal cover of $SO(n)$ is a
simply connected compact Lie group denoted by
$\mathrm{Spin}(n)$. For small n, we have isomorphisms:
$\mathrm{Spin}(3) \simeq SU(2)$, $\mathrm{Spin}(4) \simeq SU(2) \times SU(2)$, $\mathrm{Spin}(5) \simeq
Sp(4)$, and $\mathrm{Spin}(6) \simeq SU(4)$.


Relation to Semisimple Lie Algebras 
and Lie Groups 


Reductive Groups 
A Lie algebra $\mathfrak{g}$ is called


• “simple” if it is nonabelian and has no ideals
different from {0} and $\mathfrak{g}$ itself;

• “semisimple” if it is a direct sum of simple ideals;
and

• “reductive” if it is a direct sum of semisimple and
commutative ideals.


We call a connected Lie group G “simple” or 
“semisimple” if Lie(G) has this property. 


Theorem 1 Let G be a connected compact Lie
group and $\mathfrak{g} = \mathrm{Lie}(G)$. Then


(i) The Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$ is reductive: $\mathfrak{g} = \mathfrak{a} \oplus
\mathfrak{g}'$, where $\mathfrak{a}$ is abelian and $\mathfrak{g}' = [\mathfrak{g}, \mathfrak{g}]$ is
semisimple.

(ii) The group G can be written in the form $G = (A \times
K)/Z$, where A is a torus, K is a connected, simply
connected compact semisimple Lie group, and Z
is a finite central subgroup in $A \times K$.

(iii) If G is simply connected, it is a product of
simple compact Lie groups.

The proof of these results is based on the fact that
the Killing form of $\mathfrak{g}$ is negative semidefinite.


Example 1 The group U(n) has as its center
the subgroup C of scalar matrices. The quotient
group U(n)/C is simple and isomorphic to
SU(n)/$\mathbb{Z}_n$. The presentation of Theorem 1 in this
case is


$$U(n) = (T^1 \times SU(n))/\mathbb{Z}_n = (C \times SU(n))/(C \cap SU(n))$$


For the group SO(4) the presentation is
(SU(2) × SU(2))/{±(1 × 1)}.


This theorem effectively reduces the study of the 
structure of connected compact groups to the study 
of simply connected compact simple Lie groups. 


Complexification of a Compact Lie Group 


Recall that for a real Lie algebra $\mathfrak{g}$, its complexification
is $\mathfrak{g}_\mathbb{C} = \mathfrak{g} \otimes \mathbb{C}$ with the obvious commutator. It
is also well known that $\mathfrak{g}_\mathbb{C}$ is semisimple or
reductive iff $\mathfrak{g}$ is semisimple or reductive, respectively.
There is a subtlety in the case of simple
algebras: it is possible that a real Lie algebra is
simple, but its complexification $\mathfrak{g}_\mathbb{C}$ is only semisimple.
However, this problem never arises for Lie
algebras of compact groups: if $\mathfrak{g}$ is a Lie algebra of a
real compact Lie group, then $\mathfrak{g}$ is simple if and only if
$\mathfrak{g}_\mathbb{C}$ is simple.

The notion of complexification for Lie groups is 
more delicate. 


Definition 1 Let G be a connected real Lie group
with Lie algebra $\mathfrak{g}$. A complexification of G is a
connected complex Lie group $G_\mathbb{C}$ (i.e., a complex
manifold with a structure of a Lie group such that
group multiplication is given by a complex analytic
map $G_\mathbb{C} \times G_\mathbb{C} \to G_\mathbb{C}$), which contains G as a closed
subgroup, and such that Lie($G_\mathbb{C}$) = $\mathfrak{g}_\mathbb{C}$. In this case,
we will also say that G is a real form of $G_\mathbb{C}$.


It is not obvious why such a complexification
exists at all; in fact, for an arbitrary real group it may
not exist. However, for compact groups we do have
the following theorem.


Theorem 2 Let G be a connected compact Lie
group. Then it has a unique complexification $G_\mathbb{C} \supset G$.
Moreover, the following properties hold:


(i) The inclusion $G \subset G_\mathbb{C}$ is a homotopy equivalence.
In particular, $\pi_1(G) = \pi_1(G_\mathbb{C})$ and the
quotient space $G_\mathbb{C}/G$ is contractible.

(ii) Every complex finite-dimensional representation
of G can be uniquely extended to a complex
analytic representation of $G_\mathbb{C}$.

Since the Lie algebra of a compact Lie group G is
reductive, we see that $G_\mathbb{C}$ must be reductive; if G is
semisimple or simple, then so is $G_\mathbb{C}$. The natural
question is whether every complex reductive group
can be obtained in this way. The following theorem
gives a partial answer.


Theorem 3 Every connected complex semisimple
Lie group H has a compact real form: there is a
compact real subgroup $K \subset H$ such that $H = K_\mathbb{C}$.
Moreover, such a compact real form is unique up to
conjugation.


Example 2 


(i) The unitary group U(n) is a compact real form
of the group GL(n, C).

(ii) The orthogonal group SO(n) is a compact real
form of the group SO(n, C).

(iii) The group Sp(n) is a compact real form of the
group Sp(n, C).

(iv) The universal cover of GL(n, C) has no compact
real form.


These results have a number of important appli- 
cations. For example, they show that study of 
representations of a semisimple complex group H 
can be replaced by the study of representations of its 
compact form; in particular, every representation is 
completely reducible (this argument is known as 
Weyl’s unitary trick). 


Classification of Simple Compact Lie Groups 


Theorem 1 essentially reduces such classification to 
classification of simply connected simple compact 
groups, and Theorems 2 and 3 reduce it to the 
classification of simple complex Lie algebras. Since 
the latter is well known, we get the following result. 


Theorem 4 Let G be a connected, simply connected
simple compact Lie group. Then $\mathfrak{g}_\mathbb{C}$ must be
a simple complex Lie algebra and thus can be
described by a Dynkin diagram of one of the following
types: $A_n, B_n, C_n, D_n, E_6, E_7, E_8, F_4, G_2$.

Conversely, for each Dynkin diagram in the above 
list, there exists a unique, up to isomorphism, simply 
connected simple compact Lie group whose Lie 
algebra is described by this Dynkin diagram. 


For types $A_n, \ldots, D_n$, the corresponding compact
Lie groups are well-known classical groups shown in
the table below:


Type             Group
$A_n$, n ≥ 1     SU(n + 1)
$B_n$, n ≥ 2     Spin(2n + 1)
$C_n$, n ≥ 3     Sp(n)
$D_n$, n ≥ 4     Spin(2n)


The restrictions on n in this table are
made to avoid repetitions which appear for
small values of n. Namely, $A_1 = B_1 = C_1$, which
gives SU(2) ≅ Spin(3) ≅ Sp(1); $D_2 = A_1 \cup A_1$, which
gives Spin(4) ≅ SU(2) × SU(2); $B_2 = C_2$, which gives
Spin(5) ≅ Sp(2); and $A_3 = D_3$, which gives SU(4) ≅
Spin(6). Other than that, all entries are distinct.

Exceptional groups $E_6, \ldots, G_2$ also admit explicit
geometric and algebraic descriptions which are
related to the exceptional nonassociative algebra $\mathbb{O}$
of the so-called octonions (or Cayley numbers). For
example, the compact group of type $G_2$ can be
defined as a subgroup of SO(7) which preserves an
almost-complex structure on $S^6$. It can also be
described as the subgroup of GL(7, R) which
preserves one quadratic and one cubic form, or,
finally, as a group of all automorphisms of $\mathbb{O}$.


Maximal Tori 
Main Properties 
In this section, G is a compact connected Lie group. 


Definition 2 A “maximal torus” in G is a maximal
connected commutative subgroup $T \subset G$.


The following theorem lists the main properties of 
maximal tori. 


Theorem 5 


(i) For every element $g \in G$, there exists a maximal
torus $T \ni g$.

(ii) Any two maximal tori in G are conjugate.

(iii) If $g \in G$ commutes with all elements of a
maximal torus T, then $g \in T$.

(iv) A connected subgroup $H \subset G$ is a maximal
torus iff the Lie algebra Lie(H) is a maximal
abelian subalgebra in Lie(G).


Example 3 Let G = U(n). Then the set T of
diagonal unitary matrices is a maximal torus in G; 
moreover, every maximal torus is of this form after 
a suitable unitary change of basis. In particular, this 
implies that every element in G is conjugate to a 
diagonal matrix. 


Example 4 Let G=SO(3). Then the set D of 
diagonal matrices is a maximal commutative sub- 
group in G, but not a torus. Here D consists of four 
elements and is not connected. 


Maximal Tori and Cartan Subalgebras 


The study of maximal tori in compact Lie groups is
closely related to the study of Cartan subalgebras in
reductive complex Lie algebras. For convenience of
readers, we briefly recall the appropriate definitions
here; details can be found in Serre (2001) or in Lie
Groups: General Theory.


Definition 3 Let $\mathfrak{g}$ be a complex reductive Lie
algebra. A Cartan subalgebra $\mathfrak{h} \subset \mathfrak{g}$ is a maximal
commutative subalgebra consisting of semisimple
elements.


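Note that for general Lie algebras a Cartan subalgebra
is defined in a different way; however, for
reductive algebras the definition given above is
equivalent to the standard one.

A choice of a Cartan subalgebra gives rise to the
so-called root decomposition: if $\mathfrak{h} \subset \mathfrak{g}$ is a Cartan
subalgebra in a complex reductive Lie algebra, then
we can write


$$\mathfrak{g} = \mathfrak{h} \oplus \Bigl(\bigoplus_{\alpha \in R} \mathfrak{g}_\alpha\Bigr)$$ [1]

where

$$\mathfrak{g}_\alpha = \{x \in \mathfrak{g} \mid \operatorname{ad} h.x = \langle \alpha, h \rangle x \ \ \forall h \in \mathfrak{h}\}$$
$$R = \{\alpha \in \mathfrak{h}^* - \{0\} \mid \mathfrak{g}_\alpha \neq 0\} \subset \mathfrak{h}^*$$


The set R is called the “root system” of $\mathfrak{g}$ with
respect to the Cartan subalgebra $\mathfrak{h}$; elements $\alpha \in R$ are
called “roots.” We will also frequently use elements
$\alpha^\vee \in \mathfrak{h}$ defined by $\langle \alpha^\vee, \beta \rangle = 2(\alpha, \beta)/(\alpha, \alpha)$, where $(\,,)$
is a nondegenerate invariant bilinear form on $\mathfrak{g}^*$ and
$\langle\,,\rangle$ is the pairing between $\mathfrak{g}$ and $\mathfrak{g}^*$. It can be shown
that $\alpha^\vee$ so defined does not depend on the choice of
the form $(\,,)$.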


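Theorem 6 Let G be a connected compact Lie
group with Lie algebra $\mathfrak{g}$, and let $T \subset G$ be a
maximal torus in G, $\mathfrak{t} = \mathrm{Lie}(T) \subset \mathfrak{g}$. Let $\mathfrak{g}_\mathbb{C}$, $G_\mathbb{C}$ be
the complexifications of $\mathfrak{g}$, G as in Theorem 2.

Let $\mathfrak{h} = \mathfrak{t}_\mathbb{C} \subset \mathfrak{g}_\mathbb{C}$. Then $\mathfrak{h}$ is a Cartan subalgebra in
$\mathfrak{g}_\mathbb{C}$, and the corresponding root system $R \subset i\mathfrak{t}^*$.
Conversely, every Cartan subalgebra in $\mathfrak{g}_\mathbb{C}$ can be
obtained as $\mathfrak{t}_\mathbb{C}$ for some maximal torus $T \subset G$.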


Weights and Roots 


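Let G be semisimple. Recall that the root lattice
$Q \subset i\mathfrak{t}^*$ is the abelian group generated by roots $\alpha \in R$,
and let the coroot lattice $Q^\vee \subset i\mathfrak{t}$ be the abelian
group generated by coroots $\alpha^\vee$, $\alpha \in R$. Define also
the weight and coweight lattices by


$$P = \{\lambda \in i\mathfrak{t}^* \mid \langle \alpha^\vee, \lambda \rangle \in \mathbb{Z} \ \ \forall \alpha \in R\} \subset i\mathfrak{t}^*$$
$$P^\vee = \{t \in i\mathfrak{t} \mid \langle t, \alpha \rangle \in \mathbb{Z} \ \ \forall \alpha \in R\} \subset i\mathfrak{t},$$


where $\langle \cdot, \cdot \rangle$ is the pairing between $\mathfrak{t}$ and the dual
vector space $\mathfrak{t}^*$.

It follows from the definition of root system that
we have inclusions


$$Q \subset P \subset i\mathfrak{t}^*, \qquad Q^\vee \subset P^\vee \subset i\mathfrak{t}$$ [2]

Both P, Q are lattices in $i\mathfrak{t}^*$; thus, the index (P : Q)
is finite. It can be computed explicitly: if $\alpha_i$ is a basis
of the root system, then the fundamental weights $\omega_i$
defined by


$$\langle \alpha_i^\vee, \omega_j \rangle = \delta_{ij}$$


form a basis of P. The simple roots $\alpha_i$ are related
to fundamental weights $\omega_j$ by the Cartan matrix A:
$\alpha_i = \sum_j A_{ij}\,\omega_j$. Therefore, $(P : Q) = (P^\vee : Q^\vee) = |\det A|$.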
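
The identity $(P : Q) = |\det A|$ can be checked by direct computation. The following Python sketch is an illustration only: the helper names `cartan_matrix_A` and `det` are ours, not from the article, and it assumes the standard Cartan matrix of type $A_n$ (2 on the diagonal, −1 for adjacent simple roots). It computes the determinant exactly with rational arithmetic.

```python
from fractions import Fraction


def cartan_matrix_A(n):
    """Cartan matrix of type A_n: 2 on the diagonal, -1 for adjacent simple roots."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0
             for j in range(n)] for i in range(n)]


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    sign, result = 1, Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return sign * result


if __name__ == "__main__":
    for n in range(1, 6):
        # The index (P : Q) for type A_n.
        print(n, abs(det(cartan_matrix_A(n))))
```

For type $A_n$ the determinant is n + 1, so for SU(n), which has type $A_{n-1}$, the index (P : Q) equals n.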

Definitions of $P, Q, P^\vee, Q^\vee$ also make sense when
$\mathfrak{g}$ is reductive but not semisimple. However, in this
case they are no longer lattices: $\operatorname{rk} Q < \dim \mathfrak{t}^*$, and P
is not discrete.

We can now give more precise information about 
the structure of the maximal torus. 


Lemma 1 Let T be a compact connected commutative
Lie group, and $\mathfrak{t}$ = Lie(T) its Lie algebra. Then
the exponential map is surjective and the preimage
of the unit is a lattice $L \subset \mathfrak{t}$. There is an isomorphism
of Lie groups


$$\exp\colon \mathfrak{t}/L \xrightarrow{\ \sim\ } T$$


In particular, $T \cong \mathbb{R}^r/\mathbb{Z}^r = T^r$, $r = \dim T$.
Let $X(T) \subset i\mathfrak{t}^*$ be the lattice dual to $(2\pi i)^{-1}L$:


$$X(T) = \{\lambda \in i\mathfrak{t}^* \mid \langle \lambda, l \rangle \in 2\pi i\,\mathbb{Z} \ \ \forall l \in L\}$$ [3]


It is called the “character lattice” for T (see the
subsection “Examples of representations”).


Theorem 7 Let G be a compact connected Lie
group, and let $T \subset G$ be a maximal torus in G.

Then $Q \subset X(T) \subset P$. Moreover, the group G is
uniquely determined by the Lie algebra $\mathfrak{g}$ and the
lattice $X(T) \subset i\mathfrak{t}^*$, which can be any lattice between
Q and P.


Corollary For a given complex semisimple Lie
algebra $\mathfrak{a}$, there are only finitely many (up to
isomorphism) compact connected Lie groups G
with $\mathfrak{g}_\mathbb{C} = \mathfrak{a}$.

The largest of them is the simply connected group,
for which $T = \mathfrak{t}/2\pi i Q^\vee$, $X(T) = P$; the smallest is the
so-called “adjoint group,” for which $T = \mathfrak{t}/2\pi i P^\vee$,
$X(T) = Q$.


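Example 5 Let G = U(n). Then $i\mathfrak{t}$ = {real diagonal
matrices}. Choosing the standard basis of matrix
units in $i\mathfrak{t}$, we identify $i\mathfrak{t} \cong \mathbb{R}^n$, which also allows us
to identify $i\mathfrak{t}^* \cong \mathbb{R}^n$. Under this identification,


$$Q = \{(\lambda_1, \ldots, \lambda_n) \mid \lambda_i \in \mathbb{Z},\ \textstyle\sum \lambda_i = 0\}$$
$$P = \{(\lambda_1, \ldots, \lambda_n) \mid \lambda_i \in \mathbb{R},\ \lambda_i - \lambda_j \in \mathbb{Z}\}$$
$$X(T) = \mathbb{Z}^n$$


Note that Q, P are not lattices: $Q \cong \mathbb{Z}^{n-1}$,
$P \cong \mathbb{R} \times \mathbb{Z}^{n-1}$.


Now let G = SU(n). Then $i\mathfrak{t}^* = \mathbb{R}^n/\mathbb{R} \cdot (1, \ldots, 1)$,
and Q, P are the images of Q, P for G = U(n) in this
quotient. In this quotient they are lattices, and
(P : Q) = n. The character lattice in this case is
X(T) = P, since SU(n) is simply connected. The
adjoint group is PSU(n) = SU(n)/C, where C =
$\{\lambda \cdot \mathrm{id} \mid \lambda^n = 1\}$ is the center of SU(n).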


Weyl Group 


Let us fix a maximal torus $T \subset G$. Let $N(T) \subset G$ be
the normalizer of T in G: $N(T) = \{g \in G \mid gTg^{-1} = T\}$.
For any $g \in N(T)$ the transformation $A(g)\colon t \mapsto gtg^{-1}$ is
an automorphism of T. According to Theorem 5, this
automorphism is trivial iff $g \in T$. So in fact, it is the
quotient group N(T)/T which acts on T.


Definition 4 The group W = N(T)/T is called the
“Weyl group” of G.


Since the Weyl group acts faithfully on $\mathfrak{t}$ and $\mathfrak{t}^*$, it
is common to consider W as a subgroup in $GL(\mathfrak{t}^*)$. It
is known that W is finite.

The Weyl group can also be defined in terms of the
Lie algebra $\mathfrak{g}$ and its complexification $\mathfrak{g}_\mathbb{C}$.


Theorem 8 The Weyl group coincides with the
subgroup in $GL(i\mathfrak{t}^*)$ generated by reflections
$s_\alpha\colon x \mapsto x - \dfrac{2(\alpha, x)}{(\alpha, \alpha)}\,\alpha$, $\alpha \in R$, where, as
before, $(\,,)$ is a nondegenerate invariant bilinear
form on $\mathfrak{g}^*$.
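
The description of W as a reflection group can be tried out on small root systems. The sketch below is only an illustration: it assumes the standard coordinate realizations of the positive roots of $A_2$ (in $\mathbb{R}^3$) and $B_2$ (in $\mathbb{R}^2$) with the usual dot product as the invariant form, and the function names are hypothetical. It builds the matrices of the reflections $s_\alpha$ and closes them under multiplication.

```python
from fractions import Fraction


def reflection_matrix(alpha):
    """Matrix of s_alpha: x -> x - 2(alpha, x)/(alpha, alpha) * alpha (dot product)."""
    n = len(alpha)
    norm = sum(Fraction(a) * a for a in alpha)
    return tuple(
        tuple(Fraction(int(i == j)) - 2 * Fraction(alpha[i]) * alpha[j] / norm
              for j in range(n))
        for i in range(n)
    )


def matmul(a, b):
    """Product of two square matrices stored as tuples of tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def generated_group(generators):
    """Closure of a finite set of matrices under multiplication (assumed finite)."""
    n = len(generators[0])
    identity = tuple(tuple(Fraction(int(i == j)) for j in range(n)) for i in range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = matmul(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


# Positive roots in standard coordinates (an assumption of this sketch).
A2_ROOTS = [(1, -1, 0), (0, 1, -1), (1, 0, -1)]
B2_ROOTS = [(1, 0), (0, 1), (1, 1), (1, -1)]

if __name__ == "__main__":
    for name, roots in [("A2", A2_ROOTS), ("B2", B2_ROOTS)]:
        weyl = generated_group([reflection_matrix(a) for a in roots])
        print(name, len(weyl))
```

For $A_2$ the reflections are the transpositions of coordinates, so the group obtained is the symmetric group on three letters, in agreement with the Weyl group of U(3) or SU(3).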


Theorem 9 


(i) Two elements $t_1, t_2 \in T$ are conjugate in G iff
$t_2 = w(t_1)$ for some $w \in W$.

(ii) There exists a natural homeomorphism of
quotient spaces $G/\mathrm{Ad}\,G \cong T/W$, where Ad G
stands for the action of G on itself by conjugation.
(Note, however, that these quotient spaces are
not manifolds: they have singularities.)

(iii) Let us call a function f on G central if
$f(hgh^{-1}) = f(g)$ for any $g, h \in G$. Then the
restriction map gives an isomorphism


{continuous central functions on G} ≅ {W-invariant continuous functions on T}


Example 6 Let G = U(n). The set of diagonal unitary
matrices is a maximal torus, and the Weyl group is the
symmetric group $S_n$ acting on diagonal matrices by
permutations of entries. In this case, Theorem 9 shows
that if f(U) is a central function of a unitary matrix,
then $f(U) = \tilde{f}(\lambda_1, \ldots, \lambda_n)$, where $\lambda_i$ are the eigenvalues of
U and $\tilde{f}$ is a symmetric function in n variables.


Representations of Compact Groups 
Basic Notions 


By a representation of G we understand a pair
$(\pi, V)$, where V is a complex vector space and $\pi$ is
a continuous homomorphism $G \to \mathrm{Aut}(V)$. This
notation is often shortened to $\pi$ or V. In this article,
we only consider finite-dimensional (f.d.) representations;
in this case, the homomorphism $\pi$ is
automatically smooth and even real-analytic.

We associate to any f.d. representation $(\pi, V)$ of G
the representation $(\pi_*, V)$ of the Lie algebra $\mathfrak{g}$ = Lie(G)
which is just the derivative of the map $\pi\colon G \to \mathrm{Aut}\,V$ at
the unit point $e \in G$. In terms of the exponential map,
we have the following commutative diagram:


$$\begin{array}{ccc} G & \xrightarrow{\ \pi\ } & \mathrm{Aut}\,V \\ {\scriptstyle\exp}\uparrow & & \uparrow{\scriptstyle\exp} \\ \mathfrak{g} & \xrightarrow{\ \pi_*\ } & \mathrm{End}\,V \end{array}$$


Choosing a basis in V, we can write the operators
$\pi(g)$ and $\pi_*(X)$ in matrix form and consider $\pi$ and $\pi_*$
as matrix-valued functions on G and $\mathfrak{g}$. The diagram
above means that


$$\pi(\exp X) = e^{\pi_*(X)}$$ [4]


Recall that if G is connected and simply connected, then
every representation of $\mathfrak{g}$ can be uniquely lifted to a
representation of G. Thus, classification of representations
of connected simply connected Lie groups
is equivalent to the classification of representations
of Lie algebras.

Let $(\pi_1, V_1)$ and $(\pi_2, V_2)$ be two representations of
the same group G. An operator $A \in \mathrm{Hom}(V_1, V_2)$ is
called an “intertwining operator,” or simply an
“intertwiner,” if $A \circ \pi_1(g) = \pi_2(g) \circ A$ for all $g \in G$.
Two representations are called “equivalent” if they
admit an invertible intertwiner. In this case, using an
appropriate choice of bases, we can write $\pi_1$ and $\pi_2$
by the same matrix-valued function.

Let $(\pi, V)$ be a representation of G. If all operators
$\pi(g)$, $g \in G$, preserve a subspace $V_1 \subset V$, then the
restrictions $\pi_1(g) = \pi(g)|_{V_1}$ define a “subrepresentation”
$(\pi_1, V_1)$ of $(\pi, V)$. In this case, the quotient
space $V_2 = V/V_1$ also has a canonical structure of a
representation, called the “quotient representation.”


A representation $(\pi, V)$ is called “reducible” if it
has a nontrivial (different from V and {0}) subrepresentation.
Otherwise it is called “irreducible.”

We call a representation $(\pi, V)$ “unitary” if V is a
Hilbert space and all operators $\pi(g)$, $g \in G$, are
unitary, that is, given by unitary matrices in any
orthonormal basis. We use the short term “unirrep”
for a “unitary irreducible representation.”


Main Theorems 


The following simple but important result was one 
of the first discoveries in representation theory. It 
holds for representations of any group, not necessa- 
rily compact. 


Theorem 10 (Schur lemma). Let $(\pi_i, V_i)$, i = 1, 2, be
any two irreducible finite-dimensional representations
of the same group G. Then any intertwiner
$A\colon V_1 \to V_2$ is either invertible or zero.


Corollary 1 If V is an irreducible f.d. representation,
then any intertwiner $A\colon V \to V$ is scalar: $A = c \cdot \mathrm{id}$, $c \in \mathbb{C}$.


Corollary 2 Every irreducible representation of a
commutative group is one dimensional.


The following theorem is one of the fundamental 
results of the representation theory of compact 
groups. Its proof is based on the technique of 
invariant integrals on a compact group, which will 
be discussed in the next section. 


Theorem 11 


(i) Any f.d. representation of a compact group is
equivalent to a unitary representation.

(ii) Any f.d. representation is completely reducible:
it can be decomposed into a direct sum


$$V = \bigoplus_i n_i V_i$$


where $V_i$ are pairwise nonequivalent unirreps.
Numbers $n_i \in \mathbb{Z}_+$ are called “multiplicities.”


Examples of Representations 


The representation theory looks rather different for
abelian (i.e., commutative) and nonabelian groups.
Here we consider the simplest example of each kind.

Our first example is a one-dimensional compact
connected Lie group. Topologically, it is a circle
which we realize as the set $T^1 \cong U(1)$ of all complex
numbers t with absolute value 1.

Every unirrep of $T^1$ is one dimensional; thus, it is
just a continuous multiplicative map $\pi$ of $T^1$ to itself.
It is well known that every such map has the form


$$\pi_k(t) = t^k \quad \text{for some } k \in \mathbb{Z}$$

The collection of all unirreps of $T^1$ is itself a group,
called the “Pontrjagin dual” of $T^1$ and denoted by
$\widehat{T^1}$. This group is isomorphic to $\mathbb{Z}$.

By Theorem 11, any f.d. representation $\pi$ of $T^1$ is
equivalent to a direct sum of one-dimensional
unirreps. So, an equivalence class of $\pi$ is defined by
the multiplicity function $\mu$ on $\widehat{T^1} = \mathbb{Z}$ taking
non-negative values:


$$\pi \simeq \sum_{k \in \mathbb{Z}} \mu(k) \cdot \pi_k$$


The many-dimensional case of a compact connected
abelian Lie group can be treated in a similar way.
Let T be a torus, that is, a connected abelian compact Lie group,
$\mathfrak{t}$ = Lie(T). Then every irreducible representation
of T is one dimensional and thus is defined by a
group homomorphism $\chi\colon T \to T^1 = U(1)$. Such
homomorphisms are called “characters” of T. One
easily sees that such characters themselves form a
group (the Pontrjagin dual of T). If we denote by L the
kernel of the exponential map $\mathfrak{t} \to T$ (see Lemma 1),
one easily sees that every character has the form


$$\chi(\exp(t)) = e^{\langle t, \lambda \rangle}, \quad t \in \mathfrak{t},\ \lambda \in X(T)$$


where $X(T) \subset i\mathfrak{t}^*$ is the lattice defined by [3]. Thus,
we can identify the group of characters $\widehat{T}$ with X(T).
In particular, this shows that $\widehat{T} \cong \mathbb{Z}^{\dim T}$.

The second example is the group G = SU(2), the
simplest connected, simply connected nonabelian
compact Lie group. Topologically, G is a three-dimensional
sphere since the general element of G is
a matrix of the form


$$g = \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix}, \qquad a, b \in \mathbb{C}, \quad |a|^2 + |b|^2 = 1$$


Let V be a two-dimensional complex vector space,
realized by column vectors $\binom{u}{v}$. The group G acts
naturally on V. This action induces the representation
$\Pi$ of G in the space S(V) of all polynomials in
u, v. It is infinite dimensional, but has many f.d.
subrepresentations. In particular, let $S^k(V)$, or
simply $S^k$, be the space of all homogeneous
polynomials of degree k. Clearly, $\dim S^k = k + 1$.

It turns out that the corresponding f.d. representations
$(\Pi_k, S^k)$, k ≥ 0, are irreducible, pairwise nonequivalent,
and exhaust the set $\widehat{G}$ of all unirreps.

Some particular cases are of special interest:


1. k = 0. The space $V_0$ consists of constant functions
and $\Pi_0$ is the trivial one-dimensional representation:
$\Pi_0(g) = 1$.

2. k = 1. The space $V_1$ is identical to V and $\Pi_1$ is
just the tautological representation $\pi(g) = g$.

3. k = 2. The space $V_2$ is spanned by monomials
$u^2, uv, v^2$. The remarkable fact is that this
representation is equivalent to a real one. Namely,
in the new basis


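$$x = \frac{u^2 + v^2}{2}, \qquad y = iuv, \qquad z = \frac{u^2 - v^2}{2i}$$

we have

$$\Pi_2\begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} = \begin{pmatrix} \operatorname{Re}(a^2 + b^2) & 2\operatorname{Im}(ab) & \operatorname{Im}(b^2 - a^2) \\ 2\operatorname{Im}(a\bar{b}) & |a|^2 - |b|^2 & 2\operatorname{Re}(a\bar{b}) \\ \operatorname{Im}(a^2 + b^2) & -2\operatorname{Re}(ab) & \operatorname{Re}(a^2 - b^2) \end{pmatrix}$$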


This formula defines a homomorphism $\Pi_2\colon SU(2) \to
SO(3)$. It can be shown that this homomorphism is
surjective, and its kernel is the subgroup
$\{\pm 1\} \subset SU(2)$:


$$1 \to \{\pm 1\} \to SU(2) \to SO(3) \to 1$$


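The simplest way to see it is to establish the
equivalence of $\Pi_2$ with the adjoint representation
of G in $\mathfrak{g}$. The corresponding intertwiner is


$$S^2 \ni (\alpha + i\gamma)u^2 + 2i\beta uv + (\alpha - i\gamma)v^2 \ \longleftrightarrow\ \begin{pmatrix} i\beta & \alpha + i\gamma \\ -\alpha + i\gamma & -i\beta \end{pmatrix} \in \mathfrak{g}$$

Note that SU(2) and SO(3) are the only compact
groups associated with the Lie algebra $\mathfrak{sl}(2, \mathbb{C})$.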
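
The two-to-one map SU(2) → SO(3) can be explored numerically. The sketch below does not use the matrix of $\Pi_2$ in the basis above; as an assumption of this illustration, it realizes the equivalent adjoint action in the basis of Pauli matrices, via $R_{ij} = \tfrac12 \operatorname{tr}(\sigma_i\, g\, \sigma_j\, g^\dagger)$. All function names are hypothetical.

```python
import math

# Pauli matrices; i*sigma_j span the Lie algebra su(2) (basis choice is an assumption).
SIGMA = [
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
]


def mul(x, y):
    """Product of 2x2 matrices."""
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def dagger(x):
    """Conjugate transpose of a 2x2 matrix."""
    return tuple(tuple(complex(x[j][i]).conjugate() for j in range(2)) for i in range(2))


def su2(a, b):
    """Element (a b; -conj(b) conj(a)); (a, b) is rescaled so |a|^2 + |b|^2 = 1."""
    r = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    a, b = complex(a) / r, complex(b) / r
    return ((a, b), (-b.conjugate(), a.conjugate()))


def rotation(g):
    """Real 3x3 matrix R with g sigma_j g^dagger = sum_i R[i][j] sigma_i."""
    gd = dagger(g)
    rows = []
    for i in range(3):
        row = []
        for j in range(3):
            m = mul(SIGMA[i], mul(g, mul(SIGMA[j], gd)))
            row.append(complex(m[0][0] + m[1][1]).real / 2)
        rows.append(row)
    return rows


def matmul3(x, y):
    """Product of 3x3 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


if __name__ == "__main__":
    g = su2(0.6 + 0.3j, -0.2 + 0.5j)
    r = rotation(g)
    print("det R =", round(det3(r), 9))
    minus_g = tuple(tuple(-x for x in row) for row in g)
    same = all(abs(r[i][j] - rotation(minus_g)[i][j]) < 1e-12
               for i in range(3) for j in range(3))
    print("R(g) == R(-g):", same)
```

Since g enters the formula together with $g^\dagger$, replacing g by −g cannot change R, which is the numerical counterpart of the kernel {±1}.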

The group G contains the subgroup H of diagonal
matrices, isomorphic to $T^1$. Consider the restriction
of $\Pi_n$ to $T^1$. It splits into the sum of unirreps $\pi_k$ as
follows:


$$\operatorname{Res}^G_{T^1} \Pi_n = \bigoplus_{s=0}^{n} \pi_{n-2s}$$


The characters $\pi_k$ which enter this decomposition
are called the weights of $\Pi_n$. The collection of all
weights (together with multiplicities) forms a multiset
in $\widehat{T^1}$ denoted by $P(\Pi_n)$ or $P(S^n)$.

Note the following features of this multiset:


1. $P(\Pi_n)$ is invariant under the reflection $k \mapsto -k$.

2. All weights of $\Pi_n$ are congruent modulo 2.

3. The nonequivalent unirreps have different multisets
of weights.
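
These three features are easy to confirm for small n. In the sketch below (an illustration; the function name `weights` is ours), the weight multiset is read off from the monomials $u^{n-s}v^s$, on which the diagonal element diag(t, t⁻¹) acts by the factor $t^{n-2s}$.

```python
from collections import Counter


def weights(n):
    """Multiset of weights of Pi_n on the diagonal torus: n, n-2, ..., -n."""
    return Counter(n - 2 * s for s in range(n + 1))


if __name__ == "__main__":
    for n in range(5):
        w = weights(n)
        reflected = Counter({-k: m for k, m in w.items()})
        # Feature 1: symmetry under k -> -k; feature 2: a single parity class.
        print(n, sorted(w.elements()), w == reflected, {k % 2 for k in w})
```

Feature 3 holds because the largest weight in the multiset already determines n.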


Below we show how these features are generalized
to all compact connected Lie groups.


Fourier Transform 
Haar Measure and Invariant Integral 


The important feature of compact groups is the
existence of the so-called “invariant integral,” or
“average.”


Theorem 12 For every compact Lie group G, there
exists a unique measure dg on G, called “Haar
measure,” which is invariant under left shifts
$L_g\colon h \mapsto gh$ and satisfies $\int_G dg = 1$.

In addition, this measure is also invariant under
right shifts $h \mapsto hg$ and under the involution $h \mapsto h^{-1}$.


Invariance of the Haar measure implies that for
every integrable function f(g), we have


$$\int_G f(g)\,dg = \int_G f(hg)\,dg = \int_G f(gh)\,dg = \int_G f(g^{-1})\,dg$$


For a finite group G, the integral with respect to
the Haar measure is just averaging over the group:


$$\int_G f(g)\,dg = \frac{1}{|G|} \sum_{g \in G} f(g)$$
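
For a finite group the invariance properties of Theorem 12 can be verified by brute force. The following sketch is an illustration on the symmetric group $S_3$, realized as permutations of {0, 1, 2}; the names `haar_integral`, `compose`, `inverse` and the test function `f` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

GROUP = list(permutations(range(3)))  # the symmetric group S_3


def compose(p, q):
    """Product of permutations: (p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, x in enumerate(p):
        inv[x] = i
    return tuple(inv)


def haar_integral(f):
    """Integral over a finite group: the average of f over all elements."""
    return Fraction(sum(f(g) for g in GROUP), len(GROUP))


def f(p):
    """An arbitrary (non-central) integer-valued test function."""
    return p[0] + 10 * p[1] + 100 * p[2]


if __name__ == "__main__":
    base = haar_integral(f)
    for h in GROUP:
        assert haar_integral(lambda g: f(compose(h, g))) == base  # left shifts
        assert haar_integral(lambda g: f(compose(g, h))) == base  # right shifts
    assert haar_integral(lambda g: f(inverse(g))) == base         # inversion
    print("average of f:", base)
```

Each check works because left shifts, right shifts and inversion merely permute the elements of the group, so the sum is unchanged.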


For compact connected Lie groups, the Haar 
measure is given by a differential form of top degree 
which is invariant under right and left translations. 

For a torus $T^n = \mathbb{R}^n/\mathbb{Z}^n$ with real coordinates $\theta_k \in
\mathbb{R}/\mathbb{Z}$ or complex coordinates $t_k = e^{2\pi i \theta_k}$, the Haar
measure is $d^n\theta := d\theta_1\,d\theta_2 \cdots d\theta_n$ or

$$d^n t := \prod_{k=1}^{n} \frac{dt_k}{2\pi i\,t_k}$$

In particular, consider a central function f (see
Theorem 9). Since every conjugacy class contains
elements of the maximal torus T (see Theorem 5),
such a function is determined by its values on T, and
the integral of a central function can be reduced to
integration over T. The resulting formula is called the
“Weyl integration formula.” For G = U(n) it looks
as follows:


$$\int_{U(n)} f(g)\,dg = \frac{1}{n!} \int_T f(t) \prod_{i<j} |t_i - t_j|^2\, d^n t$$


where T is the maximal torus consisting of diagonal
matrices


$$t = \mathrm{diag}(t_1, \ldots, t_n), \qquad t_k = e^{2\pi i \theta_k}$$


and $d^n t$ is defined above.

The Weyl integration formula for an arbitrary compact
group G can be found in Simon (1996) or Bump
(2004, section 18).

The main applications of the Haar measure are the 
proof of complete reducibility theorem (Theorem 11) 
and orthogonality relations (see below). 


Orthogonality Relations and Peter-Weyl Theorem 


Let $V_1, V_2$ be unirreps of a compact group G.
Taking any linear operator $A\colon V_1 \to V_2$ and averaging
the expression $A(g) := \pi_2(g^{-1}) \circ A \circ \pi_1(g)$ over
G, we get an intertwining operator $\langle A \rangle = \int_G A(g)\,dg$.
Comparing this fact with the Schur lemma, one
obtains the following fundamental results.

Let $(\pi, V)$ be any unirrep of a compact group G.
Choose any orthonormal basis $\{v_k, 1 \le k \le \dim V\}$
in V and denote by $t^V_{kl}$, or $t^\pi_{kl}$, the function on G
defined by


$$t^V_{kl}(g) = (\pi(g)v_l, v_k)$$


The functions $t^V_{kl}$ are called “matrix elements” of the
unirrep $(\pi, V)$.


Theorem 13 (Orthogonality relations) 


(i) The matrix elements $t^V_{kl}$ are pairwise orthogonal
and have norm $(\dim V)^{-1/2}$ in $L^2(G, dg)$.

(ii) The matrix elements corresponding to equivalent
unirreps span the same subspace in
$L^2(G, dg)$.

(iii) The matrix elements of two nonequivalent
unirreps are orthogonal.

(iv) The linear span of all matrix elements of all
unirreps is dense in $C(G)$, $C^\infty(G)$, and in
$L^2(G, dg)$ (generalized Peter-Weyl theorem).


In particular, this theorem implies that the set $\widehat{G}$ of
equivalence classes of unirreps is countable. For an
f.d. representation $(\pi, V)$ we introduce the character
of $\pi$ as a function


$$\chi_\pi(g) = \operatorname{tr} \pi(g) = \sum_{k=1}^{\dim V} t^V_{kk}(g)$$ [5]


It is obviously a central function on G.


Remark Traditionally, in representation theory
the word “character” has two different meanings:
(1) a multiplicative map from a group to U(1), and
(2) the trace of a representation operator $\pi(g)$. For
one-dimensional representations both notions
coincide.


From the orthogonality relations we get the
following result.


Corollary The characters of unirreps of G form an
orthonormal basis in the subspace of central functions
in $L^2(G, dg)$.
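
For a finite group, where the Haar integral is an average, this orthonormality can be checked exactly. The sketch below is an illustration for $S_3$, using its three irreducible characters (trivial, sign, and the two-dimensional standard character, equal to the number of fixed points minus 1); the names are hypothetical and the inner product omits complex conjugation because these characters are real.

```python
from fractions import Fraction
from itertools import permutations

GROUP = list(permutations(range(3)))  # the symmetric group S_3


def sign(p):
    """Sign of a permutation, computed from its number of inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def fixed_points(p):
    """Number of fixed points = character of the permutation representation."""
    return sum(1 for i, x in enumerate(p) if i == x)


CHARACTERS = {
    "trivial": lambda p: 1,
    "sign": sign,
    "standard": lambda p: fixed_points(p) - 1,  # permutation character minus trivial
}


def inner(chi1, chi2):
    """(chi1, chi2) = (1/|G|) * sum over g of chi1(g) * chi2(g), for real characters."""
    return Fraction(sum(chi1(g) * chi2(g) for g in GROUP), len(GROUP))


if __name__ == "__main__":
    names = list(CHARACTERS)
    for a in names:
        print(a, [str(inner(CHARACTERS[a], CHARACTERS[b])) for b in names])
```

The printed Gram matrix is the identity, and the squares of the dimensions 1, 1, 2 add up to the order of the group.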


Noncommutative Fourier Transform 


The noncommutative Fourier transform on a compact
group G is defined as follows. Let $\widehat{G}$ denote the
set of equivalence classes of unirreps of G. Choose
for any $\lambda \in \widehat{G}$ a representation $(\pi_\lambda, V_\lambda)$ of class $\lambda$
and an orthonormal basis in $V_\lambda$. Denote by $d(\lambda)$ the
dimension of $V_\lambda$.

We introduce the Hilbert space $L^2(\widehat{G})$ as the space
of matrix-valued functions on $\widehat{G}$ whose value at a point
$\lambda \in \widehat{G}$ belongs to $\mathrm{Mat}_{d(\lambda)}(\mathbb{C})$. The norm is defined as


$$\|F\|^2_{L^2(\widehat{G})} = \sum_{\lambda \in \widehat{G}} d(\lambda) \cdot \operatorname{tr}\bigl(F(\lambda)F(\lambda)^*\bigr)$$


For a function f on G define its Fourier transform $\tilde{f}$
as a matrix-valued function on $\widehat{G}$:


$$\tilde{f}(\lambda) = \int_G f(g^{-1})\,\pi_\lambda(g)\,dg$$


Note that in the case $G = T^1$ this transform
associates to a function f the set of its Fourier
coefficients. In general this transform keeps some
important features of Fourier coefficients.


Theorem 14 


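(i) For a function $f \in L^1(G, dg)$ the Fourier transform
$\tilde{f}$ is a well defined and bounded (by matrix norm)
function on $\widehat{G}$.

(ii) For a function $f \in L^1(G, dg) \cap L^2(G, dg)$ the
following analog of the Plancherel formula holds:


$$\|f\|^2_{L^2(G)} := \int_G |f(g)|^2\,dg = \sum_{\lambda \in \widehat{G}} d(\lambda)\,\operatorname{tr}\bigl(\tilde{f}(\lambda)\tilde{f}(\lambda)^*\bigr) = \|\tilde{f}\|^2_{L^2(\widehat{G})}$$


(iii) The following inversion formula expresses f in
terms of $\tilde{f}$:


$$f(g) = \sum_{\lambda \in \widehat{G}} d(\lambda) \cdot \operatorname{tr}\bigl(\tilde{f}(\lambda)\,\pi_\lambda(g)\bigr)$$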
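(iv) The Fourier transform sends the convolution to
the matrix multiplication (in the reverse order, because
of the inverse in the definition of $\tilde{f}$):


$$\widetilde{f_1 * f_2} = \tilde{f}_2 \cdot \tilde{f}_1$$


where the convolution product $*$ is defined by


$$(f_1 * f_2)(h) = \int_G f_1(hg)\,f_2(g^{-1})\,dg$$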


Note the special case of the inversion formula for
g = e:


$$f(e) = \sum_{\lambda \in \widehat{G}} d(\lambda)\,\operatorname{tr}\bigl(\tilde{f}(\lambda)\bigr)$$


or


$$\delta(g) = \sum_{\lambda \in \widehat{G}} d(\lambda)\,\chi_\lambda(g)$$


where $\delta(g)$ is Dirac’s delta-function: $\int_G f(g)\,\delta(g)\,dg = f(e)$.
Thus, we get a presentation of Dirac’s
delta-function as a linear combination of characters.


Classification of Finite-Dimensional
Representations


In this section, we give a classification of unirreps of 
a connected compact Lie group G. 


Weight Decomposition 


Let G be a connected compact group with maximal
torus T, and let $(\pi, V)$ be a f.d. representation of G.
Restricting it to T and using complete reducibility,
we get the following result.


Theorem 15 The vector space V can be written in
the form


$$V = \bigoplus_{\lambda \in X(T)} V_\lambda$$ [6]

$$V_\lambda = \{v \in V \mid \pi_*(t)v = \langle \lambda, t \rangle v \ \ \forall t \in \mathfrak{t}\}$$


where X(T) is the character group of T defined by [3].
The spaces $V_\lambda$ are called “weight subspaces,” and
vectors $v \in V_\lambda$ are called “weight vectors” of weight $\lambda$. The set


$$P(V) = \{\lambda \in X(T) \mid V_\lambda \neq \{0\}\}$$ [7]


is called the “set of weights” of $\pi$, or the “spectrum”
of $\operatorname{Res}^G_T \pi$, and


$$\mathrm{mult}_{(\pi, V)}(\lambda) := \dim V_\lambda$$


is called the “multiplicity” of $\lambda$ in V.


The next theorem easily follows from the defini- 
tion of the Weyl group. 


Theorem 16 For any f.d. representation V of G,
the set of weights with multiplicities is invariant
under the action of the Weyl group:


$$\mathrm{mult}_{(\pi, V)}(\lambda) = \mathrm{mult}_{(\pi, V)}(w(\lambda))$$


for any $w \in W$.


Classification of Unirreps 


Recall that R is the root system of $\mathfrak{g}_\mathbb{C}$. Assume that
we have chosen a basis of simple roots $\alpha_1, \ldots, \alpha_r \subset
R$. Then $R = R_+ \cup R_-$; roots $\alpha \in R_+$ can be written
as a linear combination of simple roots with positive
coefficients, and $R_- = -R_+$.

A (not necessarily f.d.) representation of $\mathfrak{g}_\mathbb{C}$ is
called a “highest-weight representation” if it is
generated by a single vector $v \in V_\lambda$ (the highest-weight
vector) such that $\mathfrak{g}_\alpha v = 0$ for all positive
roots $\alpha \in R_+$.

It can be shown that for every $\lambda \in X(T)$, there is a
unique irreducible highest-weight representation of
$\mathfrak{g}_\mathbb{C}$ with highest weight $\lambda$, which is denoted $L(\lambda)$.
However, this representation can be infinite dimensional;
moreover, it may not be possible to lift it to a
representation of G.


Definition 5 A weight $\lambda \in X(T)$ is called “dominant”
if $\langle \lambda, \alpha_i^\vee \rangle \in \mathbb{Z}_+$ for any simple root $\alpha_i$. The set
of all dominant weights is denoted by $X_+(T)$.


Theorem 17


(i) All weights of $L(\lambda)$ are of the form $\mu = \lambda - \sum n_i\alpha_i$,
$n_i \in \mathbb{Z}_+$.

(ii) Let $\lambda \in X_+$. Then the irreducible highest-weight
representation $L(\lambda)$ is f.d. and lifts to a
representation of G.

(iii) Every irreducible f.d. representation of G is of
the form $L(\lambda)$ for some $\lambda \in X_+$.


Thus, we have a bijection {unirreps of G} ↔ $X_+$.


Example 7 Let G = SU(2). There is a unique simple
root $\alpha$ and a unique fundamental weight $\omega$, related
by $\alpha = 2\omega$. Therefore, $X_+ = \mathbb{Z}_+ \cdot \omega$ and unirreps are
indexed by non-negative integers. The representation
with highest weight $k \cdot \omega$ is precisely the
representation $\Pi_k$ constructed in the subsection
“Examples of representations.”


Example 8 Let G = U(n). Then $X = \mathbb{Z}^n$, and $X_+ =
\{(\lambda_1, \ldots, \lambda_n) \in \mathbb{Z}^n \mid \lambda_1 \ge \cdots \ge \lambda_n\}$. Such objects are
well known in combinatorics: if we additionally
assume that $\lambda_n \ge 0$, then such dominant weights are
in bijection with partitions with n parts. They can
also be described by “Young diagrams” with n rows
(see Fulton and Harris (1991)).


Explicit Construction of Representations 


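In addition to the description of unirreps as highest-weight
representations, they can also be constructed
in other ways. In particular, they can be defined
analytically as follows. Let $B = HN_+$ be the
Borel subgroup in $G_\mathbb{C}$; here $H = \exp \mathfrak{h}$,
$N_+ = \exp \sum_{\alpha \in R_+} (\mathfrak{g}_\mathbb{C})_\alpha$. For $\lambda \in X(T)$, let $\chi_\lambda\colon B \to \mathbb{C}^\times$
be a multiplicative map defined by


$$\chi_\lambda(\exp(x) \cdot n) = e^{\langle \lambda, x \rangle}, \qquad x \in \mathfrak{h},\ n \in N_+$$ [8]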


Theorem 18 (Cartan-Borel-Weil) Let $\lambda \in X(T)$.
Denote by $V(\lambda)$ the space of complex-analytic
functions on $G_\mathbb{C}$ which satisfy the following transformation
property:


$$f(gb) = \chi_\lambda(b)\,f(g), \qquad g \in G_\mathbb{C},\ b \in B$$

The group $G_\mathbb{C}$ acts on $V(\lambda)$ by left shifts:


$$(\pi(g)f)(h) = f(g^{-1}h)$$ [9]


Then


(i) $V(\lambda) \neq \{0\}$ iff $-\lambda \in X_+$.


(ii) If $-\lambda \in X_+$, the representation of G in $V(\lambda)$ is
equivalent to $L(w_0(\lambda))$, where $w_0 \in W$ is the
unique element of the Weyl group which sends
$R_+$ to $R_-$.


This theorem can also be reformulated in more
geometric terms: the spaces $V(\lambda)$ are naturally
interpreted as spaces of global sections of appropriate
line bundles on the “flag variety”
$\mathcal{B} = G_\mathbb{C}/B = G/T$.

For classical groups, irreducible representations
can also be constructed explicitly as the subspaces in
tensor powers $(\mathbb{C}^n)^{\otimes k}$, transforming in a certain way
under the action of the symmetric group $S_k$.


Characters and Multiplicities 
Characters 


Let $(\pi, V)$ be a f.d. representation of G and let $\chi_\pi$ be
its character as defined by [5]. Since $\chi_\pi$ is central,
and every element in G is conjugate to an element of
T, $\chi_\pi$ is completely determined by its restriction to
T, which can be computed from the weight decomposition
[6]:


$$\chi_\pi|_T = \sum_{\lambda \in X(T)} \dim V_\lambda \cdot e_\lambda = \sum_{\lambda \in X(T)} \mathrm{mult}_{(\pi, V)}\lambda \cdot e_\lambda$$ [10]

where $e_\lambda$ is the function on T defined by
$e_\lambda(\exp(t)) = e^{\langle t, \lambda \rangle}$, $t \in \mathfrak{t}$. Note that $e_{\lambda + \mu} = e_\lambda e_\mu$ and
that $e_0 = 1$.


Weyl Character Formula 


Theorem 19 (Weyl character formula). Let $\lambda \in X_+$.
Then

$$\chi_{L(\lambda)} = \frac{A_{\lambda + \rho}}{A_\rho}, \qquad A_\mu = \sum_{w \in W} \varepsilon(w)\,e_{w(\mu)}$$


where, for $w \in W$, we denote $\varepsilon(w) = \det w$ considered
as a linear map $\mathfrak{t} \to \mathfrak{t}$, and $\rho = (1/2)\sum_{\alpha \in R_+} \alpha$.


In particular, computing the value of the character
at the point t = 0 by L’Hôpital’s rule, it is possible to
deduce the following formula for the dimension of
irreducible representations:


$$\dim L(\lambda) = \prod_{\alpha \in R_+} \frac{(\alpha, \lambda + \rho)}{(\alpha, \rho)}$$ [11]


Example 9 Let G = SU(2). Then the Weyl character
formula gives, for the irreducible representation $\Pi_k$ with
highest weight $k \cdot \omega$,


$$\chi_{\Pi_k} = \frac{x^{k+1} - x^{-(k+1)}}{x - x^{-1}} = x^k + x^{k-2} + \cdots + x^{-k}, \qquad x = e_\omega$$


which implies $\dim \Pi_k = k + 1$.
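
Both the dimension formula [11] and the SU(2) character identity lend themselves to a quick numerical check. The sketch below is an illustration under stated assumptions: for type A it uses the standard coordinates in which the positive roots are $e_i - e_j$ (i < j) and $(\alpha, \rho) = j - i$, so that [11] becomes a product over pairs i < j; a highest weight of U(n) or SU(n) is given by a non-increasing integer tuple as in Example 8. The function names are hypothetical.

```python
import cmath
from fractions import Fraction


def weyl_dimension_type_a(lam):
    """Dimension formula [11] for type A: product over i < j of
    (lam_i - lam_j + j - i) / (j - i), for a non-increasing integer tuple lam."""
    n = len(lam)
    dim = Fraction(1)
    for i in range(n):
        for j in range(i + 1, n):
            dim *= Fraction(lam[i] - lam[j] + j - i, j - i)
    return int(dim)


def su2_character_quotient(k, x):
    """Weyl character formula for SU(2): (x^(k+1) - x^-(k+1)) / (x - x^-1)."""
    return (x ** (k + 1) - x ** (-(k + 1))) / (x - 1 / x)


def su2_character_sum(k, x):
    """Sum over the weights k, k-2, ..., -k of Pi_k."""
    return sum(x ** (k - 2 * s) for s in range(k + 1))


if __name__ == "__main__":
    x = cmath.exp(0.7j)  # a generic point of the maximal torus
    for k in range(6):
        assert abs(su2_character_quotient(k, x) - su2_character_sum(k, x)) < 1e-9
        assert weyl_dimension_type_a((k, 0)) == k + 1
    print(weyl_dimension_type_a((2, 1, 0)))
```

The weight (k, 0) of U(2) restricts to $k \cdot \omega$ on SU(2), which recovers dim $\Pi_k$ = k + 1; the weight (2, 1, 0) gives 8, the dimension of the adjoint representation of SU(3).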


The Weyl character formula is equivalent to the following
formula for weight multiplicities, due to Kostant:


$$\mathrm{mult}_{L(\lambda)}\,\mu = \sum_{w \in W} \varepsilon(w)\,K(w(\lambda + \rho) - \rho - \mu)$$


where K is Kostant’s partition function: $K(\tau)$ is the
number of ways of writing $\tau$ as a sum of positive
roots (with repetitions).
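
Kostant’s formula can be evaluated by hand for small cases; the sketch below does it for type $A_2$, that is, for SU(3). It is an illustration under stated assumptions: weights are written as integer triples modulo multiples of (1, 1, 1), with $\lambda$ and $\mu$ chosen to have the same coordinate sum; the Weyl group acts by permuting coordinates with $\varepsilon(w)$ the sign of the permutation; $\rho$ is represented by (1, 0, −1). The function names are hypothetical.

```python
from itertools import permutations

RHO = (1, 0, -1)  # half-sum of the positive roots e1-e2, e2-e3, e1-e3


def kostant_partition(gamma):
    """Number of ways to write gamma as a non-negative integer combination of the
    positive roots alpha1 = e1-e2, alpha2 = e2-e3 and alpha1 + alpha2 of A_2."""
    if sum(gamma) != 0:
        return 0
    a, b = gamma[0], -gamma[2]  # gamma = a*alpha1 + b*alpha2
    if a < 0 or b < 0:
        return 0
    # The number c of copies of alpha1 + alpha2 can be any of 0..min(a, b).
    return min(a, b) + 1


def sign(p):
    """Sign of a permutation = determinant of the corresponding Weyl group element."""
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def multiplicity(lam, mu):
    """Kostant's formula: sum over w of eps(w) * K(w(lam + rho) - rho - mu)."""
    shifted = tuple(l + r for l, r in zip(lam, RHO))
    total = 0
    for p in permutations(range(3)):
        w_shifted = tuple(shifted[p[i]] for i in range(3))
        gamma = tuple(w_shifted[i] - RHO[i] - mu[i] for i in range(3))
        total += sign(p) * kostant_partition(gamma)
    return total


if __name__ == "__main__":
    adjoint = (1, 0, -1)
    print(multiplicity(adjoint, (0, 0, 0)))   # zero weight of the adjoint representation
    print(multiplicity(adjoint, (1, -1, 0)))  # a root
```

For the adjoint representation the zero weight has multiplicity 2 (the rank of SU(3)) and each root has multiplicity 1, for a total dimension of 8.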

For classical Lie groups such as G = U(n), there are
more explicit combinatorial formulas for weight multiplicities;
for U(n), the answer can be written in terms of
the number of “Young tableaux” of a given shape.
Details can be found in Fulton and Harris (1991).


Tensor Product Multiplicities 


Let $(\pi, V)$ be a f.d. representation of G. By complete
reducibility, one can write $V = \bigoplus n_\lambda L(\lambda)$. The coefficients
$n_\lambda$ are called multiplicities; finding them is an
important problem in many applications. In particular,
a special case of this is finding the multiplicities
in the tensor product of two unirreps:


$$L(\lambda) \otimes L(\mu) = \bigoplus_\nu N^\nu_{\lambda\mu}\,L(\nu)$$


Characters provide a practical tool for computing
multiplicities: since characters of unirreps are linearly
independent, multiplicities can be found from
the condition that $\chi_V = \sum n_\lambda \chi_{L(\lambda)}$. In particular,


$$\chi_{L(\lambda)}\,\chi_{L(\mu)} = \sum_\nu N^\nu_{\lambda\mu}\,\chi_{L(\nu)}$$

Example 10 For G = SU(2), tensor product
multiplicities are given by

$$V_n \otimes V_m = \bigoplus_l V_l$$

where the sum is taken over all l such that $|m - n| \le l \le m + n$
and $m + n + l$ is even.
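
The character method described above can be sketched in a few lines for SU(2). This is an illustration under the assumption that characters are stored as `Counter` objects mapping exponents of x to coefficients; `decompose` repeatedly strips off the character of the highest remaining weight, and `clebsch_gordan` encodes the rule of Example 10. Both names are hypothetical.

```python
from collections import Counter


def su2_character(k):
    """Character of the (k+1)-dimensional irrep as {exponent: coefficient}."""
    return Counter({k - 2 * j: 1 for j in range(k + 1)})


def decompose(n, m):
    """Highest weights l of the irreps in the tensor product, from characters."""
    product = Counter()
    for a, ca in su2_character(n).items():
        for b, cb in su2_character(m).items():
            product[a + b] += ca * cb
    highest_weights = []
    while any(c > 0 for c in product.values()):
        # the largest exponent still present must be a highest weight
        l = max(e for e, c in product.items() if c > 0)
        highest_weights.append(l)
        product.subtract(su2_character(l))
    return sorted(highest_weights)


def clebsch_gordan(n, m):
    """The rule of Example 10: |m-n| <= l <= m+n with m+n+l even."""
    return [l for l in range(abs(m - n), m + n + 1) if (m + n + l) % 2 == 0]


for n in range(5):
    for m in range(5):
        assert decompose(n, m) == clebsch_gordan(n, m)
```

The same peeling strategy works for any compact group once characters are available, because characters of unirreps are linearly independent.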


For G = U(n), there is an algorithm for finding the
tensor product multiplicities, formulated in the
language of Young tableaux (the Littlewood-Richardson
rule). There are also tables and computer programs
for computing these multiplicities; some of them are
listed in the bibliography.


See also: Classical Groups and Homogeneous Spaces;
Combinatorics: Overview; Equivariant Cohomology and
the Cartan Model; Finite Group Symmetry Breaking; Lie
Groups: General Theory; Ljusternik-Schnirelman Theory;
Noncommutative Geometry and the Standard Model;
Optimal Cloning of Quantum States; Ordinary Special
Functions; Quasiperiodic Systems; Symmetry Classes in
Random Matrix Theory.


Further Reading 


Bump D (2004) Lie Groups. New York: Springer. 

Bröcker T and tom Dieck T (1995) Representations of Compact
Lie Groups, Graduate Texts in Mathematics, vol. 98.
New York: Springer.


Fulton W and Harris J (1991) Representation Theory. New York: 
Springer. 

Knapp A (2002) Lie Groups beyond an Introduction, 2nd edn.
Boston: Birkhäuser.

LiE: A Computer algebra package for Lie group computations, 
available from http://young.sp2mi.univ-poitiers.fr 

McKay WG, Patera J, and Rand DW (1990) Tables of 
Representations of Simple Lie Algebras, vol. I. Exceptional 
Simple Lie Algebras. Montreal: CRM. 

Serre J-P (2001) Complex Semisimple Lie Algebras. Berlin: Springer. 

Simon B (1996) Representations of Finite and Compact Groups. 
Providence, RI: American Mathematical Society. 

Zelobenko DP (1973) Compact Lie Groups and Their Represen- 
tations. Providence, RI: American Mathematical Society. 


Compactification of Superstring Theory 


M R Douglas, Rutgers, The State University of 
New Jersey, Piscataway, NJ, USA 


© 2006 Elsevier Ltd. All rights reserved. 


Introduction 


Superstring theories and M-theory, at present the best 
candidate quantum theories which unify gravity, 
Yang-Mills fields, and matter, are directly formu- 
lated in ten and eleven spacetime dimensions. To 
obtain a candidate theory of our four-dimensional 
universe, one must find a solution of one of 
these theories whose low-energy physics is well 
described by a four-dimensional effective field theory 
(EFT), containing the well-established standard 
model (SM) of particle physics coupled to Einstein’s 
general relativity (GR). The standard paradigm for 
finding such solutions is compactification, along the 
lines originally proposed by Kaluza and Klein in the 
context of higher-dimensional general relativity. One 
postulates that the underlying D-dimensional space- 
time is a product of four-dimensional Minkowski 
spacetime, with a (D − 4)-dimensional compact and
small Riemannian manifold K. One then finds 
that low-energy physics effectively averages over K, 
leading to a four-dimensional EFT whose field 
content and Lagrangian are determined in terms of 
the topology and geometry of K. 

Of the huge body of prior work on this subject, the 
part most relevant for string/M-theory is supergravity 
compactification, as in the limit of low energies, small 
curvatures and weak coupling, the various string 
theories and M-theory reduce to ten- and eleven- 
dimensional supergravity theories. Many of the quali- 
tative features of string/M-theory compactification, and 
a good deal of what is known quantitatively, can be 


understood simply in terms of compactification of these 
field theories, with the addition of a few crucial 
ingredients from string/M-theory. Thus, most of this 
article will restrict attention to this case, leaving many 
“stringy” topics to the articles on conformal field 
theory, topological string theory, and so on. We also 
largely restrict attention to compactifications based on 
Ricci-flat compact spaces. There is an equally important 
class in which K has positive curvature; these lead to 
anti-de Sitter (AdS) spacetimes and are discussed in the 
article on AdS/CFT (see AdS/CFT Correspondence). 
After a general review, we begin with compacti- 
fication of the heterotic string on a three complex 
dimensional Calabi-Yau manifold. This was the first 
construction which led convincingly to the SM, and 
remains one of the most important examples. We 
then survey the various families of compactifications 
to higher dimensions, with an eye on the relations 
between these compactifications which follow from 
superstring duality. We then discuss some of the 
phenomena which arise in the regimes of large 
curvature and strong coupling. In the final section, 
we bring these ideas together in a survey of the 
various known four-dimensional constructions. 


General Framework 


Let us assume we are given a D- (= d + k) dimensional
field theory $\mathcal{T}$. A compactification is then a
D-dimensional spacetime which is topologically
the product of a d-dimensional spacetime with a
k-dimensional manifold K, the compactification or
"internal" manifold, carrying a Riemannian metric
and with definite expectation values for all other
fields in $\mathcal{T}$. These must solve the equations of motion,
and preserve d-dimensional Poincaré invariance (or
perhaps another d-dimensional symmetry group).


The most general metric ansatz for a Poincaré
invariant compactification is

$$G_{IJ} = \begin{pmatrix} f\,\eta_{\mu\nu} & 0 \\ 0 & G_{ij} \end{pmatrix}$$

where the tangent space indices are $0 \le I < d + k = D$,
$0 \le \mu < d$, and $1 \le i \le k$. Here $\eta_{\mu\nu}$ is the
Minkowski metric, $G_{ij}$ is a metric on K, and f is a
real-valued function on K called the "warp factor."

As the simplest example, consider pure
D-dimensional GR. In this case, Einstein's equations
reduce to Ricci flatness of $G_{IJ}$. Given our metric
ansatz, this requires f to be constant, and the metric
$G_{ij}$ on K to be Ricci flat. Thus, any K which admits
such a metric, for example, the k-dimensional torus,
will lead to a compactification.

Typically, if a manifold admits a Ricci-flat metric, 
it will not be unique; rather there will be a moduli 
space of such metrics. Physically, one then expects 
to find solutions in which the choice of Ricci-flat 
metric is slowly varying in d-dimensional spacetime. 
General arguments imply that such variations 
must be described by variations of d-dimensional 
fields, governed by an EFT. Given an explicit 
parametrization of the family of metrics, say
$G_{ij}(\phi^\alpha)$ for some parameters $\phi^\alpha$, in principle the
EFT could be computed explicitly by promoting
the parameters to d-dimensional fields, substituting
this parametrization into the D-dimensional action,
and expanding in powers of the d-dimensional
derivatives. In pure GR, we would find the
four-dimensional effective Lagrangian

$$\mathcal{L}_{EFT} = \int_K d^k y \left[ \sqrt{\det G(\phi)}\, R^{(4)} + \sqrt{\det G(\phi)}\, G^{ik}(\phi)\, G^{jl}(\phi)\, \frac{\partial G_{ij}}{\partial \phi^\alpha} \frac{\partial G_{kl}}{\partial \phi^\beta}\, \partial_\mu \phi^\alpha\, \partial^\mu \phi^\beta + \cdots \right]$$ [1]

While this is easily evaluated for K a symmetric space
or torus, in general a direct computation of $\mathcal{L}_{EFT}$ is
impossible. This becomes especially clear when one
learns that the Ricci-flat metrics $G_{ij}$ are not explicitly
known for the examples of interest. Nevertheless,
clever indirect methods have been found that give a
great deal of information about $\mathcal{L}_{EFT}$; this is much of
the art of superstring compactification. However, in
this section, let us ignore this point and continue as if
we could do such computations explicitly.

Given a solution, one proceeds to consider its
small perturbations, which satisfy the linearized
equations of motion. If these include exponentially
growing modes (often called "tachyons"), the
solution is unstable. (Note that this criterion is modified
for AdS compactifications.) The remaining
perturbations can be divided into massless fields,
corresponding to zero modes of the linearized equations of
motion on K, and massive fields, the others. General
results on eigenvalues of Laplacians imply that the
masses of massive fields depend on the diameter of
K as $m \sim 1/\mathrm{diam}(K)$, so at energies far smaller than
m, they cannot be excited (this is not universal;
given strong negative curvature on K, or a rapidly
varying warp factor, one can have perturbations of
small nonzero mass). Thus, the massive fields can be
"integrated out," to leave an EFT with a finite
number of fields. In the classical approximation, this
simply means solving their equations of motion in
terms of the massless fields, and using these
solutions to eliminate them from the action. At
leading order in an expansion around a solution,
these fields are zero and this step is trivial;
nevertheless, it is useful in making a systematic definition
of the interaction terms in the EFT.
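
The scaling $m \sim 1/\mathrm{diam}(K)$ is easiest to see in the simplest case, which is an assumption added here for illustration: K a circle of radius R, on which a massless scalar decomposes into Fourier modes of mass |n|/R (in units with $\hbar = c = 1$). The function names below are hypothetical.

```python
def kk_masses(radius, n_max):
    """Masses |n| / R of the Kaluza-Klein modes n = -n_max .. n_max on a circle."""
    return {n: abs(n) / radius for n in range(-n_max, n_max + 1)}


def light_modes(radius, cutoff, n_max=1000):
    """Mode numbers whose mass lies below the given energy cutoff."""
    return [n for n, m in kk_masses(radius, n_max).items() if m < cutoff]


# With a small circle and a low cutoff, only the zero mode survives,
# which is the sense in which the massive modes can be integrated out.
print(light_modes(radius=1.0, cutoff=0.5))
# A larger cutoff relative to 1/R lets the first massive levels back in.
print(light_modes(radius=0.1, cutoff=25.0))
```

Shrinking the radius at fixed cutoff pushes every mode except n = 0 above the cutoff, leaving a finite (here, one-field) effective theory.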

As we saw in pure GR, the configuration space
parametrized by the massless fields in the EFT is the
moduli space of compactifications obtained by
deforming the original solution. Thus, from a 
mathematical point of view, low-energy EFT can 
be thought of as a sort of enhancement of the 
concept of moduli space, and a dictionary set up 
between mathematical and physical languages. To 
give its next entry, there is a natural physical metric 
on moduli space, defined by restriction from the 
metric on the configuration space of the theory $\mathcal{T}$;
this becomes the sigma-model metric for the scalars
in the EFT. Because the theories $\mathcal{T}$ arising from
string theory are geometrically natural, this metric is 
also natural from a mathematical point of view, and 
one often finds that much is already known about it. 
For example, the somewhat fearsome two derivative 
terms in eqn [1], are (perhaps) less so when one 
realizes that this is an explicit expression for the 
Weil-Petersson metric on the moduli space of Ricci- 
flat metrics. In any case, knowing this dictionary is 
essential for taking advantage of the literature. 

Another important entry in this dictionary is that 
the automorphism group of a solution translates 
into the gauge group in the EFT. This can be either 
continuous, leading to the gauge symmetry of 
Maxwell and Yang-Mills theories, or discrete, 
leading to discrete gauge symmetry. For example, if 
the metric on K has continuous isometry group G, 
the resulting EFT will have gauge symmetry G, as in 
the original example of Kaluza and Klein with $K = S^1$
and G = U(1). Mathematically, these phenomena
of "enhanced symmetry" are often treated using the
languages of equivariant theories (cohomology, 
K-theory, etc.), stacks, and so on. 




To give another example, obstructed deformations 
(solutions of the linearized equations which do not 
correspond to elements of the tangent space of the 
true moduli space) correspond to scalar fields which, 
while massless, appear in the effective potential in a 
way which prevents giving them expectation values. 
Since the quadratic terms $V''$ are masses, this
dependence must be at cubic or higher order. 

While the preceding concepts are general and apply 
to compactification of all local field theories, string 
and M-theory add some particular ingredients to this 
general recipe. In the limits of small curvatures and 
weak coupling, string and M-theory are well described 
by the ten- and eleven-dimensional supergravity theories,
and thus the string/M-theory discussion usually starts
with Kaluza-Klein compactification of these theories,
which we denote I, IIa, IIb, HE, HO and M. Let us
now discuss a particular example.


Calabi-Yau Compactification 
of the Heterotic String 


Contact with the SM requires finding compactifications
to d = 4 either without supersymmetry, or with at most
N = 1 supersymmetry, because the SM includes chiral
fermions, which are incompatible with N > 1. Let us
start with the $E_8 \times E_8$ heterotic string or "HE" theory.
This choice is made rather than HO because only in this
case can we find the SM fermion representations as
subrepresentations of the adjoint of the gauge group.

Besides the metric, the other bosonic fields of the HE
supergravity theory are a scalar $\Phi$ called the dilaton,
Yang-Mills gauge potentials for the group $G \equiv E_8 \times E_8$,
and a 2-form gauge potential B (often called the
"Neveu-Schwarz" or "NS" 2-form) whose defining
characteristic is that it minimally couples to the
heterotic string world-sheet. We will need their gauge
field strengths below: for Yang-Mills, this is a 2-form
$F^a_{IJ}$, with a indexing the adjoint of Lie G, and for the NS
2-form this is a 3-form $H_{IJK}$. Denoting the two
Majorana-Weyl spinor representations of SO(1, 9) as
S and C, then the fermions are the gravitino $\psi_I \in S \otimes V$,
a spin 1/2 "dilatino" $\lambda \in C$, and the adjoint
gauginos $\chi^a \in S$. We use $\Gamma_I$ to denote Dirac matrices
contracted with a "zehnbein," satisfying $\{\Gamma_I, \Gamma_J\} = 2 G_{IJ}$,
and $\Gamma_{IJ} = (1/2)[\Gamma_I, \Gamma_J]$, etc.

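A local supersymmetry transformation with
parameter $\epsilon$ is then

$$\delta\psi_I = D_I \epsilon + \frac{1}{8} H_{IJK} \Gamma^{JK} \epsilon$$ [2]

$$\delta\lambda = \partial_I \Phi\, \Gamma^I \epsilon - \frac{1}{12} H_{IJK} \Gamma^{IJK} \epsilon$$ [3]

$$\delta\chi^a = F^a_{IJ} \Gamma^{IJ} \epsilon$$ [4]

We now assume N = 1 supersymmetry. An unbroken
supersymmetry is a spinor $\epsilon$ for which the left-hand
side is zero, so we seek compactifications with a
unique solution of these equations.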

We first discuss the case H = 0. Setting $\delta\psi_\mu$ in
eqn [2] to zero, we find that the warp factor f must
be constant. The vanishing of $\delta\psi_i$ requires $\epsilon$ to be a
covariantly constant spinor. For a six-dimensional
K to have a unique such spinor, it must have SU(3)
holonomy; in other words, K must be a Calabi-Yau
manifold. In the following, we use basic facts about
the geometry of such manifolds.

The vanishing of $\delta\lambda$ then requires constant dilaton
$\Phi$, while the vanishing of $\delta\chi^a$ requires the gauge field
strength F to solve the hermitian Yang-Mills
equations,

$$F^{2,0} = F^{0,2} = G^{i\bar{\jmath}} F_{i\bar{\jmath}} = 0$$


By the theorem of Donaldson and Uhlenbeck-Yau,
such solutions are in one-to-one correspondence
with $\mu$-stable holomorphic vector bundles with
structure group H contained in the complexification
of G. Choose such a bundle E; by the general
discussion above, the commutant of H in G will be
the automorphism group of the connection on E and
thus the low-energy gauge group of the resulting
EFT. For example, since $E_8$ has a maximal
$E_6 \times SU(3)$ subgroup, if E has structure group H = SL(3),
there is an embedding such that the unbroken gauge
symmetry is $E_6 \times E_8$, realizing one of the standard
grand unified groups, $E_6$, as a factor.

The choice of E is constrained by anomaly
cancellation. This discussion (Green et al. 1987)
modifies the Bianchi identity for H to

$$dH = \operatorname{tr} R \wedge R - \frac{1}{30} \sum_a F^a \wedge F^a$$ [5]

where R is the matrix of curvature 2-forms. The
normalization of the $F \wedge F$ term is such that if we
take E = TK, the holomorphic tangent bundle of K,
with isomorphic connection, then using the
embedding we just discussed, we obtain a solution of eqn
[5] with H = 0.

Thus, we have a complete solution of the 
equations of motion. General arguments imply that 
supersymmetric Minkowski solutions are stable, so 
the small fluctuations consist of massless and 
massive fields. Let us now discuss a few of the 
massless fields. Since the EFT has N=1 super- 
symmetry, the massless scalars live in chiral multi- 
plets, which are local coordinates on a complex 
Kähler manifold.

First, the moduli of Ricci-flat metrics on K will
lead to massless scalar fields: the complex structure
moduli, which are naturally complex, and Kähler
moduli, which are not. However, in string
compactification the latter are complexified to the periods of
the 2-form B + iJ integrated over a basis of $H_2(K, \mathbb{Z})$,
where J is the Kähler form and B is the NS 2-form. In
addition, there is a complex field pairing the dilaton
(actually, $\exp(-\Phi)$) and the "model-independent
axion," the scalar dual in d = 4 to the 2-form $B_{\mu\nu}$.
Finally, each complex modulus of the holomorphic
bundle E will lead to a chiral multiplet. Thus, the
total number of massless uncharged chiral multiplets
is $1 + b^{1,1}(K) + b^{2,1}(K) + \dim H^1(K, \operatorname{End}(E))$.

Massless charged matter will arise from zero
modes of the gauge field and its supersymmetric
partner spinor $\chi^a$. It is slightly easier to discuss the
spinor, and then appeal to supersymmetry to get the
bosons. Decomposing the spinors of SO(6) under
SU(3), one obtains (0, p) forms, and the Dirac
equation becomes the condition that these forms
are harmonic. By the Hodge theorem, these are in
one-to-one correspondence with classes in Dolbeault
cohomology $H^{0,p}(K, V)$, for some bundle V. The
bundle V is obtained by decomposing the spinor into
representations of the holonomy group of E. For
H = SU(3), the decomposition of the adjoint under
the embedding of $SU(3) \times E_6$ in $E_8$,

$$248 = (8, 1) + (1, 78) + (3, 27) + (\bar{3}, \overline{27})$$ [6]


implies that charged matter will form "generations"
in the $27$, of number $\dim H^{0,1}(K, E)$, and
"antigenerations" in the $\overline{27}$, of number
$\dim H^{0,1}(K, E^*) = \dim H^{0,2}(K, E)$. The difference in these numbers is
determined by the Atiyah-Singer index theorem to be

$$N_{gen} \equiv N_{27} - N_{\overline{27}} = \frac{1}{2} c_3(E)$$

In the special case of E = TK, these numbers are
separately determined to be $N_{27} = b^{1,1}$ and
$N_{\overline{27}} = b^{2,1}$, so their difference is $\chi(K)/2$, half the
Euler number of K. In the real world, this number is
$N_{gen} = 3$, and matching this under our assumptions
so far is very constraining.
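
As a worked instance of the counting for E = TK, the sketch below computes the Euler number $\chi = 2(b^{1,1} - b^{2,1})$ and the net number of generations $|\chi|/2$. The function names are illustrative, and the Hodge numbers used in the example (the quintic hypersurface in $\mathbb{P}^4$, with $b^{1,1} = 1$ and $b^{2,1} = 101$) are standard values supplied here as an assumption, not taken from this article.

```python
def euler_number(b11, b21):
    """Euler number of a Calabi-Yau 3-fold: chi = 2 * (b11 - b21)."""
    return 2 * (b11 - b21)


def net_generations(b11, b21):
    """|N_27 - N_27bar| = |chi| / 2 for the choice E = TK."""
    # chi is always even for these Hodge numbers, so integer division is exact
    return abs(euler_number(b11, b21)) // 2


# Quintic hypersurface in P^4 (assumed Hodge numbers b11 = 1, b21 = 101)
print(euler_number(1, 101))      # -200
print(net_generations(1, 101))   # 100
```

The result of 100 net generations for the quintic shows how restrictive the requirement $N_{gen} = 3$ is when E is fixed to be the tangent bundle.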

Substituting these zero modes into the
ten-dimensional Yang-Mills action and integrating, one
can derive the d = 4 EFT. For example, the cubic
terms in the superpotential, usually called Yukawa
couplings after the corresponding fermion-boson
interactions in the component Lagrangian, are
obtained from the cubic product of zero modes

$$\int_K \Omega \wedge \operatorname{tr}(\phi_1 \wedge \phi_2 \wedge \phi_3)$$

where $\Omega$ is the holomorphic 3-form, $\phi_i \in H^{0,1}(K, \operatorname{Rep} E)$ are
the zero modes, and tr arises from decomposing the
$E_8$ cubic group invariant.

Note the very important fact that this expression
only depends on the cohomology classes of the $\phi_i$
(and $\Omega$). This means the Yukawa couplings can be
computed without finding the explicit harmonic
representatives, which is not possible (we do not
even know the explicit metric). More generally, one
expects to be able to explicitly compute the
superpotential and all other holomorphic quantities in
the effective Lagrangian solely from "topological"
information (the Dolbeault cohomology ring, and
its generalizations within topological string theory).
On the other hand, computing the Kähler metric in
an N = 1 EFT is usually out of reach as it would
require having explicit normalized zero modes.
Most results for this metric come from considering
closely related compactifications with extended
supersymmetry, and arguing that the breaking
to N = 1 supersymmetry makes small corrections
to this.

There are several generalizations of this
construction. First, the necessary condition to solve eqn [5] is
that the right-hand side be exact, which requires

$$c_2(E) = c_2(TK)$$ [7]

This allows for a wide variety of E's to be used, so
that $N_{gen} = 3$ can be attained with many more K's.
This class of models is often called "(0, 2)
compactification" to denote the world-sheet supersymmetry
of the heterotic string in these backgrounds. One can
also use bundles with larger structure group; for
example, H = SL(4) leads to unbroken $SO(10) \times E_8$,
and H = SL(5) leads to unbroken $SU(5) \times E_8$.

The subsequent breaking of the grand unified
group to the SM gauge group is typically done by
choosing K with nontrivial $\pi_1$, so that it admits a
flat line bundle W with nontrivial holonomy
(usually called a "Wilson line"). One then uses the
bundle $E \otimes W$ in the above discussion, to obtain the
commutant of $H \otimes W$ as gauge group. For example,
if $\pi_1(K) \cong \mathbb{Z}_5$, one can use W whose holonomy is an
element of order 5 in SU(5), to obtain as commutant
the SM gauge group $SU(3) \times SU(2) \times U(1)$.

Another generalization is to take the 3-form $H \neq 0$.
This discussion begins by noting that, for
supersymmetry, we still require the existence of a unique
spinor $\epsilon$; however, it will no longer be covariantly
constant in the Levi-Civita connection. One way to
structure the problem is to note that the right-hand
side of eqn [2] takes the form of a connection with
torsion; the resulting equations have been discussed
mathematically in (Li and Yau 2004).

Another recent approach to these
compactifications (Gauntlett 2004) starts out by arguing that $\epsilon$
cannot vanish on K, so it defines a weak SU(3)
structure, a local reduction of the structure group of
TK to SU(3) which need not be integrable. This
structure must be present in all N = 1, d = 4
supersymmetric compactifications and there are hopes
that it will lead to a useful classification of the
possible local structures and corresponding partial
differential equations (PDEs) on K.


Higher-Dimensional and Extended 
Supersymmetric Compactifications 


While there are similar quasirealistic constructions
which start from the other string theories and
M-theory, before we discuss these, let us give an
overview of compactifications with $N \ge 2$
supersymmetry in four dimensions, and in higher
dimensions. These are simpler analog models which can be
understood in more depth; their study led to one of
the most important discoveries in string/M-theory,
the theory of superstring duality.

As before, we require a covariantly constant
spinor. For Ricci-flat K with other background
fields zero, this requires the holonomy of K to be
one of trivial, SU(n), Sp(n), or the exceptional
holonomies $G_2$ or Spin(7). In Table 1 we tabulate
the possibilities with spacetime dimension d greater
than or equal to 3, listing the supergravity theory, the
holonomy type of K, and the type of the resulting
EFT: dimension d, total number of real
supersymmetry parameters Ns, and the number of spinor
supercharges N (in d = 6, since left- and
right-chirality Majorana spinors are inequivalent, there
are two numbers).

The structure of the resulting supergravity EFTs is 
heavily constrained by Ns. We now discuss the 
various possibilities. 


Table 1 String/M-theories, holonomy groups and the resulting
supersymmetry

Given the supersymmetry algebra, if such a
supergravity exists, it is unique. Thus, toroidal
compactifications of d = 11 supergravity, IIa and IIb
supergravity lead to the same series of maximally
supersymmetric theories. Their structure is
governed by the exceptional Lie algebra $E_{11-d}$; the
gauge charges transform in a fundamental
representation of this algebra, while the scalar fields
parametrize a coset space G/H, where G is the
maximally split real form of the Lie group $E_{11-d}$,
and H is a maximal compact subgroup of G.
Nonperturbative duality symmetries lead to a
further identification by a maximal discrete
subgroup of G.


Ns = 16


This supergravity can be coupled to maximally 
supersymmetric super Yang-Mills theory, which 
given a choice of gauge group G is unique. Thus, 
these theories (with zero cosmological constant and 
thus allowing  super-Poincaré symmetry) are 
uniquely determined by the choice of G. 

In d = 10, the choices $E_8 \times E_8$ and $Spin(32)/\mathbb{Z}_2$
which arise in string theory are almost uniquely
determined by the Green-Schwarz anomaly
cancellation analysis. Compactification of these HE, HO
and type I theories on $T^n$ produces a unique theory
with moduli space

$$\mathbb{R}^+ \times SO(n, n+16; \mathbb{Z}) \backslash SO(n, n+16; \mathbb{R}) / SO(n, \mathbb{R}) \times SO(n+16, \mathbb{R})$$ [8]

In Kaluza-Klein (KK) reduction, this arises from the
choice of metric $g_{ij}$, the antisymmetric tensor $B_{ij}$ and
the choice of a flat $E_8 \times E_8$ or $Spin(32)/\mathbb{Z}_2$
connection on $T^n$, while a more unified description follows
from the heterotic string world-sheet analysis. Here
the group SO(n, n + 16) is defined to preserve an even
self-dual quadratic form $\eta$ of signature (n, n + 16);
for example, $\eta = (-E_8) \oplus (-E_8) \oplus I \oplus \cdots \oplus I$, where I
is the form of signature (1, 1) and $E_8$ is the Cartan
matrix. In fact, all such forms are equivalent under
orthogonal integer similarity transformation; so,
the resulting EFT is unique. It has a rank 16 + 2n
gauge group, which at generic points in moduli
space is $U(1)^{16+2n}$, but is enhanced to a nonabelian
group G at special points. To describe G, we first
note that a point p in moduli space determines an
n-dimensional subspace $V_p$ of $\mathbb{R}^{16+2n}$ and
an orthogonal subspace $V_p^\perp$ (of varying
dimension). Lattice points of length squared −2
contained in $V_p^\perp$ then correspond to roots of the Lie
algebra of $G_p$.
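
The correspondence between short lattice vectors and roots can be illustrated on a single $E_8$ factor. The sketch below assumes the usual coordinate description of the $E_8$ lattice (vectors in $\mathbb{R}^8$ with all coordinates integer or all half-integer and even coordinate sum) and the positive-definite sign convention, in which the roots have length squared +2 rather than −2. The function name `e8_roots` is illustrative.

```python
from itertools import product


def e8_roots():
    """Enumerate the vectors x of length squared 2 in the E8 lattice."""
    roots = []
    # Work with y = 2x so that every coordinate is an integer.
    for y in product(range(-2, 3), repeat=8):
        if sum(c * c for c in y) != 8:      # x.x = 2
            continue
        if len({c % 2 for c in y}) != 1:    # all integer or all half-integer
            continue
        if sum(y) % 4 != 0:                 # coordinate sum of x is even
            continue
        roots.append(tuple(c / 2 for c in y))
    return roots


roots = e8_roots()
# 240 roots plus 8 Cartan generators give the 248-dimensional adjoint.
print(len(roots), len(roots) + 8)
```

At a point in moduli space where both $E_8$ lattices lie entirely in the orthogonal subspace, the same count applied twice gives the 480 roots of $E_8 \times E_8$.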


The other compactifications with Ns = 16 are
M-theory on K3 and its further toroidal reductions,
and IIb on K3. M-theory compactification to d = 7
is dual to heterotic on $T^3$, with the same moduli
space and enhanced gauge symmetry. As we discuss
at the end of the section "Stringy and quantum
corrections," the extra massless gauge bosons of
enhanced gauge symmetry are M2 branes wrapped
on 2-cycles with topology $S^2$. For such a cycle to
have zero volume, the integral of the Kähler form
and holomorphic 2-form over the cycle must vanish;
expressing this in a basis for $H^2(K3, \mathbb{R})$ leads to
exactly the same condition we discussed for
enhanced gauge symmetry above. The final result is
that all such K3 degenerations lead to one of the
two-dimensional canonical singularities, of types A,
D or E, and the corresponding EFT phenomenon is
the enhanced gauge symmetry of corresponding
Dynkin type A, D, or E.

IIb on K3 is similar, but reducing the self-dual
Ramond-Ramond (RR) 4-form potential on the
2-cycles leads to self-dual tensor multiplets instead of
Maxwell theory. The moduli space is eqn [8] but
with n = 5, not n = 4, incorporating periods of RR
potentials and the $SL(2, \mathbb{Z})$ duality symmetry of IIb
theory.

One may ask if the Ns = 16 I/HE/HO theories in
d = 8 and d = 9 have similar duals. For d = 8, these
are obtained by a pretty construction known as
"F-theory." Geometrically, the simplest definition of
F-theory is to consider the special case of M-theory
on an elliptically fibered Calabi-Yau, in the limit
that the Kähler modulus of the fiber becomes small.
One check of this claim for d = 8 is that the moduli
space of elliptically fibered K3s agrees with eqn [8]
with n = 2.

Another definition of F-theory is the particular 
case of IIb compactification using Dirichlet 
7-branes, and orientifold 7-planes. This construction 
is T-dual to the type I theory on $T^2$, which provides
its simplest string theory definition. As discussed in 
Polchinski (1999), one can think of the open strings 
giving rise to type I gauge symmetry as living on 32 
Dirichlet 9-branes (or D9-branes) and an orientifold 
nineplane. T-duality converts Dirichlet and orienti- 
fold p-branes to (p — 1)-branes; thus this relation 
follows by applying two T-dualities. 

These compactifications can also be parametrized
by elliptically fibered Calabi-Yaus, where K is the
base, and the branes correspond to singularities of
the fibration. The relation between these two
definitions follows fairly simply from the duality
between M-theory on $T^2$, and IIb string on $S^1$. There
is a partially understood generalization of this
to d = 9.

Finally, these constructions admit further discrete
choices, which break some of the gauge symmetry.
The simplest to explain is in the toroidal
compactification of I/HE/HO. The moduli space of theories
we discussed uses flat connections on the torus
which are continuously connected to the trivial
connection, but in general the moduli space of flat
connections has other components. The simplest
example is the moduli space of flat $E_8 \times E_8$
connections on $S^1$, which has a second component
in which the holonomy exchanges the two $E_8$'s. On
$T^3$, there are connections for which the holonomies
cannot be simultaneously diagonalized. This
structure and the M-theory dual of these choices is
discussed in (de Boer et al. 2001).


Ns = 8, d < 6


Again, the gravity multiplet is uniquely determined,
so the most basic classification is by the gauge group
G. The full low-energy EFT is determined by the
matter content and action, and there are two types
of matter multiplets. First, vector multiplets contain
the Yang-Mills fields, fermions and 6 − d scalars;
their action is determined by a prepotential which is
a G-invariant function of the fields. Since the vector
multiplets contain massless adjoint scalars, a generic
vacuum in which these take nonzero distinct
vacuum expectation values (VEVs) will have $U(1)^r$
gauge symmetry, the commutant of G with a generic
matrix (for d < 5, while there are several real
scalars, the potential forces these to commute in a
supersymmetric vacuum). Vacua with this type of
gauge symmetry breaking, which does not reduce
the rank of the gauge group, are usually referred to
as on a "Coulomb branch" of the moduli space. To
summarize, this sector can be specified by $n_V$, the
number of vector multiplets, and the prepotential F,
a function of the $n_V$ VEVs which is cubic in d = 5,
and holomorphic in d = 4.

Hypermultiplets contain scalars which parametrize
a quaternionic Kähler manifold, and partner
fermions. Thus, this sector is specified by a $4 n_H$ real
dimensional quaternionic Kähler manifold. The G
action comes with triholomorphic moment maps; if
nontrivial, VEVs in this sector can break gauge
symmetry and reduce it in rank. Such vacua are
usually referred to as on a "Higgs branch."

The basic example of these compactifications is
M-theory on a Calabi-Yau 3-fold (CY3). Reduction
of the 3-form leads to $b^{1,1}(K)$ vector multiplets,
whose scalar components are the CY Kähler moduli.
The CY complex structure moduli pair with periods
of the 3-form to produce $b^{2,1}(K)$ hypermultiplets.
Enhanced gauge symmetry then appears when the
CY3 contains ADE singularities fibered over a curve,
from the same mechanism involving wrapped M2
branes we discussed under Ns = 16. If degenerating
curves lead to other singularities (e.g., the ODP or
"conifold"), it is possible to obtain extremal
transitions which translate physically into Coulomb-Higgs
transitions. Finally, singularities in which surfaces
degenerate lead to nontrivial fixed-point theories.

Reduction on $S^1$ leads to IIa on CY3, with the
spectrum above plus a “universal hypermultiplet” 
which includes the dilaton. Perhaps the most 
interesting new feature is the presence of world- 
sheet instantons, which correct the metric on vector 
multiplet moduli space. This metric satisfies the 
restrictions of special geometry and thus can be 
derived from a prepotential. 

The same theory can be obtained by compactifi- 
cation of IIb theory on the mirror CY3. Now vector 
multiplets are related to the complex structure 
moduli space, while hypermultiplets are related to 
Kähler moduli space. In this case, the prepotential
derived from variation of complex structure receives 
no instanton corrections, as we discuss in the next 
section. 

Finally, one can compactify the heterotic string on
$K3 \times T^{6-d}$, but this theory follows from toroidal
reduction of the d = 6 case we discuss next.


Ns = 8, d=6 


These supergravities are similar to d < 6, but there 
is a new type of matter multiplet, the self-dual 
tensor (in d < 6 this is dual to a vector multiplet). 
Since fermions in d=6 are chiral, there is an 
anomaly cancellation condition relating the numbers 
of the three types of multiplets (Aspinwall 1996, 
section 6.6), 


n_H - n_V + 29 n_T = 273    [9]


One class of examples is the heterotic string 
compactified on K3. In the original perturbative 
constructions, to satisfy eqn [7], we need to choose a
vector bundle with c_2(V) = χ(K3) = 24. The resulting
degrees of freedom are a single self-dual tensor
multiplet and a rank-16 gauge group. More generally,
one can introduce N_{5B} heterotic 5-branes,
which generalize eqn [7] to c_2(E) + N_{5B} = c_2(TK).
Since this brane carries a self-dual tensor multiplet,
this series of models is parametrized by n_T. They are
connected by transitions in which an E_8 instanton
shrinks to zero size and becomes a 5-brane; the
resulting decrease in the dimension of the moduli
space of E_8 bundles on K3 agrees with eqn [9].
Another class of examples is F-theory on an 
elliptically fibered CY3. These are related to
M-theory on an elliptically fibered CY3 in the same
general way we discussed under Ns = 16. The
relation between F-theory and the heterotic string
on K3 can be seen by lifting M-theory-heterotic
duality; this suggests that the two constructions are
dual only if the CY3 is a K3 fibration as well. Since
not all elliptically fibered CY3s are K3 fibered, the 
F-theory construction is more general. 
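
As a consistency check, the anomaly condition of eqn [9] can be evaluated for the simplest perturbative heterotic example. The sketch below is illustrative only. It assumes the standard value 273 for the right-hand side of eqn [9], the well-known E_8 x E_8 standard-embedding spectrum on K3 (gauge group E_7 x E_8, ten hypermultiplets in the 56 of E_7 plus 65 neutral ones), and that shrinking one E_8 instanton removes 30 hypermultiplets of bundle moduli while the resulting 5-brane contributes one hypermultiplet and one tensor multiplet. The function names are hypothetical.

```python
def anomaly_lhs(n_h, n_v, n_t):
    """Left-hand side n_H - n_V + 29 n_T of the d = 6 anomaly condition."""
    return n_h - n_v + 29 * n_t


def shrink_instanton(n_h, n_v, n_t):
    """Assumed bookkeeping for one small-instanton transition.

    One E_8 instanton carries 30 hypermultiplets of moduli (assumption:
    dual Coxeter number 30); the 5-brane adds one hypermultiplet for its
    position on K3 and one self-dual tensor multiplet.
    """
    return n_h - 30 + 1, n_v, n_t + 1


# Standard embedding: all 24 instantons in one E_8, gauge group E_7 x E_8.
n_v = 133 + 248        # dim E_7 + dim E_8
n_h = 10 * 56 + 65     # charged plus neutral hypermultiplets
n_t = 1                # single tensor multiplet of the perturbative string

print(anomaly_lhs(n_h, n_v, n_t))                     # perturbative model
print(anomaly_lhs(*shrink_instanton(n_h, n_v, n_t)))  # after one transition
```

Both evaluations give the same value because each transition trades 29 hypermultiplets for one tensor multiplet, which is the agreement with eqn [9] mentioned above.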

We return to d=4 and Ns=4 in the final section. 
The cases of Ns < 4 which exist in d < 3 are far less 
studied. 


Stringy and Quantum Corrections 


The D-dimensional low-energy effective supergrav- 
ity actions on which we based our discussion so far 
are only approximations to the general story of 
string/M-theory compactification. However, if 
Planck's constant is small, K is sufficiently large, 
and its curvature is small, then they are controlled 
approximations. 

In M-theory, as in any theory of quantum gravity,
corrections are controlled by the Planck scale
parameter M_P^{D-2}, which sits in front of the Einstein
term of the D-dimensional effective Lagrangian, and
plays the role of ħ. In general, this is different from
the four-dimensional Planck scale, which satisfies
M_{P4}^2 = Vol(K) M_P^{D-2}. After taking the low-energy
limit E ≪ M_P, the remaining corrections are controlled
by the dimensionless parameters l_P/R, where
R can be any characteristic length scale of the solution:
a curvature radius, the length of a nontrivial cycle,
and so on.

In string theory, one usually thinks of the
corrections as a double series expansion in g_s, the
dimensionless (closed) string coupling constant, and
α', the inverse string tension parameter, of dimensions
(length)^2. The ten-dimensional Planck scale is
related to these parameters as M_P^8 = 1/(g_s^2 (α')^4), up to
a constant factor that depends on conventions.

Besides perturbative corrections, which have power- 
like dependence on these parameters, there can be 
world sheet and “brane” instanton corrections. For
example, a string world sheet can wrap around a
topologically nontrivial spacelike 2-cycle Σ in K,
leading to an instanton correction to the effective
action which is suppressed as exp(-Vol(Σ)/2πα').
More generally, any p-brane wrapping a p-cycle 
can produce a similar effect. As for which terms in 
the effective Lagrangian receive corrections, this 
depends largely on the number and symmetries of 
the fermion zero modes on the instanton world 
volumes. 
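
To give a feeling for the size of such effects, here is a minimal sketch that evaluates the world-sheet instanton suppression factor exp(-Vol(Σ)/2πα') for a few cycle volumes. Units with α' = 1 and the sample volumes are assumptions chosen only for illustration; the function name is hypothetical.

```python
import math


def worldsheet_instanton_factor(volume, alpha_prime=1.0):
    """Suppression exp(-Vol/(2 pi alpha')) of a wrapped world sheet."""
    return math.exp(-volume / (2.0 * math.pi * alpha_prime))


# Larger 2-cycles (in string units) are suppressed exponentially.
for volume in (1.0, 10.0, 100.0):
    print(volume, worldsheet_instanton_factor(volume))
```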

Let us start by discussing some cases in which one 
can argue that these corrections are not present. 


First, extended supersymmetry can serve to eliminate
many corrections. This is analogous to the
familiar result that the superpotential in d = 4, N = 1
supersymmetric field theory does not receive (or "is
protected from") perturbative corrections, and in
many cases follows from similar formal arguments.
In particular, supersymmetry forbids corrections to
the potential and two derivative terms in the
Ns = 32 and Ns = 16 Lagrangians.

In Ns = 8, the superpotential is protected, but the
two derivative terms can receive corrections. However,
there is a simple argument which precludes
many such corrections: since vector multiplet and
hypermultiplet moduli spaces are decoupled, a
correction whose control parameter sits in (say) a
vector multiplet cannot affect hypermultiplet moduli
space. This fact allows for many exact computations
in these theories.

As an example, in IIb on CY3, the metric on
vector multiplet moduli space is precisely eqn [1] as 
obtained from supergravity (in other words, the 
Weil-Petersson metric on complex structure moduli 
space). First, while in principle it could have been 
corrected by world-sheet instantons, since these 
depend on Kahler moduli which sit in hypermulti- 
plets, it is not. The only other instantons with the 
requisite zero modes to modify this metric are 
wrapped Dirichlet branes. Since in IIb theory these 
wrap even-dimensional cycles, they also depend on 
Kahler moduli and thus leave vector moduli space 
unaffected. 

As previously discussed, for K3-fibered CY3, this
theory is dual to the heterotic string on K3 × T^2.
There, the vector multiplets arise from Wilson lines
on T^2, and reduce to an adjoint multiplet of N = 2
supersymmetric Yang-Mills theory. Of course, in 
the quantum theory, the metric on this moduli space 
receives instanton corrections. Thus, the duality 
allows deriving the exact moduli space metric, and 
many other results of the Seiberg-Witten theory of 
N=2 gauge theory, as aspects of the geometry of 
Calabi-Yau moduli space. 

In Ns = 4, only the superpotential is protected,
and that only in perturbation theory; it can receive 
nonperturbative corrections. Indeed, it appears that 
this is fairly generic, suggesting that the effective 
potentials in these theories are often sufficiently 
complicated to exhibit the structure required for 
supersymmetry breaking and the other symmetry 
breakings of the SM. Understanding this is an active 
subject of research. 

We now turn from corrections to novel physical 
phenomena which arise in these regimes. While this 
is too large a subject to survey here, one of the basic 
principles which governs this subject is the idea that 
string/M-theory compactification on a singular
manifold K is typically consistent, but has new 
light degrees of freedom in the EFT, not predicted 
by KK arguments. We implicitly touched on one 
example of this in the discussion of M-theory 
compactification on K3 above, as the space of 
Ricci-flat K3 metrics has degeneration limits in 
which curvatures grow without bound, while the 
volumes of 2-cycles vanish. On the other hand, the 
structure of Ns=16 supersymmetry essentially 
forces the d=7 EFT in these limits to be non- 
singular. Its only noteworthy feature is that a 
nonabelian gauge symmetry is restored, and thus 
certain charged vector bosons and their superpart- 
ners become massless. 

To see what is happening microscopically, we 
must consider an M-theory membrane (or 2-brane), 
wrapped on a degenerating 2-cycle. This appears as 
a particle in d=7, charged under the vector 
potential obtained by reduction of the D=11 
3-form potential. The mass of this particle is the 
volume of the 2-cycle multiplied by the membrane 
tension, so as this volume shrinks to zero, the 
particle becomes massless. Thus, the physics is also 
well defined in 11 dimensions, though not literally 
described by 11-dimensional supergravity. 

This phenomenon has numerous generalizations. 
Their common point is that, since the essential 
physics involves new light degrees of freedom, they 
can be understood in terms of a lower-dimensional 
quantum theory associated with the region around 
the singularity. Depending on the geometry of the 
singularity, this is sometimes a weakly coupled field 
theory, and sometimes a nontrivial conformal field 
theory. Occasionally, as in IIb on K3, the lightest 
wrapped brane is a string, leading to a “little string 
theory” (Aharony 2000). 


N=1 Supersymmetry in Four Dimensions 


Having described the general framework, we con- 
clude by discussing the various constructions which 
lead to N=1 supersymmetry. Besides the heterotic 
string on a CY3, these compactifications include 
type IIa and IIb on orientifolds of CY3, the related 
F-theory on elliptically fibered Calabi-Yau 4-folds 
(CY4), and M-theory on G2 manifolds. Let us briefly 
spell out their ingredients, the known nonperturbative 
corrections to the superpotential, and the duality 
relations between these constructions. 

To start, we recap the heterotic string construction.
We must specify a CY3 K, and a bundle E over
K which admits a Hermitian Yang-Mills connection.
The gauge group G is the commutant of the
structure group of E in E_8 × E_8 or Spin(32)/Z_2,
while the chiral matter consists of metric moduli of
K, and fields corresponding to a basis for the
Dolbeault cohomology group H^{0,1}(K, Rep E) where
Rep E is the bundle E embedded into an E_8 bundle
and decomposed into G-reps.

There is a general (though somewhat formal) 
expression for the superpotential, 


W = \int \Omega \wedge \mathrm{Tr}\left( A \bar\partial A + \tfrac{2}{3} A^3 \right)
    + \int \Omega \wedge H^{(3)} + W_{NP}    [10]


The first term is the holomorphic Chern-Simons
action, whose variation enforces the F^{0,2} = 0 condition.
The second is the "flux superpotential," while
the third term is the nonperturbative corrections.
The best understood of these arise from super- 
symmetric gauge theory sectors. In some, but not all, 
cases, these can be understood as arising from gauge 
theoretic instantons, which can be shown to be dual 
to heterotic 5-branes wrapped on K. Heterotic 
world-sheet instantons can also contribute. 

The HO theory is S-dual to the type I string, with
the same gauge group, realized by open strings on 
Dirichlet 9-branes. This construction involves essen- 
tially the same data. The two classes of heterotic 
instantons are dual to D1- and D5-brane instantons, 
whose world-sheet theories are somewhat simpler. 

If the CY3 K has a fibration by tori, by applying
T-duality to the fibers along the lines discussed for
tori under Ns = 16 above, one obtains various type II
orientifold compactifications. On an elliptic fibration,
double T-duality produces a IIb compactification
with D7s and O7s. Using the relation between
IIb theory on T^2 and F-theory on K3 fiberwise, one
can also think of this as an F-theory compactifica- 
tion on a K3-fibered CY4. More generally, one 
can compactify F theory on any elliptically fibered 
4-fold to obtain N=1. These theories have 
D3-instantons, the T-duals of both the type I 
D1- and D5-brane instantons. 

The theory of mirror symmetry predicts that all
CY3s have T^3 fibration structures. Applying the
corresponding triple T-duality, one obtains a IIa
compactification on the mirror CY3 K, with D6- 
branes and O6-planes. Supersymmetry requires 
these to wrap special Lagrangian cycles in K. As in 
all Dirichlet brane constructions, enhanced gauge 
symmetry arises from coincident branes wrapping 
the same cycle, and only the classical groups are 
visible in perturbation theory. Exceptional gauge 
symmetry arises as a strong coupling phenomenon 
of the sort described in the previous section. The 
superpotential can also be thought of as mirror to 
eqn [10], but now the first term is the sum of a real 


Chern-Simons action on the special Lagrangian 
cycles, with disk world-sheet instanton corrections, 
as studied in open string mirror symmetry. The 
gauge theory instantons are now D2-branes. 

Using the duality relation between the IIa string and
11-dimensional M-theory, this construction can be
lifted to a compactification of M-theory on a
seven-dimensional manifold L, which is an S^1 fibration over
K. The D6 and O6 planes arise from singularities in the
S^1 fibration. Generically, L can be smooth, and the
only candidate in Table 1 for such an N = 1
compactification is a manifold with G2 holonomy;
therefore, L must have such holonomy. Finally, both
the IIa world-sheet instantons and the D2-brane
instantons lift to membrane instantons in M-theory.

This construction implicitly demonstrates the exis- 
tence of a large number of G2 holonomy manifolds. 
Another way to arrive at these is to go back to the
heterotic string on K, and apply the duality (again
under Ns = 16) between heterotic on T^3 and M-theory
on K3 to the T^3 fibration structure on K, to arrive at
M-theory on a K3-fibered manifold of G2 holonomy.
Wrapping membranes on 2-cycles in these fibers, we 
can see enhanced gauge symmetry in this picture fairly 
directly. It is an illuminating exercise to work through 
its dual realizations in all of these constructions. 

Our final construction uses the interpretation of the
strong coupling limit of the HE theory as M-theory on
a one-dimensional interval I, in which the two E_8
factors live on the two boundaries. Thus, our original
starting point can also be interpreted as the heterotic
string on K × I. This construction is believed to be
important physically as it allows generalizing a 
heterotic string tree-level relation between the gauge 
and gravitational couplings which is phenomenologi- 
cally disfavored. One can relate it to a IIa orientifold as 
well, now with D8- and O8-branes. 

These multiple relations are often referred to as the 
"web" of dualities. They lead to numerous relations
between compactification manifolds, moduli spaces, 
superpotentials, and other properties of the EFTs, 
whose full power has only begun to be appreciated. 


Suggestions for further reading 


Original references for all but the most recent of 
these topics can be found in the following textbooks 
and proceedings. We have also referenced a few 
research articles which are good starting points for 
the more recent literature. There are far more 
reviews than we could reference here, and a partial 
listing of these appears at http://www.slac.stanford.edu/spires/reviews/


See also: Brane Construction of Gauge Theories; 
Random Algebraic Geometry, Attractors and Flux Vacua; 


String Theory: Phenomenology; Superstring Theories; 
Two-Dimensional Conformal Field Theory and Vertex 
Operator Algebras; Viscous Incompressible Fluids: 
Mathematical Theory. 
Compressible Flows: Mathematical Theory


Introduction


The Euler equations for compressible fluids consist of
the conservation laws of mass, momentum, and energy:


\partial_t \rho + \nabla_x \cdot m = 0,   x \in R^d    [1]

\partial_t m + \nabla_x \cdot ( m \otimes m / \rho ) + \nabla_x p = 0    [2]

\partial_t E + \nabla_x \cdot ( (m/\rho)(E + p) ) = 0    [3]


Equivalently, these correspond to the general form of
nonlinear hyperbolic systems of conservation laws:


\partial_t u + \nabla_x \cdot f(u) = 0,   u \in R^n,   x \in R^d    [4]


System [1]-[3] is closed by the following constitutive
relations:


p = p(\rho, e),   E = \frac{1}{2} |m|^2 / \rho + \rho e    [5]


In [1]-[3] and [5], τ = 1/ρ is the deformation
gradient (specific volume for fluids, strain for
solids), v = (v_1, ..., v_d) is the fluid velocity with
ρv = m the momentum vector, p is the scalar
pressure, and E is the total energy with e the
internal energy, which is a given function of (τ, p) or
(ρ, p) defined through thermodynamical relations.
The other two thermodynamic variables are temperature
θ and entropy S. If (ρ, S) are chosen as
independent variables, then the constitutive relations
can be written as


(e, p, \theta) = ( e(\rho, S), p(\rho, S), \theta(\rho, S) )    [6]


governed by θ dS = de + p dτ = de - (p/ρ^2) dρ. For
polytropic gases,


p = R \rho \theta,   e = c_v \theta,   p = p(\rho, S) = \kappa \rho^\gamma e^{S/c_v},   e = \frac{p}{(\gamma - 1) \rho}    [7]


where R > 0 may be taken to be the universal gas
constant divided by the effective molecular weight of
the particular gas, c_v > 0 is the specific heat at constant
volume, γ = 1 + R/c_v > 1 is the adiabatic exponent,
and κ can be any positive constant under scaling.
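
The polytropic relations can be made concrete with a short sketch. It assumes the forms p = κ ρ^γ exp(S/c_v), e = p/((γ - 1)ρ) and θ = e/c_v, and checks the thermodynamic identity θ dS = de + p dτ by finite differences. The sample constants and all function names are illustrative assumptions.

```python
import math

GAMMA = 1.4   # adiabatic exponent (assumed sample value)
KAPPA = 1.0   # scaling constant kappa (assumed)
C_V = 1.0     # specific heat at constant volume (assumed)


def pressure(rho, S):
    """p(rho, S) = kappa rho^gamma exp(S / c_v)."""
    return KAPPA * rho**GAMMA * math.exp(S / C_V)


def internal_energy(rho, S):
    """e = p / ((gamma - 1) rho)."""
    return pressure(rho, S) / ((GAMMA - 1.0) * rho)


def temperature(rho, S):
    """theta = e / c_v."""
    return internal_energy(rho, S) / C_V


def total_energy(rho, m, S):
    """E = |m|^2 / (2 rho) + rho e for a momentum vector m."""
    kinetic = 0.5 * sum(mi * mi for mi in m) / rho
    return kinetic + rho * internal_energy(rho, S)


def gibbs_residuals(rho, S, h=1e-6):
    """Central-difference check of theta dS = de + p dtau with tau = 1/rho.

    Returns (de/dS - theta, de/dtau + p); both should be close to zero.
    """
    de_dS = (internal_energy(rho, S + h) - internal_energy(rho, S - h)) / (2.0 * h)
    tau = 1.0 / rho
    de_dtau = (internal_energy(1.0 / (tau + h), S)
               - internal_energy(1.0 / (tau - h), S)) / (2.0 * h)
    return de_dS - temperature(rho, S), de_dtau + pressure(rho, S)


print(gibbs_residuals(1.3, 0.2))
```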

The most important criterion of applicability of 
any mathematical model is its well-posedness: 
existence, uniqueness, and stability. The well-posedness 
theory for compressible fluid flows is far from being 
complete, and many further issues are still unexplored. 
In particular, the global existence and uniqueness of 
solutions in R^d, d ≥ 2, is still a major open problem, and
only partial results shed some light on the amazing
complexity of the problem. Below, we will mainly focus 
on the well-posedness issues with emphasis on the 
Cauchy problem, the initial value problem: 


u|_{t=0} = u_0    [8]


first for inviscid compressible fluid flows and then 
for viscous compressible fluid flows. 

Throughout this article, where a cited reference is 
not shown in the “Further reading” section, it may 
usually be found by consulting Bressan (2000), 
Chen (2005), Dafermos (2005), Feireisl (2004),
Lions (1986, 1988) or Liu (2000).


Inviscid Compressible Fluid Flows: 
Euler Equations 


Solutions to the Euler equations [1]-[3] are generically 
discontinuous functions obeying the Clausius-Duhem 
inequality, the second law of thermodynamics: 


\partial_t (\rho S) + \nabla_x \cdot (m S) \ge 0    [9]


in the sense of distributions. Such discontinuous 
solutions are called entropy solutions. 

When a flow is isentropic, that is, entropy S is a 
uniform constant So in the flow, then the Euler 
equations for the flow take the simpler form: 


\partial_t \rho + \nabla_x \cdot m = 0
\partial_t m + \nabla_x \cdot ( m \otimes m / \rho ) + \nabla_x p = 0    [10]


where the pressure is a function of the density,
p = p(ρ, S_0), with constant S_0. For a polytropic gas,


p(\rho) = \kappa \rho^\gamma,   \gamma > 1    [11]


where κ can be any positive constant by scaling. This
system can be derived from [1] to [3] as follows: for 
smooth solutions of [1]-[3], entropy S(p,m, E) is 
conserved along fluid particle trajectories, that is, 


\partial_t (\rho S) + \nabla_x \cdot (m S) = 0


If the entropy is initially a uniform constant and 
the solution remains smooth, then the energy 
equation can be eliminated and entropy S keeps the 
same constant in later time. Thus, under constant 
initial entropy, a smooth solution of [1]-[3] satisfies 
the equations in [10]. Furthermore, solutions of 
system [10] are also a good approximation to 
solutions of system [1]-[3] even after shocks form, 
since the entropy increases across a shock to the 
third order in wave strength for solutions of [1]-[3], 
while in [10] the entropy is constant. Moreover, 
system [10] is an excellent model for the isothermal
fluid flow with γ = 1 and for the shallow-water flow
with γ = 2. For such barotropic flows (i.e., p = p(ρ)),
the energy equation [3] serves as an entropy
inequality (see Lax (1973)):


\partial_t E + \nabla_x \cdot ( m (E + p(\rho)) / \rho ) \le 0
in the sense of distributions
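
The role of the energy as an entropy can be checked numerically in one space dimension. The sketch below assumes a polytropic law p = κ ρ^γ, takes the mechanical energy η = m^2/(2ρ) + κ ρ^γ/(γ - 1) with flux q = (m/ρ)(η + p), and verifies by central differences that ∇q = ∇η ∇f for the flux f = (m, m^2/ρ + p) of the barotropic system. The sample constants and function names are assumptions made for illustration.

```python
GAMMA = 1.4   # assumed sample adiabatic exponent
KAPPA = 1.0   # assumed scaling constant


def pressure(rho):
    return KAPPA * rho**GAMMA


def flux(rho, m):
    """Flux f(rho, m) of the one-dimensional barotropic system."""
    return (m, m * m / rho + pressure(rho))


def eta(rho, m):
    """Mechanical energy, used here as the entropy."""
    return 0.5 * m * m / rho + KAPPA * rho**GAMMA / (GAMMA - 1.0)


def q(rho, m):
    """Entropy flux paired with eta."""
    return (m / rho) * (eta(rho, m) + pressure(rho))


def gradient(fun, rho, m, h=1e-6):
    """Central-difference gradient with respect to (rho, m)."""
    d_rho = (fun(rho + h, m) - fun(rho - h, m)) / (2.0 * h)
    d_m = (fun(rho, m + h) - fun(rho, m - h)) / (2.0 * h)
    return d_rho, d_m


def compatibility_residual(rho, m):
    """Components of grad q - grad eta * Jacobian(f); should be near zero."""
    eta_rho, eta_m = gradient(eta, rho, m)
    q_rho, q_m = gradient(q, rho, m)
    f1_rho, f1_m = gradient(lambda r, mm: flux(r, mm)[0], rho, m)
    f2_rho, f2_m = gradient(lambda r, mm: flux(r, mm)[1], rho, m)
    res_rho = q_rho - (eta_rho * f1_rho + eta_m * f2_rho)
    res_m = q_m - (eta_rho * f1_m + eta_m * f2_m)
    return res_rho, res_m


print(compatibility_residual(1.3, 0.7))
```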


In the one-dimensional case, system [1]-[3] in 
Eulerian coordinates is 


\partial_t \rho + \partial_x m = 0,   \partial_t m + \partial_x ( m^2/\rho + p ) = 0
\partial_t E + \partial_x ( m (E + p)/\rho ) = 0    [12]


The system above can be rewritten in Lagrangian 
coordinates: 


\partial_t \tau - \partial_x v = 0,   \partial_t v + \partial_x p = 0
\partial_t ( e + v^2/2 ) + \partial_x ( p v ) = 0    [13]


with v = m/ρ, where the coordinates (t,x) are
the Lagrangian coordinates, which are different
from the Eulerian coordinates for [12]; for simplicity
of notation, we do not distinguish them.
For the barotropic case, systems [12] and [13] 
reduce to 


\partial_t \rho + \partial_x m = 0,   \partial_t m + \partial_x ( m^2/\rho + p ) = 0    [14]


and 


\partial_t \tau - \partial_x v = 0,   \partial_t v + \partial_x p = 0    [15]


respectively, where pressure p = p(ρ) = p̃(τ), τ = 1/ρ.
The solutions of [12] and [13], as well as [14] and
[15], are equivalent even for entropy solutions with
vacuum where ρ = 0.

The potential flow is well known in transonic
aerodynamics, beyond the isentropic approximation
[10] from [1] to [3]. Denote by
D_t = ∂_t + Σ_{k=1}^{d} v_k ∂_{x_k} the convective derivative along fluid
particle trajectories. From [1] to [3], we have


D_t S = 0    [16]


and, by taking the curl of the momentum equations,


D_t \left( \frac{\omega}{\rho} \right) = \frac{\omega}{\rho} \cdot \nabla_x v + \frac{p_S(\rho, S)}{\rho^3} \nabla_x \rho \times \nabla_x S    [17]

The identities [16] and [17] imply that a smooth 
solution of [1]-[3] which is both isentropic and 
irrotational at time t=0 remains isentropic and 
irrotational for all later times, as long as this 
solution stays smooth. Then, the conditions 
S = S_0 = const. and ω = curl_x v = 0 are reasonable for
smooth solutions. For a smooth irrotational solution,
we integrate the d-momentum equations in
[10] through Bernoulli's law:


\partial_t v + \nabla_x ( |v|^2 / 2 ) + \nabla_x h(\rho) = 0


where h'(ρ) = p_ρ(ρ, S_0)/ρ. On a simply connected
space region, the condition curl_x v = 0 implies that
there exists Φ such that v = ∇_x Φ. Then,


\partial_t \rho + \nabla_x \cdot ( \rho \nabla_x \Phi ) = 0
\partial_t \Phi + \frac{1}{2} |\nabla_x \Phi|^2 + h(\rho) = K    [18]


for some constant K. From the second equation in 
[18], we have 


\rho(D\Phi) = h^{-1}\left( K - ( \partial_t \Phi + \frac{1}{2} |\nabla_x \Phi|^2 ) \right)


Then, system [18] can be rewritten as the following 
time-dependent potential flow equation of second 
order: 


\partial_t \rho(D\Phi) + \nabla_x \cdot ( \rho(D\Phi) \nabla_x \Phi ) = 0    [19]


For a steady solution Φ = Φ(x), that is, ∂_t Φ = 0,
we obtain the celebrated steady potential flow 
equation of aerodynamics: 


\nabla_x \cdot ( \rho(\nabla_x \Phi) \nabla_x \Phi ) = 0    [20]


In applications in aerodynamics, [18] or [19] is 
used for discontinuous solutions, and the empirical 
evidence is that entropy solutions of [18] or [19] are 
fairly good approximations to entropy solutions for 
[1]-[3] provided that (1) the shock strengths are 
small, (2) the curvature of shock fronts is not too 
large, and (3) there is a small amount of vorticity in 
the region of interest. Model [19] or [18] is an 
excellent model to capture multidimensional shock 
waves by ignoring vorticity waves, while the 
incompressible Euler equations are an excellent 
model to capture multidimensional vorticity waves 
by ignoring shock waves. 
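
For a polytropic gas, Bernoulli's law can be inverted in closed form, which makes the density in the potential flow equation explicit. The sketch below assumes p = κ ρ^γ, so that h(ρ) = κγ ρ^{γ-1}/(γ - 1) satisfies h'(ρ) = p'(ρ)/ρ, and recovers ρ from the time derivative and gradient of the potential Φ. The constants and function names are illustrative assumptions.

```python
GAMMA = 1.4   # assumed sample adiabatic exponent
KAPPA = 1.0   # assumed scaling constant


def enthalpy(rho):
    """h(rho) with h'(rho) = p'(rho)/rho for p = kappa rho^gamma."""
    return KAPPA * GAMMA / (GAMMA - 1.0) * rho**(GAMMA - 1.0)


def enthalpy_inverse(h):
    """Inverse of enthalpy(); only defined for positive arguments."""
    if h <= 0.0:
        raise ValueError("Bernoulli's law gives no positive density here")
    return ((GAMMA - 1.0) * h / (KAPPA * GAMMA))**(1.0 / (GAMMA - 1.0))


def density_from_potential(phi_t, grad_phi, K):
    """rho = h^{-1}(K - (phi_t + |grad phi|^2 / 2))."""
    speed_sq = sum(g * g for g in grad_phi)
    return enthalpy_inverse(K - (phi_t + 0.5 * speed_sq))


# Round trip: choose a state, form the Bernoulli constant, recover rho.
rho, velocity = 0.8, (0.3, -0.4)
K = 0.5 * sum(v * v for v in velocity) + enthalpy(rho)
print(density_from_potential(0.0, velocity, K))
```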


Local Well-Posedness for Classical Solutions 


Consider the Cauchy problem for the Euler equations 
[1]-[3] with Cauchy data [8]: 


Assume that u_0: R^d → D is in H^s ∩ L^∞ with s > d/2 + 1.
Then, for the Cauchy problem [1]-[3] and [8], there
exists a finite time T = T(‖u_0‖_s, ‖u_0‖_{L^∞}) ∈ (0, ∞) such
that there is a unique, stable bounded classical solution
u ∈ C^1([0, T] × R^d) with u(t, x) ∈ D for (t, x) ∈ [0, T] ×
R^d and u ∈ C([0, T]; H^s) ∩ C^1([0, T]; H^{s-1}). Moreover,
the interval [0, T) with T < ∞ is the maximal interval
of the classical H^s existence for [1]-[3] if and only if
either ‖(u_t, ∇_x u)‖_{L^∞} → ∞ or u(t, x) escapes every
compact subset K ⋐ D as t → T.


This local existence can be established by relying 
solely on the elementary linear existence theory for 
symmetric hyperbolic systems with smooth coefficients
(cf. Majda (1984)), or by the abstract
semigroup theory (Kato 1975).


Formation of Singularities 


For the one-dimensional case, singularities include 
the development of shock waves and formation of 
vacuum states. For the multidimensional case, the 
situation is much more complicated: besides shock 
waves and vacuum states, singularities can also be 
generated from vortex sheets, focusing and breaking 
of waves, among others. 
Consider the Cauchy problem of the Euler
equations [1]-[3] in R^3 for polytropic gases with
smooth initial data:


(\rho, v, S)|_{t=0} = (\rho_0, v_0, S_0)(x),   \rho_0(x) > 0,   x \in R^3    [21]


satisfying (ρ_0, v_0, S_0)(x) = (ρ̄, 0, S̄) for |x| ≥ R, where
ρ̄ > 0, S̄, and R are given constants. The equations
possess a unique local C^1 solution (ρ, v, S)(t, x) with
ρ(t, x) > 0 provided that the initial data [21] is
sufficiently regular. The support of the smooth
disturbance (ρ_0(x) - ρ̄, v_0(x), S_0(x) - S̄) propagates
with speed at most σ = sqrt(p_ρ(ρ̄, S̄)) (the sound speed),
that is,


(\rho, v, S)(t, x) = (\bar\rho, 0, \bar S)   if |x| \ge R + \sigma t    [22]


Define


P(t) = \int_{R^3} \left( p(\rho(t,x), S(t,x))^{1/\gamma} - p(\bar\rho, \bar S)^{1/\gamma} \right) dx

F(t) = \int_{R^3} \rho(t,x) \, v(t,x) \cdot x \, dx


which, roughly speaking, measure the entropy and the
radial component of momentum. Then, if (ρ, v, S)(t, x)
is a C^1 solution of [1]-[3] and [21] for 0 ≤ t < T, and


P(0) \ge 0,   F(0) > \alpha \sigma R^4 \max_x \rho_0(x),   with \alpha = 16\pi/3    [23]


then the lifespan T of the C^1 solution is finite
(Sideris 1985).

To illustrate a way in which the conditions in
[23] may be satisfied, consider the initial data
ρ_0 = ρ̄, S_0 = S̄. Then P(0) = 0, and [23] holds if


\int_{|x| < R} v_0(x) \cdot x \, dx > \alpha \sigma R^4


Comparing both sides, one finds that the initial 
velocity must be supersonic in some region relative 
to the sound speed at infinity. The formation of a 
singularity (presumably a shock wave) is detected as 
the disturbance overtakes the wave front forcing the 
front to propagate with supersonic speed. 
Singularities are formed even without the condi- 
tion of largeness, such as [23], being satisfied. For 
example, if S_0(x) ≥ S̄ and, for some 0 < R_0 < R,


\int_{|x| > r} |x|^{-1} (|x| - r)^2 ( \rho_0(x) - \bar\rho ) \, dx > 0
\int_{|x| > r} |x|^{-3} (|x|^2 - r^2) \rho_0(x) \, v_0(x) \cdot x \, dx \ge 0    [24]


for R_0 < r < R, then the lifespan T of the C^1
solution of [1]-[3] and [21] is finite. The
assumptions in [24] mean that, in an average sense,
the gas must be slightly compressed and outgoing 
directly behind the wave front. 


Local Well-Posedness for Shock-Front Solutions 


For a general hyperbolic system of conservation laws 
[4], shock-front solutions are discontinuous, piecewise 
smooth entropy solutions with the following structure: 


1. There exists a C^2 spacetime hypersurface S(t)
defined in (t, x) for 0 ≤ t ≤ T with spacetime
normal (ν_t, ν_x) = (ν_t, ν_1, ..., ν_d) as well as two
C^1 vector-valued functions u^+(t, x) and u^-(t, x),
defined on respective domains D^+ and D^- on
either side of the hypersurface S(t) and satisfying
∂_t u^± + ∇_x · f(u^±) = 0 in D^±;

2. The jump across the hypersurface S(t) satisfies the
Rankine-Hugoniot condition:


\left( \nu_t (u^+ - u^-) + \nu_x \cdot ( f(u^+) - f(u^-) ) \right)|_S = 0
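
The Rankine-Hugoniot condition can be illustrated in the simplest setting, a planar shock of the one-dimensional Euler equations for a polytropic gas. The sketch below is an illustration under stated assumptions: it uses the classical normal-shock relations (not taken from this article) to compute the state behind a stationary shock, and then checks that the jump -σ[u] + [f(u)] vanishes with shock speed σ = 0, where u = (ρ, ρv, E). Function names and the sample state are hypothetical.

```python
import math

GAMMA = 1.4   # assumed sample adiabatic exponent


def conserved(rho, v, p):
    """Conserved variables (rho, m, E) with E = rho v^2/2 + p/(gamma - 1)."""
    return (rho, rho * v, 0.5 * rho * v * v + p / (GAMMA - 1.0))


def euler_flux(rho, v, p):
    """Flux (m, m^2/rho + p, v (E + p)) of the one-dimensional system."""
    E = conserved(rho, v, p)[2]
    return (rho * v, rho * v * v + p, v * (E + p))


def post_shock_state(rho1, v1, p1):
    """State behind a stationary shock (classical normal-shock relations)."""
    mach_sq = v1 * v1 * rho1 / (GAMMA * p1)
    if mach_sq <= 1.0:
        raise ValueError("upstream flow must be supersonic")
    rho2 = rho1 * (GAMMA + 1.0) * mach_sq / ((GAMMA - 1.0) * mach_sq + 2.0)
    v2 = v1 * rho1 / rho2
    p2 = p1 * (1.0 + 2.0 * GAMMA / (GAMMA + 1.0) * (mach_sq - 1.0))
    return rho2, v2, p2


def rankine_hugoniot_jump(state_a, state_b, sigma=0.0):
    """Components of -sigma (u_b - u_a) + (f(u_b) - f(u_a))."""
    ua, ub = conserved(*state_a), conserved(*state_b)
    fa, fb = euler_flux(*state_a), euler_flux(*state_b)
    return tuple(-sigma * (b - a) + (g - f)
                 for a, b, f, g in zip(ua, ub, fa, fb))


upstream = (1.0, 2.0 * math.sqrt(GAMMA), 1.0)   # Mach 2 inflow
downstream = post_shock_state(*upstream)
print(rankine_hugoniot_jump(upstream, downstream))
```

The same check can be repeated in a moving frame by shifting both velocities by a constant and passing that constant as `sigma`.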


For [4], the surface S is not known in advance 
and must be determined as part of the solution of 
the problem; thus, the two equations in (1)-(2) 
describe a multidimensional, highly nonlinear, free- 
boundary-value problem. The initial data yielding 
shock-front solutions is defined as follows. Let S_0 be
a smooth hypersurface parametrized by α, and let
ν(α) = (ν_1, ..., ν_d)(α) be a unit normal to S_0. Define
the piecewise smooth initial values for respective
domains D_0^+ and D_0^- on either side of the hypersurface
S_0 as


u_0(x) = u_0^+(x) for x \in D_0^+,   u_0(x) = u_0^-(x) for x \in D_0^-    [25]


It is assumed that the initial jump in [25] satisfies the
Rankine-Hugoniot condition, that is, there is a
smooth scalar function σ(α) so that


-\sigma(\alpha) ( u_0^+(\alpha) - u_0^-(\alpha) ) + \nu(\alpha) \cdot ( f(u_0^+(\alpha)) - f(u_0^-(\alpha)) ) = 0    [26]


and that σ(α) does not define a characteristic
direction, that is,


\sigma(\alpha) \ne \lambda_i(u_0^\pm),   \alpha \in S_0,   1 \le i \le n    [27]


where λ_i, i = 1, ..., n, are the eigenvalues of [4]. It is
natural to require that S(0) = S_0.

Consider the Euler equations [1]-[3] in R^3 for
polytropic gases with piecewise smooth initial data:


(\rho, v, E)|_{t=0} = (\rho_0^+, v_0^+, E_0^+)(x) for x \in D_0^+,   (\rho_0^-, v_0^-, E_0^-)(x) for x \in D_0^-    [28]


Assume that S_0 is a smooth compact surface in R^3
and that (ρ_0^+, v_0^+, E_0^+)(x) belongs to the uniform local
Sobolev space H^s_ul(D_0^+), while (ρ_0^-, v_0^-, E_0^-)(x) belongs
to the Sobolev space H^s(D_0^-), for some fixed s ≥ 10.
Assume also that there is a function σ(α) ∈ H^s(S_0)
so that [26] and [27] hold, and the compatibility
conditions up to order s - 1 are satisfied on S_0 by
the initial data, together with the entropy condition:


v_0^+ \cdot \nu(\alpha) + \sqrt{p_\rho(\rho_0^+, S_0^+)} < \sigma(\alpha) < v_0^- \cdot \nu(\alpha) + \sqrt{p_\rho(\rho_0^-, S_0^-)}    [29]


Then, there are a C^2 hypersurface S(t) and C^1
functions (ρ^±, v^±, E^±)(t, x) defined for t ∈ [0, T],
with T sufficiently small, so that


(\rho, v, E)(t, x) = (\rho^+, v^+, E^+)(t, x) for (t, x) \in D^+,   (\rho^-, v^-, E^-)(t, x) for (t, x) \in D^-    [30]


is the discontinuous shock-front solution of the
Cauchy problem [1]-[3] and [28]. Here a vector
function u is in H^s_ul, provided that there exists
some r > 0 so that max_{y ∈ R^d} ‖w_{r,y} u‖_{H^s} < ∞ with
w_{r,y}(x) = w((x - y)/r), where w ∈ C_0^∞(R^d) is a
function so that w(x) ≥ 0, w(x) = 1 when |x| ≤ 1/2,
and w(x) = 0 when |x| > 1.

The compatibility conditions are needed in order
to avoid the formation of discontinuities in higher
derivatives along other characteristic surfaces emanating
from S_0. Once the main condition [26] is
satisfied, the compatibility conditions are automatically
guaranteed for a wide class of initial data. The
idea of the proof is to use the existence of a strictly
convex entropy and the symmetrization of [4]; the
shock-front solutions are defined as the limit of a
convergent classical iteration scheme based on
a linearization by using the theory of linearized
stability for shock fronts (Majda 1984). The uniform
existence time of shock-front solutions in
shock strength can be achieved (Métivier 1990).


Global Theory in L^∞ for the Isentropic Euler
Equations for x ∈ R


Consider the Cauchy problem for [14] with initial 
data: 


(\rho, m)|_{t=0} = (\rho_0, m_0)(x)    [31]


where ρ_0 and m_0 are in the physical region
{(ρ, m): ρ ≥ 0, |m| ≤ C_0 ρ} for some C_0 > 0. System
[14] is strictly hyperbolic at the states with ρ > 0,
and strict hyperbolicity fails at the vacuum states
V := {(ρ, m/ρ): ρ = 0, |m/ρ| < ∞}. Then, we have:


1. There exists a global solution (ρ, m)(t, x) of the
Cauchy problem [14] and [31] satisfying


0 \le \rho(t, x) \le C,   |m(t, x)| \le C \rho(t, x)    [32]


for some C > 0 depending only on C_0 and γ, and
the entropy inequality


\partial_t \eta(\rho, m) + \partial_x q(\rho, m) \le 0    [33]


in the sense of distributions for any convex weak
entropy-entropy flux pair (η, q), that is,


\nabla q(\rho, m) = \nabla \eta(\rho, m) \nabla f(\rho, m)
with
\nabla^2 \eta(\rho, m) \ge 0 and \eta|_{\rho = 0} = 0


2. The solution operator (ρ, m)(t, ·) = S_t(ρ_0, m_0)(·),
determined by (1), is compact in L^1_loc(R) for t > 0;

3. Furthermore, if (ρ_0, m_0)(x) is periodic with period
P, then there exists a global periodic solution
(ρ, m)(t, x) with [32] such that (ρ, m)(t, x) asymptotically
decays to


\frac{1}{P} \int_0^P (\rho_0, m_0)(x) \, dx

in L^1.

The convergence of the Lax-Friedrichs scheme, 
the Godunov scheme, and the vanishing viscosity 
method for system [14] have also been established. 
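
As an illustration of one of the schemes just mentioned, here is a minimal sketch of the Lax-Friedrichs scheme for system [14] with a polytropic pressure p = κ ρ^γ on a periodic grid. It is a sketch under stated assumptions, not the construction used in the convergence proofs: the constants, the CFL number, the grid, and all function names are illustrative choices.

```python
import math

GAMMA = 1.4   # assumed sample adiabatic exponent
KAPPA = 1.0   # assumed scaling constant


def pressure(rho):
    return KAPPA * rho**GAMMA


def flux(rho, m):
    """Flux (m, m^2/rho + p(rho)) of system [14]."""
    return (m, m * m / rho + pressure(rho))


def max_wave_speed(rho, m):
    """Largest |v| + c over the grid, with c = sqrt(p'(rho))."""
    return max(abs(mi / ri) + math.sqrt(KAPPA * GAMMA * ri**(GAMMA - 1.0))
               for ri, mi in zip(rho, m))


def lax_friedrichs_step(rho, m, dx, dt):
    """One Lax-Friedrichs step with periodic boundary conditions."""
    n = len(rho)
    f = [flux(r, mm) for r, mm in zip(rho, m)]
    lam = dt / dx
    new_rho, new_m = [0.0] * n, [0.0] * n
    for j in range(n):
        left, right = (j - 1) % n, (j + 1) % n
        new_rho[j] = 0.5 * (rho[left] + rho[right]) - 0.5 * lam * (f[right][0] - f[left][0])
        new_m[j] = 0.5 * (m[left] + m[right]) - 0.5 * lam * (f[right][1] - f[left][1])
    return new_rho, new_m


def run(rho, m, dx, n_steps, cfl=0.9):
    """Advance n_steps, choosing dt from the CFL condition at each step."""
    for _ in range(n_steps):
        dt = cfl * dx / max_wave_speed(rho, m)
        rho, m = lax_friedrichs_step(rho, m, dx, dt)
    return rho, m


n = 100
dx = 1.0 / n
rho0 = [1.0 + 0.2 * math.sin(2.0 * math.pi * (j + 0.5) * dx) for j in range(n)]
m0 = [0.0] * n
rho1, m1 = run(rho0, m0, dx, 50)
print(sum(rho0) * dx, sum(rho1) * dx)   # total mass before and after
```

With periodic boundary conditions the flux differences telescope, so the discrete total mass and momentum are conserved up to rounding, mirroring the conservation form of [14].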

The results are based on a compensated compact- 
ness framework to replace the BV compactness 
framework. For a gas obeying the γ-law, the case
γ = (N+2)/N, N ≥ 5 odd, was first studied by
DiPerna (1983), and the case 1 < γ ≤ 5/3 for
usual gases was first solved by Chen (1986) and
Ding-Chen-Luo (1985). The cases γ ≥ 3 and 5/3 <
γ < 3 were treated by Lions-Perthame-Tadmor
(1994) and Lions-Perthame-Souganidis (1996),
respectively. The case of general pressure laws was
solved by Chen-LeFloch (2000, 2003). All the
results for entropy solutions to [14] in Eulerian
coordinates can equivalently be presented as the
corresponding results for entropy solutions to [15]
in Lagrangian coordinates. The isothermal case
γ = 1 was treated by Huang-Wang (2002).


Global Theory in BV for the Adiabatic Euler 
Equations for x ∈ R


Consider the Euler equations [13] for polytropic 
gases with the Cauchy data: 


(\tau, v, S)|_{t=0} = (\tau_0, v_0, S_0)(x)    [34]


Then we have (Liu 1977, Temple 1981, Chen and 
Wagner 2003): 


Let K ⊂ {(τ, v, S): τ > 0} be a compact set in R_+ × R^2
and let N ≥ 1 be any constant. Then there exists a
constant C_0 = C_0(K, N), independent of γ ∈ (1, 5/3],
such that, for every initial data (τ_0, v_0, S_0) ∈ K with
TV_R(τ_0, v_0, S_0) ≤ N, when


(\gamma - 1) \, TV_R(\tau_0, v_0, S_0) \le C_0   for any \gamma \in (1, 5/3]


the Cauchy problem [13] and [34] has a global
entropy solution (τ, v, S)(t, x) which is bounded and
satisfies


TV_R(\tau, v, S)(t, \cdot) \le C \, TV_R(\tau_0, v_0, S_0)
for some constant C > 0 independent of γ.


In particular, this result includes the barotropic
case (Nishida 1968, Nishida-Smoller 1973,
DiPerna 1973). Some efforts in the direction of
relaxing the requirement of small total variation
have been made. Some extensions to the
initial-boundary value problems have also been made. In
addition, an entropy solution in BV with periodic
data or compact support decays when t → ∞.
Furthermore, even for a general hyperbolic system 
[4] for x € R, we have: 


If the initial data functions u_0(x) and v_0(x) have
sufficiently small total variation and u_0 - v_0 ∈ L^1(R),
then, for the corresponding exact Glimm, or wave-front
tracking, or vanishing viscosity solutions u(t, x)
and v(t, x) of the Cauchy problem [4] and [8], there
exists a constant C > 0 such that


\| u(t, \cdot) - v(t, \cdot) \|_{L^1(R)} \le C \| u_0 - v_0 \|_{L^1(R)}
for all t > 0    [35]


An immediate consequence is that the whole 
sequence of the approximate solutions constructed 
by the Glimm (1965) scheme, as well as the wave- 
front tracking method and the vanishing viscosity 
method, converges to a unique entropy solution of 
[4] and [8] when the mesh size or the viscosity 
coefficient tends to zero. More detailed discussions 
and extensive references about the $L^1$-stability of BV
entropy solutions and related topics can be found in 
Bressan (2000) and Dafermos (2000); also see Chen 
and Wang (2002). Furthermore, the Riemann solu- 
tion is unique and asymptotically stable in the class 
of entropy solutions to [13] with large variation 
satisfying only one physical entropy inequality 
(Chen-Frid-Li 2002). 
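
The $L^1$ estimate [35] can be illustrated numerically in a much simpler setting. The following is a minimal sketch, not the Glimm or wave-front tracking construction: it assumes the scalar Burgers equation $u_t + (u^2/2)_x = 0$ on a periodic grid and the Lax-Friedrichs scheme, which is monotone under the CFL condition, so the discrete $L^1$ distance between two solutions is non-increasing (that is, [35] holds with $C = 1$ in this scalar model). The function names, grid size, and initial data are illustrative assumptions.

```python
import math

def lax_friedrichs_step(u, dt, dx):
    """One Lax-Friedrichs step for u_t + (u^2/2)_x = 0 on a periodic grid."""
    n = len(u)
    f = [0.5 * v * v for v in u]  # Burgers flux at each cell
    lam = dt / dx
    return [0.5 * (u[(i + 1) % n] + u[i - 1])
            - 0.5 * lam * (f[(i + 1) % n] - f[i - 1])
            for i in range(n)]

def l1_distance(u, v, dx):
    """Discrete L^1 norm of u - v."""
    return dx * sum(abs(a - b) for a, b in zip(u, v))

def l1_history(n=200, steps=200, cfl=0.8):
    """Evolve two nearby initial data sets and record their L^1 distance."""
    dx = 1.0 / n
    x = [(i + 0.5) * dx for i in range(n)]
    u = [math.sin(2 * math.pi * xi) for xi in x]
    v = [math.sin(2 * math.pi * xi) + 0.1 * math.cos(4 * math.pi * xi) for xi in x]
    # Both data sets are bounded by 1.1 in absolute value; the monotone
    # scheme preserves this bound, so 1.1 bounds the wave speed |u|.
    dt = cfl * dx / 1.1
    history = [l1_distance(u, v, dx)]
    for _ in range(steps):
        u = lax_friedrichs_step(u, dt, dx)
        v = lax_friedrichs_step(v, dt, dx)
        history.append(l1_distance(u, v, dx))
    return history

if __name__ == "__main__":
    hist = l1_history()
    print("initial L1 distance:", hist[0])
    print("final   L1 distance:", hist[-1])
```

For systems, no such contraction with $C = 1$ is available in general, which is why [35] is stated with a constant $C > 0$ and under a small total variation assumption.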


Multidimensional Steady Theory 


The mathematical study of two-dimensional steady 
supersonic flows past wedges, whose vertex angles 
are less than the critical angle, dates back to the
1940s, since the stability of such flows is fundamental 
in applications (cf. Courant-Friedrichs (1948)). Local 
solutions around the wedge vertex were first 
constructed (Gu 1962, Schaeffer 1976, Li 1980). 
Such global potential solutions were constructed
when the wedge has some convexity, or is a small 
perturbation of the straight wedge with fast decay in 
the flow direction (Chen 2001, Chen-Xin-Yin 2002), 
or is piecewise smooth and a small perturbation
of the straight wedge (Zhang 2003). For
two-dimensional steady supersonic flows governed
by the full Euler equations past Lipschitz
wedges, it has been shown (Chen-Zhang-Zhu 2005a)
that, when the wedge vertex angle is less than
the critical angle, the strong shock front
emanating from the wedge vertex is nonlinearly
stable in structure globally under a BV
perturbation of the wedge such that the total
variation of the tangent function along the wedge
boundary is suitably small, although there may be
many weak shocks and vortex sheets between the
wedge boundary and the strong shock front. This
asserts that any supersonic shock for the wedge
problem is nonlinearly stable.

A self-similar gas flow past an infinite cone in $\mathbb{R}^3$
with small vertex angle is also nonlinearly stable
under a BV perturbation of the obstacle (Lien-Liu
1999). The nonlinear stability is still open when
the infinite cone in $\mathbb{R}^3$ has arbitrary vertex angle.
The stability issues of supersonic vortex sheets have
been studied by classical linearized stability analysis, 
large-scale numerical simulations, and asymptotic 
analysis. In particular, the nonlinear development of 
instabilities of supersonic vortex sheets at high 
Mach number was predicted as time evolves 
(Woodward 1985, Artola-Majda 1989). In contrast 
with the prediction of evolution instability, steady 
supersonic vortex sheets, as time-asymptotics, are 
stable globally in structure, even under the BV 
perturbation of the Lipschitz walls, although there 
may be many weak shocks and supersonic vortex 
sheets away from the strong vortex sheet (Chen- 
Zhang-Zhu 2005b). 

Transonic shock problems for steady fluid flows 
are important in applications (cf. Courant and 
Friedrichs (1948)). A program on the existence and 
stability of multidimensional transonic shocks has 
been initiated and three new analytical approaches 
have been developed (Chen-Feldman 2003, 2004). 
The transonic problems include the existence and 
stability of transonic shocks in the whole $\mathbb{R}^d$, the
existence and stability of transonic flows past finite 
or infinite nozzles, the stability of transonic flows 
past infinite nonsmooth wedges, and the existence of 
regular shock reflection solutions. The first 
approach is an iteration scheme based on the 
nondegeneracy of the free boundary condition: the 
jump of the normal derivative of a solution across 


the free boundary has a strictly positive lower bound 
(Chen-Feldman 2003, 2004), which works for the 
nonlinear equations whose coefficients may depend 
on not only the solution itself but also the gradients 
of the solution. The second approach is a partial 
hodograph procedure, with which the existence and 
stability of multidimensional transonic shocks that 
are not nearly orthogonal to the flow direction can 
be handled (Chen-Feldman 2004): one of the main 
ingredients in this approach is to employ a partial 
hodograph transform to reduce the free boundary 
problem into a conormal boundary value problem 
for the corresponding nonlinear equations of diver- 
gence form and then develop techniques to solve the 
conormal boundary value problem. When the reg- 
ularity of the steady perturbation is C^^ or higher, 
the third approach is to employ the implicit function 
theorem to deal with the existence and stability 
problem. Another iteration approach, which works 
well for the two-dimensional equations whose coeffi- 
cients depend only on the solution itself, has also 
been developed (Canic-Keyfitz-Lieberman 2000). 

Further longstanding open problems include the 
existence of global transonic flows past an airfoil or 
a smooth obstacle (Morawetz 1956-58, 1985).


Multidimensional Unsteady Problems 


Now we present some multidimensional time- 
dependent problems with a simplifying feature that 
the data (domain and/or the initial data) coupled 
with the structure of the underlying equations 
obey certain geometric structure so that the multi- 
dimensional problems can be reduced to lower- 
dimensional problems with more complicated 
couplings. Different types of geometric structure 
call for different techniques. 

The Euler equations for compressible fluids 
with geometric structure describe many important
fluid flows, including spherically symmetric flows
and self-similar flows. Such geometric flows
are motivated by many physical problems such as
shock diffraction, supernova formation in stellar
dynamics, inertial confinement fusion, and under- 
water explosions. For the initial data with large 
amplitude having geometric structure, the requi- 
red physical insight is: (1) whether the solution 
has the same geometric structure globally and 
(2) whether the solution blows up to infinity in a 
finite time. These questions are not easily under- 
stood in physical experiments and numerical simula- 
tions, especially for the blow-up, because of the 
limited capacity of available instruments and 
computers. 


The first type of geometric structure is spherical 
symmetry. A criterion for $L^\infty$ Cauchy data functions
of arbitrarily large amplitude was observed to
guarantee the existence of spherically symmetric
solutions in $L^\infty$ in the large for the isentropic flows,
which model outgoing blast waves and large-time 
asymptotic solutions (Chen 1997). On the other hand, 
it is evident that the density blows up as $|x| \to 0$ in
general, especially for the focusing case; the singular- 
ity at the origin makes the problem truly multi- 
dimensional due to the reflection of waves from 
infinity and their strengthening as they move radially 
inwards. One of the important open questions is to 
understand the order of singularity, $\rho(t, |x|) \sim |x|^{-\alpha}$,
at the origin for bounded Cauchy data. 

The second type of geometric structure is self- 
similarity, that is, the solutions with initial data 
functions that give rise to self-similar solutions, 
especially including Riemann solutions. Compressible
flow equations in $\mathbb{R}^d$, $d \ge 2$, with one or more
linearly degenerate modes of wave propagation have 
additional difficulties. In that case, the global flow is 
governed by a reduced (self-similar) system which is 
of composite (hyperbolic-elliptic) type in the sub- 
sonic region. The linearly degenerate waves give rise 
to one or more families of degenerate characteristics 
which remain real in the subsonic region. In some 
cases, the reduced equations couple an elliptic 
(degenerate elliptic) problem for the density with a 
hyperbolic (transport) equation for the vorticity. 

An important prototype for both practical 
applications and the theory of multidimensional 
complex wave patterns is the problem of diffraction 
of a shock wave which is incident along an inclined 
ramp (see Glimm and Majda (1991)). When a
plane shock hits a wedge head-on, a self-similar 
reflected shock moves outward as the original 
shock moves forward. The computational and 
asymptotic analysis shows that various patterns of 
reflected shocks may occur, including regular 
reflection and (simple, double, and complex) 
Mach reflections. The main part or whole reflected 
shock is a transonic shock in the self-similar 
coordinates, for which the corresponding equation 
changes the type from hyperbolic to elliptic across 
the shock. There are few rigorous mathematical 
results on the global existence and stability of 
shock reflection solutions and the transition among 
regular, simple Mach, double Mach, and complex 
Mach reflections for the potential flow equa- 
tion [19] and the full Euler equations [1]-[3]. 
Some results were recently obtained for simplified 
models including the transonic small-disturbance 
equation near the reflection point and the pressure 
gradient equation when the wedge is close to a flat
wall. 

For the potential flow equation [19], a self-similar
solution is a solution of the form
$\Phi = t\psi(y)$, $y = x/t$. Letting $\varphi(y) = -|y|^2/2 + \psi(y)$,
the system can be rewritten, by scaling, in the form of a
second-order equation of mixed hyperbolic-elliptic
type in $y \in \mathbb{R}^d$:


$\nabla_y \cdot (\rho(|\nabla_y \varphi|^2, \varphi) \nabla_y \varphi) + d\, \rho(|\nabla_y \varphi|^2, \varphi) = 0$  [36]


with $\rho(q^2, z) = (1 - (q^2 + 2z)/2)^{1/(\gamma - 1)}$. Equation [36]
at $|\nabla_y \varphi| = q$ is hyperbolic (pseudosupersonic) if
$\rho(q^2, z) + q \rho_q(q^2, z) < 0$ and elliptic (pseudosubsonic)
if $\rho(q^2, z) + q \rho_q(q^2, z) > 0$. Under this framework,
the nature of the shock reflection pattern has been 
explored for weak incident shocks (strength $b$) and
small wedge angles $2\theta_w$, by a number of different
scalings, a study of mixed equations, and matching
asymptotics for the different scalings, where the
parameter $\beta = c_1 \theta_w^2 / (b(\gamma + 1))$ ranges from 0 to $\infty$
and $c_1$ is the speed of sound behind the incident
shock (Morawetz 1994). For $\beta > 2$, a regular
reflection of both strong and weak kinds is
possible as well as a Mach reflection; for $\beta < 1/2$,
a Mach reflection occurs and the flow behind
the reflection is subsonic and can be constructed in
principle (with an elliptic problem) and matched;
and for $1/2 < \beta < 2$, the flow behind a Mach
reflection may be transonic which is a solution of 
a nonlinear boundary-value problem of mixed 
type. The basic pattern of reflection has been 
shown to be an almost semicircular shock issuing, 
for a regular reflection, from the reflection point 
on the wedge and, for a Mach reflection, matched 
with a local interaction flow. Some related 
observations were also made (Keller-Blank 1951, 
Hunter-Keller 1984, Hunter 1988). It is important 
to establish rigorous proofs. Recently, a rigorous 
existence proof was established for global solutions 
to shock reflection by large-angle wedges in Chen 
and Feldman (2005). 


Analytical Frameworks for Entropy Solutions 


The recent great progress for entropy solutions for 
one-dimensional time-dependent Euler equations 
and two-dimensional steady Euler equations, based 
on BV, $L^1$, or even $L^\infty$ estimates, naturally raises the
expectation that a similar approach may also be
effective for the multidimensional Euler equations,
or more generally, hyperbolic systems of conservation
laws; in particular, one might hope for


$\|u(t, \cdot)\|_{BV} \le C \|u_0\|_{BV}$  [37]
Unfortunately, this is not the case. The necessary
condition for [37] to hold for $p \ne 2$ (Rauch
1986) is


$\nabla f_k(u) \nabla f_l(u) = \nabla f_l(u) \nabla f_k(u)$
for all $k, l = 1, 2, \ldots, d$  [38]


The analysis suggests that only systems in which the 
commutativity relation [38] holds offer any hope for 
treatment in the framework of BV. This special case 
includes the scalar case $n = 1$ and the case of one
space dimension $d = 1$. Beyond that, it contains very
few systems of physical interest. 
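
To see concretely how the commutativity relation [38] fails for a system of physical interest, one can evaluate the flux Jacobians directly. The sketch below assumes the two-dimensional isentropic Euler equations in the conserved variables $(\rho, m_1, m_2)$ with a polytropic pressure law $p = \kappa \rho^\gamma$; the function names and the parameter values are illustrative assumptions, not taken from the article. It computes the largest entry of the commutator of the two Jacobians, which is nonzero even at a state of rest, while any two $1 \times 1$ Jacobians (the scalar case) commute trivially.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def euler_jacobians(rho, u, v, gamma=1.4, kappa=1.0):
    """Flux Jacobians of the 2D isentropic Euler equations.

    Conserved variables are (rho, m1, m2) with m1 = rho*u, m2 = rho*v,
    and the pressure law p = kappa * rho**gamma is an assumption.
    """
    c2 = kappa * gamma * rho ** (gamma - 1.0)  # p'(rho), squared sound speed
    a1 = [[0.0, 1.0, 0.0],
          [c2 - u * u, 2.0 * u, 0.0],
          [-u * v, v, u]]
    a2 = [[0.0, 0.0, 1.0],
          [-u * v, v, u],
          [c2 - v * v, 0.0, 2.0 * v]]
    return a1, a2

def commutator_norm(a, b):
    """Largest absolute entry of AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return max(abs(ab[i][j] - ba[i][j]) for i in range(n) for j in range(n))

if __name__ == "__main__":
    a1, a2 = euler_jacobians(rho=1.0, u=0.3, v=-0.2)
    print("Euler commutator norm:", commutator_norm(a1, a2))
    print("scalar commutator norm:", commutator_norm([[2.0]], [[3.0]]))
```

At rest ($u = v = 0$) the commutator has entries $\pm p'(\rho)$, so it vanishes only where the sound speed does.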

In this regard, it is important to identify effective 
analytical frameworks for studying entropy solu- 
tions of the multidimensional Euler equations [1]- 
[3], which are not in BV. Naturally, we want to 
approach the questions of existence, stability, 
uniqueness, and long-time behavior of entropy 
solutions with as much generality as possible. For 
this purpose, a theory of divergence-measure fields 
to construct such a global framework has been 
developed for studying entropy solutions (Chen-Frid 
1999, 2000, Chen-Torres 2005, Chen-Torres-Ziemer 
2005). For more details, see Chen (2005). 


Viscous Compressible Fluid Flows: 
Navier-Stokes Equations 


Compressible fluid flows that are viscous and 
conduct heat are governed by the following 


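Navier-Stokes equations:


$\partial_t \rho + \nabla_x \cdot m = 0, \quad x \in \mathbb{R}^d$  [39]


$\partial_t m + \nabla_x \cdot \left(\frac{m \otimes m}{\rho}\right) + \nabla_x p = \nabla_x \cdot \Sigma$  [40]


$\partial_t E + \nabla_x \cdot \left(\frac{m}{\rho}(E + p)\right) = \nabla_x \cdot \left(\frac{m}{\rho} \cdot \Sigma\right) - \nabla_x \cdot q$  [41]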


Here, $\Sigma = \Sigma(\nabla_x v, \rho, \theta)$ is the viscous stress tensor,
which is symmetric by the conservation of angular
momentum, and $q$ is the heat flux. If the fluid is
isotropic and the viscous tensor $\Sigma$ is a linear function
of $\nabla_x v$ and invariant under a change of reference
frame (translation and rotation), then we deduce
from elementary algebraic manipulations that
necessarily


$\Sigma = \lambda(\rho, \theta)(\nabla_x \cdot v) I + 2\mu(\rho, \theta) D$  [42]


which corresponds to the Newtonian fluids, where
$D = (\nabla_x v + (\nabla_x v)^T)/2$ is the deformation tensor, $I$ is
the identity matrix, and $\lambda$ and $\mu$ are the Lamé viscosity coefficients.


Furthermore, since the fluid is isotropic, we are led 
to the Fourier law: 


$q = -k(\rho, \theta, |\nabla_x \theta|) \nabla_x \theta$


for scalar function k which, in most cases, is taken 
to be simply a function of $\rho$ and $\theta$, or even a
constant called the thermal conduction coefficient. 
Again, system [39]-[41] is closed by the constitutive 
relations in [5]. The equation for entropy S is 


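$\partial_t(\rho S) + \nabla_x \cdot \left(mS + \frac{q}{\theta}\right) = \frac{\Sigma : \nabla_x v}{\theta} - \frac{q \cdot \nabla_x \theta}{\theta^2}$  [43]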


The second law of thermodynamics indicates that 
the right-hand side of [43] should be non-negative 
which yields the restriction: 
$k(\rho, \theta, |\nabla_x \theta|) \ge 0, \quad \mu \ge 0, \quad \lambda + 2\mu/d \ge 0$

The case $\mu > 0$ and $\lambda + 2\mu/d > 0$ is the viscous case
with heat conductivity $k > 0$. In particular, the
kinetic theory indicates that the Stokes relationship
should hold, namely $\lambda = -2\mu/d$, and the adiabatic
exponent is $\gamma = 5/3$ for monatomic gases.
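
The sign restrictions above can be read off from a short computation. The following worked step is a sketch that assumes the Newtonian form [42] and the Fourier law, and it introduces the trace-free part $D_0$ of the deformation tensor, a symbol not used in the article:

```latex
% Split D into its trace-free part and its trace (tr D = \nabla_x \cdot v):
%   D = D_0 + \frac{\nabla_x \cdot v}{d} I, \qquad \operatorname{tr} D_0 = 0 .
\begin{aligned}
\Sigma : \nabla_x v
  &= \Sigma : D
   = \lambda (\nabla_x \cdot v)^2 + 2\mu\, D : D \\
  &= 2\mu\, |D_0|^2 + \Bigl(\lambda + \frac{2\mu}{d}\Bigr) (\nabla_x \cdot v)^2 , \\
-\, q \cdot \nabla_x \theta
  &= k\, |\nabla_x \theta|^2 .
\end{aligned}
```

Since $D_0$, $\nabla_x \cdot v$, and $\nabla_x \theta$ can be prescribed independently at a point, the right-hand side of [43] is non-negative for all flows exactly when $\mu \ge 0$, $\lambda + 2\mu/d \ge 0$, and $k \ge 0$.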

In mathematical viscous fluid dynamics, an 
important model is the barotropic model for 
viscous fluids, that is, $p = p(\rho)$. Then, the specific
energy $E$ can be taken in the form
$E = (1/2)\rho|v|^2 + \rho e(\rho)$ with $e'(\rho) = p(\rho)/\rho^2$. For classical
solutions, the energy of a barotropic flow
satisfies the equality:


$\partial_t E + \nabla_x \cdot ((E + p)v) = \nabla_x \cdot (\Sigma v) - \Sigma : \nabla_x v$


which is now a direct consequence of [39] and [40]. 
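
As a worked instance of the relation $e'(\rho) = p(\rho)/\rho^2$, assume a polytropic pressure law $p(\rho) = \kappa \rho^\gamma$ with a constant $\kappa > 0$ (the constant and the normalization of $e$ are assumptions made for illustration):

```latex
\begin{aligned}
\gamma > 1 : \quad & e(\rho) = \frac{\kappa}{\gamma - 1}\, \rho^{\gamma - 1},
  && e'(\rho) = \kappa \rho^{\gamma - 2} = \frac{p(\rho)}{\rho^2}, \\
\gamma = 1 : \quad & e(\rho) = \kappa \ln \rho,
  && e'(\rho) = \frac{\kappa}{\rho} = \frac{p(\rho)}{\rho^2} .
\end{aligned}
```

In the first case the total energy density becomes $E = \frac{1}{2}\rho|v|^2 + \frac{\kappa}{\gamma - 1}\rho^\gamma$.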

The question of local existence of classical 
solutions to [39]-[41] for regular initial data was 
addressed by Nash (1962), where there is no 
indication whether or not these solutions exist for 
all times. 

In the case of one space dimension, the well- 
posedness is largely settled. The basic result for the 
existence of classical solutions is that of Kazhikhov 
(1976); see Lions (1998) and Feireisl (2004) for 
extensive references. The discontinuous solutions 
have been constructed (Shelukhin 1979, Serre 1986, 
Hoff 1987, Chen-Hoff-Trivisa 2000). 

For the Navier-Stokes equations in $\mathbb{R}^3$ with
general equation of state, the global classical 
solutions for the Cauchy problem and various 
initial-boundary value problems whose initial data 
is small around a constant state have been 


constructed (Matsumura-Nishida 1980, 1983). The 
approach is to obtain a priori estimates via energy 
methods for extending the local solution or for a 
difference method globally. These results have been 
extended to the Cauchy problem or the initial- 
boundary value problems with small discontinuous 
initial data (Hoff 1997). 

For the Navier-Stokes equations in $\mathbb{R}^d$ for
barotropic flows with [11] and large initial data, 
the global existence of solutions containing vacuum 
for the Cauchy problem or various initial-boundary 
value problems was first established by Lions 
(1998) for $\gamma \ge 3/2$ if $d = 2$, $\gamma \ge 9/5$ if $d = 3$, and
$\gamma > d/2$ if $d \ge 4$. The gap was closed by Feireisl-
Novotny-Petzeltová (2001) for the full range
$\gamma > d/2$. These results have been extended to the
full Navier-Stokes equations describing the motion 
of a general compressible, viscous, and heat con- 
ducting fluid (see Feireisl (2004)). The physically 
relevant isothermal case, $\gamma = 1$, is completely open
even if $d = 2$. The only large data existence result is
that for radially symmetric data (Hoff 1992). The
general case $\gamma > 1$ and $d = 3$ for radially symmetric
data was solved only recently (Jiang-Zhang 2001). 

The lower-bound estimate on the density is a 
delicate issue. Weak solutions containing vacuum 
for the isentropic viscous flows with constant 
viscosity are unstable in general (Hoff-Serre 
1991). Hence, it is important to see whether 
vacuum will never develop if the initial data is 
away from vacuum; this has been shown for the 
one-dimensional case for large initial data and 
for the multidimensional case with small data. On 
the other hand, from the kinetic theory, if 
solutions contain vacuum, then the viscosity 
coefficients in the Navier-Stokes equations should 
depend on the density near vacuum; this indeed 
stabilizes the solutions for the one-dimensional 
case. 

The stability of viscous shock waves has been 
studied for the one-dimensional case (see Liu (2000) 
and the references therein). The compressible- 
incompressible limits from the isentropic compres- 
sible to incompressible Navier-Stokes equations 
when the Mach number tends to zero have been 
established for arbitrarily weak solutions (Lions- 
Masmoudi 1998) and for smooth solutions and a 
class of initial data functions (Hoff 1998). 
The inviscid limits from the Navier-Stokes equations
to the Euler equations have been established as
long as the solutions of the Euler equations are 
smooth, when the viscosity and heat conductivity 
coefficients tend to zero (Klainerman-Majda 1982). 
It is completely open for general entropy solutions, 
even in the one-dimensional case. 


See also: Breaking Water Waves; Capillary Surfaces; 
Fluid Mechanics: Numerical Methods; Geophysical 
Dynamics; Incompressible Euler Equations: 
Mathematical Theory; Inviscid Flows; 
Magnetohydrodynamics; Newtonian Fluids and 
Thermohydraulics; Non-Newtonian Fluids; Partial 
Differential Equations: Some Examples; Stability of 
Flows; Viscous Incompressible Fluids: Mathematical 
Theory. 
Conventions and Units


This article adopts many of the conventions and 
notations of Misner, Thorne, and Wheeler (1973) - 
hereafter denoted MTW - including metric signature 
$(-\,+\,+\,+)$; definitions of Christoffel symbols and
curvature tensors (up to index permutations permitted
by standard symmetries of the tensors in a
coordinate basis); the use of Greek indices
$\alpha, \beta, \gamma, \ldots$, ranging over the spacetime coordinate
values $(0, 1, 2, 3) = (t, x^1, x^2, x^3)$, to denote the components
of spacetime tensors such as $g_{\alpha\beta}$; the similar
use of Latin indices $i, j, k, \ldots$, ranging over the
spatial coordinate values $(1, 2, 3) = (x^1, x^2, x^3)$, for
spatial tensors such as $\gamma_{ij}$; the use of the Einstein
summation convention for both types of indices; the
use of standard Kronecker delta symbols (tensors),
$\delta^\mu{}_\nu$ and $\delta^i{}_j$; the choice of geometric units, $G = c = 1$;
and, finally, the normalization of the matter fields
implicit in the choice of the constant $8\pi$ in [1].

The majority of the equations that appear in this 
article are tensor equations, or specific components 
of tensor equations, written in traditional index (not 
abstract index) form. Thus, these equations are 
generally valid in any coordinate system, $(t, x^i)$,
but, of course, do require the introduction of a
coordinate basis and its dual. This approach is also 
largely a matter of convention, since all of what 
follows can be derived in a variety of fashions, some 
of them purely geometrical, and there are also 
approaches to numerical relativity based, for exam- 
ple, on frames rather than coordinate bases. 

This article departs from MTW in its use of $\alpha$, $\beta^i$,
and $\gamma_{ij}$ to denote the lapse, shift, and spatial metric,
respectively, rather than MTW's $N$, $N^i$, and ${}^{(3)}g_{ij}$.

Finally, the operations of partial differentiation
with respect to coordinates $x^\mu$, $t$, and $x^i$ are denoted
$\partial_\mu$, $\partial_t$, and $\partial_i$, respectively.


Introduction 


The numerical analysis of general relativity, or 
numerical relativity, is concerned with the use of 
computational methods to derive approximate solu- 
tions to the Einstein field equations 


$G_{\mu\nu} = 8\pi T_{\mu\nu}$  [1]


Here, $G_{\mu\nu}$ is the Einstein tensor, that contracted
piece of the Riemann curvature tensor that has
vanishing divergence, and $T_{\mu\nu}$ is the stress tensor of
the matter content of the spacetime. $T_{\mu\nu}$ likewise has
vanishing divergence, an expression of the principle 
of local conservation of stress-energy that general 
relativity embodies. 

The elegant tensor formulation [1] belies the fact 
that, ultimately, the field equations are generically a 
complicated and nonlinear set of partial differential 
equations (PDEs) for the components of the spacetime
metric tensor, $g_{\mu\nu}(x^\alpha)$, in some coordinate
system $x^\alpha$. Moreover, implicit in a numerical
solution of [1] is the numerical solution of the
equations of motion for any matter fields that
couple to the gravitational field, that is, that
contribute to $T_{\mu\nu}$. The reader is reminded that it is a
hallmark of general relativity that, in principle, all
matter fields, including massless ones such as the
electromagnetic field, contribute to $T_{\mu\nu}$.

Now, in the 3+1 approach to general relativity
that is described below, the task of solving the field
equations [1] is formulated as an initial-value or
Cauchy problem. Specifically, the spacetime metric,
$g_{\mu\nu}(x^\alpha) = g_{\mu\nu}(t, x^i)$, which encodes all geometric
information concerning the spacetime, $\mathcal{M}$, is
viewed as the time history, or dynamical evolution,
of the spatial metric, $\gamma_{ij}(0, x^k)$, of an initial spacelike
hypersurface, $\Sigma(0)$. In any practical calculation,
the degree to which the matter fields “back-react”
on the gravitational field, that is, contribute to $T_{\mu\nu}$
substantially enough to cause perturbations in $g_{\mu\nu}$
at or above the desired accuracy threshold, will 
thus depend on the specifics of the initial 
configuration. 

In astrophysics, there are relatively few well- 
identified environments in which it is generally 
thought to be crucial to the faithful emulation of 
the physics that the matter fields be fully coupled to 
the gravitational field. However, both observation- 
ally and theoretically, the existence of gravitation- 
ally compact objects is quite clear. Gravitationally 
compact means that a star with mass, M, has a 


radius, R, comparable to its Schwarzschild radius, 
$R_M$, which is defined by


$R_M = \frac{2G}{c^2} M \approx 10^{-27}\, M\ \mathrm{kg^{-1}\,m}$  [2]


Here, and only here, $G$ and $c$, Newton’s gravitational
constant and the speed of light, respectively,
have been explicitly reintroduced. The fact that
$R_M / R$ is about $10^{-5}$ and $10^{-9}$ at the surfaces of the
sun and earth, respectively, is a reminder of just how
weak gravity is in the locality of Earth. However, as
befits anything of Einsteinian nature, the weakness 
of gravity is relative, so that at the surface of a 
neutron star, one would find 


Rm 
while for black holes, one has 
$\frac{R_M}{R} = 1$  [4]


In such circumstances, gravity is anything but 
weak! Furthermore, in situations where the mat- 
ter-energy distribution has a highly time-dependent 
quadrupole moment - such as occurs naturally with 
a compact-binary system (i.e. a gravitationally 
bound two-body system, in which each of the 
bodies is either a black hole or a neutron star) — the 
dynamics of the gravitational field, including, 
crucially, the dynamics of the radiative components 
of the gravitational field, can be expected to 
dominate the dynamics of the overall system, 
matter included. For scenarios such as these, it 
should come as no surprise that the solution of the 
combined gravitohydrodynamical system begs for 
numerical analysis. 
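
The orders of magnitude quoted above are easy to reproduce. The sketch below evaluates $R_M = 2GM/c^2$ and the compactness ratio $R_M/R$ for a few bodies; the rounded SI constants, the solar and terrestrial masses and radii, and in particular the neutron-star parameters (1.4 solar masses, 12 km radius) are illustrative assumptions rather than values taken from the article.

```python
G = 6.674e-11   # Newton's constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8     # speed of light, m/s (rounded)

def schwarzschild_radius(mass_kg):
    """R_M = 2 G M / c^2, in metres."""
    return 2.0 * G * mass_kg / C ** 2

def compactness(mass_kg, radius_m):
    """Dimensionless ratio R_M / R."""
    return schwarzschild_radius(mass_kg) / radius_m

# (mass in kg, radius in m); the neutron-star entry is an assumed typical case.
BODIES = {
    "sun": (1.989e30, 6.957e8),
    "earth": (5.972e24, 6.371e6),
    "neutron star (assumed)": (1.4 * 1.989e30, 1.2e4),
}

if __name__ == "__main__":
    for name, (mass, radius) in BODIES.items():
        print(f"{name:>24s}: R_M = {schwarzschild_radius(mass):.3e} m, "
              f"R_M/R = {compactness(mass, radius):.2e}")
```

The conversion factor $2G/c^2$ comes out at roughly $1.5 \times 10^{-27}$ m per kilogram, consistent with [2], and the assumed neutron star has $R_M/R$ of order a few tenths.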

In addition, both from the physical and mathe- 
matical perspectives, it is also natural to study the 
strong-field, dynamic regimes ($R \to R_M$ and/or $v \to c$,
where v is the typical speed characterizing internal 
bulk motion of the matter) of general relativity 
within the context of a variety of matter models. 
Typical processes addressed by these theoretical 
studies include the process of black hole formation, 
end-of-life events for various types of model stars, 
and, again, the interaction, including collisions, of 
gravitationally compact objects. Note that it is 
another hallmark of general relativity that highly 
dynamical spacetimes need not contain any matter; 
indeed, the interaction of two black holes — the 
natural analog of the Kepler problem in relativity — 
is a vacuum problem; that is, it is described by a 
solution of [1] with $T_{\mu\nu} = 0$.

Motivated in significant part by the large-scale 
efforts currently underway to directly detect gravita- 
tional radiation (gravitational waves), much of the 
contemporary work in numerical relativity is 
focused on precisely the problem of the late phases 
of compact-binary inspiral and merger. Such bin- 
aries are expected to be the most likely candidates 
for early detection by existing instruments such as 
TAMA, GEO, VIRGO, LIGO, and, more likely, by 
planned detectors including LIGO II and LISA (see, 
e.g., Hough and Rowan (2000)). Detailed and
accurate predictions of expected waveforms from 


these events — using the techniques of numerical 
relativity — have the potential to substantially hasten 
the discovery process, on the basis of the general 
principle that if one knows what signal to look for, 
it is much easier to extract that signal from the 
experimental noise. 

The computational task facing numerical relati- 
vists who study problems such as binary inspiral is 
formidable. In particular, such problems are intrinsically
“3D,” to use the CFD (computational fluid
dynamics) nomenclature in which time dependence 
is always assumed. That is, the PDEs that must be 
solved govern functions, $F(t, x^i)$, that depend on all
three spatial coordinates, $x^i$, as well as on time, $t$.
Unfortunately, even a cursory description of 3D 
work in numerical relativity as it stands at this time 
is far beyond the scope of this article. 

What follows, then, is an outline of a traditional 
approach to numerical relativity that underpins 
many of the calculations from the early years of 
the field (1970s and 1980s), most of which were 
carried out with simplifying restrictions to 
either spherical symmetry or axisymmetry. The 
mathematical development, which will hereafter be 
called the 3+1 approach to general relativity, has
the advantage of using tensors and an associated
tensor calculus that are reasonably intuitive for the
physicist. This “standard” 3+1 approach is also
sufficient in many instances (particularly those 
with symmetry) in the sense that it leads to well- 
posed sets of PDEs that can be discretized and 
then solved computationally in a convergent 
(stable) fashion. In addition, a thorough under- 
standing of the 3+1 approach will be of sig- 
nificant help to the reader wishing to study any of 
the current literature in numerical relativity, 
including the 3D work. 

However, the reader is strongly cautioned that 
the blind application of any of the equations that 
follow, especially in a 3D context, may well lead 
to “ill-posed systems,” numerical analysis of which 
is useless. Anyone specifically interested in using 
the methods of numerical relativity to generate 
discrete, approximate solutions to [1], particularly 
in the generic 3D case, is thus urged to first 
consult one of the comprehensive reviews of 
numerical relativity that continue to appear at 
fairly regular intervals (see, e.g., Lehner (2001), or 
Baumgarte and Shapiro (2003)). Most such refer- 
ences will also provide a useful overview of many 
of the most popular numerical techniques that are 
currently being used to discretize (convert to 
algebraic form) the Einstein equations, as well as 
the main algorithms that are used to solve the 
resulting discrete equations. These subjects are not 
described below, not least since discussion of the
available discretization techniques only makes 
sense in the context of PDEs of specific systems 
with specific boundary conditions, while there is 
only space here to describe the general mathematical
setting for 3+1 numerical relativity.


The 3+1 Spacetime Split


At least at the current time, computations in 
numerical relativity are restricted to the case of 
globally hyperbolic spacetimes. A spacetime (four-dimensional
pseudo-Riemannian manifold), $\mathcal{M}$,
endowed with a metric, $g_{\mu\nu}$, is globally hyperbolic
if there is at least one edgeless, spacelike hypersurface,
$\Sigma(0)$, that serves as a Cauchy surface. That is,
provided that the initial data for the gravitational
field are set consistently on $\Sigma(0)$, so that the four
constraint equations are satisfied (see below), the
entire metric $g_{\mu\nu}(t, x^i)$ can be determined from the
field equations [1] (with appropriate boundary 
conditions), and thus, so can the complete geometric 
structure of the spacetime manifold. 

To be sure, global hyperbolicity is restrictive. It 
excludes, for example, the highly interesting Gödel 
universe. However, particularly from the point 
of view of studying asymptotically flat solutions 
(or solutions asymptotic to any of the currently 
popular cosmologies), as is usually the case in 
astrophysics, the requirement of global hyperbolicity 
is natural. 

The 3+1 split is based on the complete foliation
of $\mathcal{M}$ by level surfaces of a scalar function,
$t$, the time function. That is, the $t = \text{const.}$ slices,
$\Sigma(t)$, are three-dimensional spacelike (Riemannian)
hypersurfaces, and, as $t$ ranges from $-\infty$ to $+\infty$,
completely fill the spacetime manifold, $\mathcal{M}$. In
order for the $\Sigma(t)$ to be everywhere spacelike,
the gradient of $t$ must be everywhere timelike:


$g^{\mu\nu} \nabla_\mu t \nabla_\nu t < 0$  [5]


Here $\nabla_\mu$ is the spacetime covariant derivative
operator compatible with the 4-metric, $g_{\mu\nu}$, thus
satisfying $\nabla_\alpha g_{\mu\nu} = 0$, and $g^{\mu\nu}$ is the inverse metric
tensor, which satisfies $g^{\mu\alpha} g_{\alpha\nu} = \delta^\mu{}_\nu$. The reader is
reminded that $\delta^\mu{}_\nu$ is a Kronecker delta symbol; that
is, $\delta^\mu{}_\nu$ has the value 1 if $\mu = \nu$, and the value 0
otherwise.

Furthermore, the scalar function t is now adopted 
as the temporal coordinate, so that $x^\mu = (t, x^i)$,
where the $x^i$ are the three spatial coordinates. As
noted implicitly above, since the problem under 
consideration is a pure Cauchy evolution, the range 


of t should nominally be infinite, both to the future 
as well as to the past; that is, the solution domain is 


$-\infty < t < \infty$  [6]

$|x| \equiv \left(\sum_i (x^i)^2\right)^{1/2} < \infty$  [7]


However, this assumes that one has global 
existence for arbitrarily strong initial data, which 
is decidedly not always the case in general 
relativity. Indeed, “continued” or “catastrophic” 
gravitational collapse — that is, the process of black 
hole formation — signaled, in modern language, by 
the appearance of a trapped surface, inexorably 
leads to a physical singularity, which - the 
somewhat vague nature of the singularity theorems 
of Penrose, Hawking, and others notwithstanding — 
in actual numerical computations invariably turns 
out to be “catastrophic” in terms of Cauchy 
evolution. 

Such behavior in time-dependent nonlinear PDEs 
is quite familiar in the mathematical community at 
large, where it is frequently known as finite-time 
blow-up (or finite-time singularity). However, 
despite the fact that such behavior is one of the 
most fascinating aspects of solutions of the Einstein 
equations, the following discussion will be, impli- 
citly at least, restricted to the case of weak initial 
data, that is, to initial data for which there is global 
existence. 

With the manifold $M_4$ sliced into an infinite 
stack of spacelike hypersurfaces, $\Sigma(t)$, attention 
shifts to any single surface, as well as to the 
manner in which such a generic surface is 
embedded in the spacetime. 

First, each spacelike hypersurface, $\Sigma(t)$, is itself a 
three-dimensional Riemannian differential manifold 
with a metric $\gamma_{ij}(t, x^k)$. (Note that in this discussion, 
the symbol $t$ is to be understood to represent any 
specific value of coordinate time.) From this metric, 
one can construct an inverse metric, $\gamma^{ij}(t, x^k)$, 
defined, as usual, so that 


$$\gamma^{ik}\gamma_{kj} = \delta^i{}_j$$  [8]


Associated with the spatial metric, $\gamma_{ij}$, is a natural 
spatial covariant derivative operator, $D_i$, that is 
compatible with $\gamma_{ij}$: 


$$D_i\gamma_{jk} = 0$$  [9]


With the spatial metric, $\gamma_{ij}$, and its inverse, $\gamma^{ij}$, in 
hand, the standard formulas of tensor analysis can 
be applied to compute the usual suite of geometrical 
tensors. All tensors thus computed, and 
indeed, all tensors defined intrinsically to the 
hypersurfaces $\Sigma(t)$ are called "spatial" tensors, and 
have their indices (if any) raised and lowered with 
$\gamma^{ij}$ and $\gamma_{ij}$, respectively. 

Thus, the Christoffel symbols of the second kind, 
$\Gamma^i{}_{jk}$, are given by 


$$\Gamma^i{}_{jk} = \tfrac{1}{2}\gamma^{il}\left(\partial_k\gamma_{lj} + \partial_j\gamma_{lk} - \partial_l\gamma_{jk}\right)$$  [10]


Note that these quantities are symmetric in their last 
two indices 


$$\Gamma^i{}_{jk} = \Gamma^i{}_{kj}$$  [11]


and that they can be used, as usual, in explicit 
calculation of the action of the spatial covariant 
derivative operator on an arbitrary tensor. In 
particular, for the special cases of a spatial vector, 
$V^i$, and a covector (1-form), $W_i$, one has 


$$D_i V^j = \partial_i V^j + \Gamma^j{}_{ik}V^k$$  [12]
and 
$$D_i W_j = \partial_i W_j - \Gamma^k{}_{ij}W_k$$  [13]


respectively. 

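Given the Christoffel symbols, the components of 
the spatial Riemann tensor, denoted here $R_{ijk}{}^l$, are 
computed using 


$$R_{ijk}{}^l = \partial_j\Gamma^l{}_{ik} - \partial_i\Gamma^l{}_{jk} + \Gamma^m{}_{ik}\Gamma^l{}_{mj} - \Gamma^m{}_{jk}\Gamma^l{}_{mi}$$  [14]


Finally, the Ricci tensor, $R^i{}_j$, and Ricci scalar, $R$, are 
defined in the usual fashion 


$$R^i{}_j = \gamma^{ik}R_{kj} = \gamma^{ik}R_{klj}{}^l$$  [15]


$$R = R^i{}_i$$  [16]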
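
The formula [10] for the Christoffel symbols is easy to check numerically. The following is a minimal sketch, not a production implementation: it assumes a user-supplied function returning the 3-metric at a point, approximates the derivatives $\partial_l\gamma_{ij}$ by central differences, and uses flat space in spherical coordinates as a test case. All function names (`flat_metric_spherical`, `inverse_3x3`, `metric_derivatives`, `christoffel`) are illustrative choices, not part of any established code.

```python
import math

def flat_metric_spherical(x):
    """Flat 3-metric gamma_ij in spherical coordinates x = (r, theta, phi)."""
    r, theta, _ = x
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]

def inverse_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[adj[r][s] / det for s in range(3)] for r in range(3)]

def metric_derivatives(metric, x, h=1e-5):
    """dgam[l][i][j] approximates d(gamma_ij)/dx^l by central differences."""
    dgam = []
    for l in range(3):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = metric(xp), metric(xm)
        dgam.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(3)]
                     for i in range(3)])
    return dgam

def christoffel(metric, x):
    """G[i][j][k] = Gamma^i_{jk}, assembled term by term as in eqn [10]."""
    ginv = inverse_3x3(metric(x))
    dg = metric_derivatives(metric, x)
    return [[[0.5 * sum(ginv[i][l] * (dg[k][l][j] + dg[j][l][k] - dg[l][j][k])
                        for l in range(3))
              for k in range(3)] for j in range(3)] for i in range(3)]

if __name__ == "__main__":
    G = christoffel(flat_metric_spherical, [2.0, 1.0, 0.3])
    # Analytically, Gamma^r_{theta theta} = -r and Gamma^theta_{r theta} = 1/r.
    print(G[0][1][1], G[1][0][1])
```

The symmetry [11] in the last two indices holds automatically here, since the first two terms of [10] are simply exchanged under $j \leftrightarrow k$.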


The reader should again note that all of the 
tensors just defined "live" on each and every single 
spacelike hypersurface, $\Sigma(t)$, and are thus known as 
hypersurface-intrinsic quantities. In particular, the 
spatial Riemann tensor, $R_{ijk}{}^l$, which encodes all 
intrinsic geometric information about $\Sigma(t)$, in no 
way depends on how the slice is embedded in the 
spacetime $M_4$. 

The next step in the 3 + 1 approach involves 
rewriting the fundamental spacetime line element for 
the squared proper distance, $ds^2$, between two 
spacetime events, P and Q, having coordinates $x^\mu$ 
and $x^\mu + dx^\mu$, respectively, 


$$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu$$  [17]


Figure 1 Spacetime displacement in the 3 + 1 approach, 
following Misner, Thorne, and Wheeler (1973). Solid lines represent 
surfaces of constant time, $t$; that is, each solid line represents a 
single spacelike hypersurface, $\Sigma(t)$. Dotted lines denote trajectories 
of constant spatial coordinate, that is, trajectories with $x^i = \text{const}$. 
The lapse function, $\alpha(t, x^i)$, encodes the (local) ratio between 
elapsed coordinate time, $dt$, and elapsed proper time, $d\tau = \alpha\,dt$, for 
an observer moving normal to the slices (i.e., for an observer with a 
4-velocity, $u^\mu$, identical to the hypersurface normal, $n^\mu$). Similarly, 
the shift vector, $\beta^i(t, x^i)$, describes the shift, $\beta^i(t, x^i)\,dt$, in 
trajectories of constant spatial coordinate (the dotted lines in the 
figure) relative to motion perpendicular to the slices. The 3 + 1 
form of the line element [18] then follows immediately from an 
application of the spacetime version of the Pythagorean theorem. 


As Figure 1 illustrates, a quick route to the 3 + 1 
decomposition of the above expression, and thus of 
the tensor $g_{\mu\nu}$ itself, is based on an application of 
the "four-dimensional Pythagorean theorem." In 
setting up the calculation, one naturally identifies 
four functions, the scalar lapse, $\alpha(t, x^k)$, and the 
vector shift, $\beta^i(t, x^k)$, that encode the full coordinate 
(gauge) freedom of the theory. That is, 
complete specification of the lapse and shift is 
equivalent to completely fixing the spacetime 
coordinate system. 

In light of the above discussion, and again 
referring to Figure 1, one readily deduces the 3 + 1 
decomposition of the spacetime line element: 


$$ds^2 = -\alpha^2\,dt^2 + \gamma_{ij}\left(dx^i + \beta^i dt\right)\left(dx^j + \beta^j dt\right)$$  [18]


A rearranged form of this last expression is also 
often seen in the literature: 


$$ds^2 = \left(-\alpha^2 + \beta_k\beta^k\right)dt^2 + 2\beta_k\,dx^k dt + \gamma_{ij}\,dx^i dx^j$$  [19]


The following useful identifications of the "time-time," 
"time-space," and "space-space" pieces of 
the spacetime metric, $g_{\mu\nu}$, follow immediately from 
[19]: 


$$g_{00} = -\alpha^2 + \beta^k\beta_k$$  [20]
$$g_{0i} = g_{i0} = \beta_i = \gamma_{ik}\beta^k$$  [21]
$$g_{ij} = \gamma_{ij}$$  [22]


This last relation is an example of a useful general 
result; the purely spatial components, $Q_{ijk\ldots}$, of a 
completely covariant, but otherwise arbitrary, spacetime 
tensor, $Q_{\alpha\beta\gamma\ldots}$, constitute the components of a 
completely covariant spatial tensor. 

A straightforward calculation, which provides a 
good exercise in the use of the 3 + 1 calculus, 
yields the following equally useful identifications for 
various pieces of the inverse spacetime metric, $g^{\mu\nu}$: 


$$g^{00} = -\alpha^{-2}$$  [23]
$$g^{0i} = g^{i0} = \alpha^{-2}\beta^i$$  [24]
$$g^{ij} = \gamma^{ij} - \alpha^{-2}\beta^i\beta^j$$  [25]
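
The identifications [20]-[25] can be verified numerically by assembling both matrices and multiplying them. The sketch below assumes a lapse, a shift $\beta^i$, and a 3-metric with a known inverse supplied as plain nested lists; the helper names (`four_metric`, `inverse_four_metric`, `matmul`) are hypothetical and chosen only for this illustration.

```python
def four_metric(alpha, beta_up, gamma):
    """Covariant g_{mu nu} from lapse, shift beta^i and 3-metric (eqns [20]-[22])."""
    beta_dn = [sum(gamma[i][k] * beta_up[k] for k in range(3)) for i in range(3)]
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -alpha ** 2 + sum(beta_dn[k] * beta_up[k] for k in range(3))
    for i in range(3):
        g[0][i + 1] = g[i + 1][0] = beta_dn[i]
        for j in range(3):
            g[i + 1][j + 1] = gamma[i][j]
    return g

def inverse_four_metric(alpha, beta_up, gamma_inv):
    """Contravariant g^{mu nu} from lapse, shift and inverse 3-metric (eqns [23]-[25])."""
    ginv = [[0.0] * 4 for _ in range(4)]
    ginv[0][0] = -1.0 / alpha ** 2
    for i in range(3):
        ginv[0][i + 1] = ginv[i + 1][0] = beta_up[i] / alpha ** 2
        for j in range(3):
            ginv[i + 1][j + 1] = gamma_inv[i][j] - beta_up[i] * beta_up[j] / alpha ** 2
    return ginv

def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[m][k] * b[k][n] for k in range(4)) for n in range(4)]
            for m in range(4)]

if __name__ == "__main__":
    # Arbitrary test values: a diagonal 3-metric keeps the inverse trivial.
    gamma = [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 9.0]]
    gamma_inv = [[1.0, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 1.0 / 9.0]]
    alpha, beta_up = 2.0, [0.3, -0.1, 0.5]
    product = matmul(four_metric(alpha, beta_up, gamma),
                     inverse_four_metric(alpha, beta_up, gamma_inv))
    for row in product:
        print([round(v, 12) for v in row])  # rows of the 4x4 identity, up to rounding
```

Any choice of nonzero lapse and invertible 3-metric should give the identity matrix, which is a convenient regression test when these expressions are coded by hand.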


Since the Einstein field equations are equations 
with, loosely speaking, geometry on one side and 
matter on the other, tensors built from matter fields 
must also be decomposed. In particular, it is 
conventional to define tensors, $\rho$, $j_i$, and $S_{ij}$, that 
result from various projections of the spacetime 
stress energy tensor, $T_{\mu\nu}$, onto the hypersurface: 


$$\rho = n_\mu n_\nu T^{\mu\nu}$$  [26]
$$j_i = -n_\mu T^\mu{}_i$$  [27]
$$S_{ij} = T_{ij}$$  [28]


For observers with 4-velocities $u^\mu$ equal to $n^\mu$, and 
only for those observers with $u^\mu = n^\mu$, the above 
quantities have the interpretation of the locally and 
instantaneously measured energy density, momentum 
density, and spatial stresses, respectively. As 
with the geometric quantities, all of the matter 
variables, $\rho$, $j_i$, and $S_{ij}$, defined in [26]-[28] are 
spatial tensors and thus have their indices (if any) 
raised and lowered with the 3-metric. Note that the 
identification $S_{ij} = T_{ij}$ is another illustration of 
the general result mentioned in the context of the 
previous identification of $\gamma_{ij}$ and $g_{ij}$. 

Finally, observing that time parameters are naturally 
defined in terms of level surfaces (equipotential 
surfaces), it should be no surprise that the covariant 
components, $n_\mu$, of the hypersurface normal field, 


$$n_\mu = (-\alpha, 0, 0, 0)$$  [29]


are simpler than the components, $n^\mu$, of the normal 
itself, 


$$n^\mu = \left(\alpha^{-1}, -\alpha^{-1}\beta^i\right)$$  [30]


and, in fact, eqn [29] can also be deduced from a 
quick study of Figure 1. 
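
As a short consistency check, the contravariant components of the normal can be obtained by raising the index of $n_\mu = (-\alpha, 0, 0, 0)$. The sketch below assumes the inverse-metric pieces $g^{00} = -\alpha^{-2}$ and $g^{0i} = \alpha^{-2}\beta^i$:

$$
\begin{aligned}
n^0 &= g^{00} n_0 = \left(-\alpha^{-2}\right)(-\alpha) = \alpha^{-1}, \\
n^i &= g^{i0} n_0 = \left(\alpha^{-2}\beta^i\right)(-\alpha) = -\alpha^{-1}\beta^i, \\
n^\mu n_\mu &= n^0 n_0 = \alpha^{-1}(-\alpha) = -1 .
\end{aligned}
$$

The last line confirms that the normal is a unit timelike vector, as required for the 4-velocity of an observer moving normal to the slices.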

In the 3 + 1 approach, in addition to the 3-metric, 
$\gamma_{ij}(t, x^k)$, and coordinate functions, $\alpha(t, x^k)$ and 
$\beta^i(t, x^k)$, it is convenient to introduce an additional 
rank-2 symmetric spatial tensor, $K_{ij}(t, x^k)$, known as 
the extrinsic curvature (or second fundamental 
form). This additional tensor is analogous to a 
time derivative of $\gamma_{ij}(t, x^k)$, or, from a Hamiltonian 
perspective, to a variable that is dynamically 
conjugate to $\gamma_{ij}(t, x^k)$. 

As the name suggests, the extrinsic curvature 
describes the manner in which the slice $\Sigma(t)$ is 
embedded in the manifold (to be contrasted with 
$R_{ijk}{}^l$ defined by [14] which is, as mentioned 
previously, completely insensitive to the manner in 
which the hypersurface is embedded in $M_4$). 

Geometrically, $K_{ij}$ is computed by calculating the 
spacetime gradient of the normal covector field, $n_\mu$, 
and projecting the result on to the hypersurface, 


$$K_{ij} = -\nabla_i n_j$$  [31]


where it must be stressed that $\nabla_\mu$ is the spacetime 
covariant derivative operator compatible with the 
4-metric, $g_{\alpha\beta}$; that is, $\nabla_\gamma g_{\alpha\beta} = 0$. A straightforward 
tensor calculus calculation then yields the following, 
which can be viewed as a definition of the $K_{ij}$: 


$$K_{ij} = \frac{1}{2\alpha}\left(-\partial_t\gamma_{ij} + D_i\beta_j + D_j\beta_i\right)$$  [32]


Here, $D_i$ is the spatial covariant derivative operator, compatible 
with $\gamma_{ij}$ ($D_i\gamma_{jk} = 0$), that was defined previously. 
Observe that this equation can be easily solved for 
$\partial_t\gamma_{ij}$ (this will be done below), and thus, in the 3 + 1 
approach it is [32] that is the origin of the evolution 
equations for the 3-metric components, $\gamma_{ij}$. 


Einstein’s Equations in 3+ 1 Form 
The Constraint Equations 


As is well known, as a result of the coordinate (gauge) 
invariance of the theory, general relativity is overdetermined 
in a sense completely analogous to the situation 
in electrodynamics with the Maxwell equations. One 
of the ways that this situation is manifested is via the 
existence of the constraint equations of general 
relativity. Briefly, starting from the naive view that 
the ten metric functions, $g_{\mu\nu}(t, x^i)$, that completely 
determine the spacetime geometry are all dynamical 
(that is, that they satisfy second-order-in-time equations 
of motion), one finds that the Einstein equations do not 
provide dynamical equations of motion for the lapse, 
$\alpha$, or the shift, $\beta^i$. Rather, four of the field equations [1] 
are equations of constraint for the "true" dynamical 
variables of the theory, $\{\gamma_{ij}, \partial_t\gamma_{ij}\}$, or, equivalently, 
$\{\gamma_{ij}, K^i{}_j\}$. Note that in the following, the mixed 
form, $K^i{}_j$, is at times used (again by convention) as 
the principal representation of the extrinsic curvature 
tensor (instead of $K_{ij}$ as previously, or $K^{ij}$). 
Thus, four of the components of [1] can be 
written in the form 


$$C^\mu\left(\gamma_{ij}, K^i{}_j, \partial_k\gamma_{ij}, \partial_l\partial_k\gamma_{ij}, \partial_k K^i{}_j\right) = T^\mu$$  [33]


where $T^\mu$ depends only on the matter content in the 
spacetime. Note that in addition to having no 
dependence on $\partial_t K^i{}_j$, the constraints are also 
independent of $\alpha$ and $\beta^i$. 

If the Einstein equations [1] are to hold throughout 
the spacetime, then the constraints [33] must hold on 
each and every spacelike hypersurface, $\Sigma(t)$, including, 
crucially, the initial hypersurface, $\Sigma(0)$. From the point 
of view of Cauchy evolution, this means that the 12 
functions, $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k)\}$, constituting the 
gravitational part of the initial data, are not completely 
freely specifiable, but must satisfy the four constraints 


$$C^\mu\big(\gamma_{ij}(0, x^k), K^i{}_j(0, x^k), \ldots\big) = T^\mu(0, x^k)$$  [34]


However, provided that initial data that do satisfy these 
equations are chosen, then, as consistency of the 
theory demands, the dynamical equations of 
motion for the $\{\gamma_{ij}, K^i{}_j\}$ (eqns [37] and [38] below) 
guarantee that the constraints will be satisfied on all 
future (or past) hypersurfaces, $\Sigma(t)$. In this internal 
self-consistency, the geometrical Bianchi identities, 
$\nabla_\mu G^{\mu\nu} = 0$, and the local conservation of stress 
energy, $\nabla_\mu T^{\mu\nu} = 0$, play crucial roles. 

In the 3 + 1 approach, as one would expect, the 
constraint equations further naturally subdivide into 
a scalar equation 


$$R - K^i{}_j K^j{}_i + K^2 = 16\pi\rho$$  [35]

and a (spatial) vector equation 


$$D_j K^{ij} - D^i K = 8\pi j^i$$  [36]

where the energy and momentum densities, $\rho$ and $j^i = 
\gamma^{ij}j_j$, are given by [26]-[28]. Equations [35] and [36] 
are often known as the Hamiltonian and momentum 
constraint, respectively, not least since the behavior of 
their solutions as $X \equiv \sqrt{\sum_i x^i x^i} \to \infty$ encodes the 
conserved mass and linear momentum (four numbers) 
that can be defined in asymptotically flat spacetimes. 
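
A simple worked case shows how the Hamiltonian constraint restricts initial data. The following is only an illustrative sketch under stated assumptions, none of which come from the discussion above: a spatially flat, homogeneous slice with $\gamma_{ij} = a^2\delta_{ij}$ in Cartesian coordinates, unit lapse, vanishing shift, and $K^i{}_j = -(\dot a/a)\,\delta^i{}_j$, where $a(t)$ is a scale factor and the Hamiltonian constraint is taken in the form $R - K^i{}_j K^j{}_i + K^2 = 16\pi\rho$. Then

$$
\begin{aligned}
R &= 0, \qquad K = -3\,\frac{\dot a}{a}, \qquad K^i{}_j K^j{}_i = 3\left(\frac{\dot a}{a}\right)^2, \\
R - K^i{}_j K^j{}_i + K^2 &= 6\left(\frac{\dot a}{a}\right)^2 = 16\pi\rho
\quad\Longrightarrow\quad \left(\frac{\dot a}{a}\right)^2 = \frac{8\pi}{3}\,\rho .
\end{aligned}
$$

This is the familiar Friedmann equation: the expansion rate cannot be chosen independently of the energy density. The momentum constraint is satisfied trivially here, since homogeneity makes every spatial derivative vanish and forces $j^i = 0$.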

In a general 3 + 1 coordinate system, and with an 
appropriate choice of variables, the constraints can 
be written as a set of quasilinear elliptic equations 
for four of the $\{\gamma_{ij}, K^i{}_j\}$ (or, more properly, for 
certain algebraic combinations of the $\{\gamma_{ij}, K^i{}_j\}$). 
Thus, especially for 2D and 3D calculations, the 
setting of initial data for the Cauchy problem in 
general relativity is itself a highly nontrivial mathe- 
matical and computational exercise. Readers 
wishing more details on this subject are directed to 
the comprehensive review by Cook (2000). 


The Evolution Equations 


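As discussed above, in the 3 + 1 form of the Einstein 
equations [1], the spatial metric, $\gamma_{ij}$, and the 
extrinsic curvature, $K^i{}_j$, are viewed as the dynamical 
variables for the gravitational field. The remainder 
of the 3 + 1 equations are thus two sets of six 
first-order-in-time evolution equations; one set for $\gamma_{ij}$, 


$$\partial_t\gamma_{ij} = -2\alpha\gamma_{ik}K^k{}_j + \beta^k\partial_k\gamma_{ij} + \gamma_{ik}\partial_j\beta^k + \gamma_{kj}\partial_i\beta^k$$  [37]

and the other set for $K^i{}_j$, 

$$\partial_t K^i{}_j = \beta^k\partial_k K^i{}_j - \partial_k\beta^i K^k{}_j + \partial_j\beta^k K^i{}_k - D^i D_j\alpha + \alpha\left(R^i{}_j + K K^i{}_j + 8\pi\left(\tfrac{1}{2}\delta^i{}_j(S - \rho) - S^i{}_j\right)\right)$$  [38]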


As also noted previously, the evolution equations 
[37] for the spatial metric components, $\gamma_{ij}$, follow 
from the definition of the extrinsic curvature [31]. 
The derivation of the equations for the extrinsic 
curvature, on the other hand, requires lengthy, but 
well-documented, manipulations of the spatial components 
of the field equations [1]. 
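
To make the structure of the metric evolution equation concrete, the sketch below evaluates its right-hand side in the simplest possible setting. It is an illustration under explicit assumptions, not a general implementation: all spatial derivatives are taken to vanish (a homogeneous slice in Cartesian coordinates, with constant shift), so that only the term $-2\alpha\gamma_{ik}K^k{}_j$ survives. The test data use a hypothetical scale factor $a(t) = t^{2/3}$ with $\gamma_{ij} = a^2\delta_{ij}$ and $K^i{}_j = -(\dot a/a)\,\delta^i{}_j$; the function names are invented for this example.

```python
def dgamma_dt_homogeneous(alpha, gamma, K_mixed):
    """RHS of the metric evolution equation when every spatial derivative
    vanishes: d(gamma_ij)/dt = -2 * alpha * gamma_ik * K^k_j."""
    return [[-2.0 * alpha * sum(gamma[i][k] * K_mixed[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]

def scale_factor(t):
    """Hypothetical scale factor a(t) = t^(2/3), used only as test data."""
    return t ** (2.0 / 3.0)

def homogeneous_data(t, h=1e-6):
    """gamma_ij = a^2 delta_ij and K^i_j = -(adot / a) delta^i_j at time t."""
    a = scale_factor(t)
    adot = (scale_factor(t + h) - scale_factor(t - h)) / (2 * h)
    gamma = [[a * a if i == j else 0.0 for j in range(3)] for i in range(3)]
    K_mixed = [[-adot / a if i == j else 0.0 for j in range(3)] for i in range(3)]
    return gamma, K_mixed

if __name__ == "__main__":
    t = 1.5
    gamma, K_mixed = homogeneous_data(t)
    rhs = dgamma_dt_homogeneous(1.0, gamma, K_mixed)
    exact = (4.0 / 3.0) * t ** (1.0 / 3.0)  # d(a^2)/dt for a = t^(2/3)
    print(rhs[0][0], exact)
```

In a real evolution code the advection and shift-gradient terms would be added with finite-difference stencils on a grid; the homogeneous case is useful mainly as a sign and factor-of-two check.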


The (Naive) Cauchy Problem 


A naive statement of the Cauchy problem for 3 + 1 
numerical relativity is thus as follows: fix a specified 
number, $N$, of matter fields $\xi^A(t, x^i)$, $A = 1, 2, \ldots, N$, 
all minimally coupled to the gravitational 
field, with a total stress tensor, $T_{\mu\nu}$, given by 


$$T_{\mu\nu} = \sum_{A=1}^{N} T^A_{\mu\nu}$$  [39]


where $T^A_{\mu\nu}$ is the stress tensor corresponding to the 
matter field $\xi^A$. Choose a topology for $\Sigma(0)$ (e.g., $\mathbb{R}^3$ 
with asymptotically flat boundary conditions; $T^3$, 
with no boundaries, etc.). This also fixes the 
topology of $M_4$ to be $\mathbb{R}$ × the topology of $\Sigma(0)$. 

Next, freely specify eight of the 12 $\{\gamma_{ij}(0, x^k), 
K^i{}_j(0, x^k)\}$, as well as initial values, $\xi^A(0, x^k)$, for the 
matter fields. Then determine the remaining four 
dynamical gravitational fields from the constraints 
[35] and [36]. This completes the initial data 
specification. 

One must now choose a prescription for the 
kinematical (coordinate) functions, $\alpha$ and $\beta^i$, so that 
either explicitly or implicitly, they are completely fixed; 
for the case of implicit specification, this may well 
mean that the coordinate functions themselves will 
satisfy PDEs, which, furthermore, can be of essentially 
any type in practice (i.e., elliptic, hyperbolic, 
parabolic, ...). Finally, with consistent initial data, 
$\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k); \xi^A(0, x^k)\}$, in hand, and with a 
prescription for the coordinate functions, the evolution 
equations [37] and [38] can be used to advance the 
dynamical variables forward or backward in time. 

The above description is naive since, apart from a 
consistent mathematical specification, the most crucial 
issue in the solution of a time-dependent PDE as a 
Cauchy problem is that the problem be "well posed." 
Roughly speaking, this means that solutions do not 
grow without bound ("blow up") without physical 
cause, and that small, smooth changes to initial data 
yield correspondingly small, smooth changes to the 
evolved data. In short, the Cauchy problem must be 
stable, and whether or not a particular subset of 
the equations displayed in this section yields a well- 
posed problem is a complicated and delicate issue, 
especially in the generic 3D case. The reader is thus 
again cautioned against blind application of any of the 
equations displayed in this article. 


Boundary Conditions 


In principle, because all spacelike hypersurfaces, $\Sigma(t)$, 
in a pure Cauchy evolution are edgeless, and provided 
that the initial data $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k); \xi^A(0, x^k)\}$ are 
consistent with asymptotic flatness, or whatever other 
condition is appropriate given the topology of the 
$\Sigma(t)$, there are essentially no boundary conditions to 
be imposed on the dynamical variables, $\{\gamma_{ij}(t, x^k), 
K^i{}_j(t, x^k)\}$, during Cauchy evolution. Note that asymptotic 
flatness generally requires that 


$$\lim_{X\to\infty}\gamma_{ij} = f_{ij} + O\!\left(\frac{1}{X}\right)$$  [40]
and 
$$\lim_{X\to\infty}K^i{}_j = O\!\left(\frac{1}{X^2}\right)$$  [41]


where $X$ is defined by 


$$X \equiv \sqrt{\sum_i x^i x^i}$$  [42]
as previously, and $f_{ij}$ is the flat 3-metric. Similarly, 
should the lapse, $\alpha$, and shift, $\beta^i$, be constrained by 
elliptic PDEs (as is frequently the case in practice), 
then the only natural place to set boundary conditions 
is at spatial infinity, and then, provided that 
the frame at spatial infinity is inertial, with 
coordinate time $t$ measuring proper time, one should 
have 


$$\lim_{X\to\infty}\alpha = 1 + O\!\left(\frac{1}{X}\right)$$  [43]


and 


$$\lim_{X\to\infty}\beta^i = O\!\left(\frac{1}{X}\right)$$  [44]

It is critical to note at this point, however, that in 
the vast bulk of past and current work in numerical 
relativity, including most of the ongoing work in 
3D, the Einstein equations [1] have been solved, not 
as a pure Cauchy problem, but as a mixed initial- 
value/boundary-value (IBVP) problem. That is, in 
the discretization process in which the continuum 
equations [1] are replaced with algebraic equations, 
the continuum domain [6]-[7] is typically replaced 
with a truncated spatial domain 

$$|x^i| \le X^i_{\max}$$  [45]
where the $X^i_{\max}$ are a priori specified constants 
(parameters of the computational solution) that 
define the extremities of the “computational box.” 
As one might expect, the theory underlying stability 
and well-posedness of IBVP problems — especially 
for differential systems as complicated as [1] - is 
even more involved than for the pure initial-value 
case, and is another very active area of research in 
both mathematical and numerical  relativity 
(see, e.g., Friedrich and Nagy (1999)). 


See also: Critical Phenomena in Gravitational Collapse; 
Einstein Equations: Initial Value Formulation; Fluid 
Mechanics: Numerical Methods; General Relativity: 
Overview; Geometric Analysis and General Relativity; 
Gravitational Waves; Hamiltonian Reduction of Einstein's 
Equations; Magnetohydrodynamics; Spacetime 
Topology, Causal Structure and Singularities; Symmetric 
Hyperbolic Systems and Shock Waves. 
Introduction 


Consider a dynamical system with coordinates 
$q^i$ ($i = 1, \ldots, n$) and Lagrangian $L(q^i, \dot q^i)$ (field theory 
is formally covered by regarding the spatial coordinates 
as a continuous index). When going to the 
Hamiltonian formulation, it is usually assumed that 
the Legendre transformation between the velocities 
$\dot q^i$ and the momenta 

$$p_i = \frac{\partial L}{\partial\dot q^i}$$  [1]


can be inverted to yield the velocities as functions of 
the q’s and the p’s. This “regular” situation occurs 
for most systems appearing in standard classical 
mechanics and enables one to proceed to the 
Hamiltonian formulation of the theory without 
difficulty. 

In field theory, however, the regular case is the 
exception rather than the rule. This is due to gauge 
invariance and first-order Lagrangians. 


• Gauge invariance A system possesses gauge symmetries 
if it is invariant under transformations that 
involve arbitrary functions of time (gauge transformations). 
In that case, the solution of the 
equations of motion with given initial data is not 
unique, since it is always possible to perform a 
gauge transformation in the course of the evolution 
without changing the initial data. It is then clear 
that the Legendre transformation cannot be invertible, 
for if it were, one could rewrite the equations 
of motion in the standard canonical form 
$\dot q^i = \partial H/\partial p_i$, $\dot p_i = -\partial H/\partial q^i$. These canonical 
equations are in normal form and have a unique 
solution for given initial data, which would 
contradict the presence of a gauge symmetry. 

A simple example that illustrates this phenomenon 
is given by the following model for three 
variables $q^1$, $q^2$, and $\lambda$, the Lagrangian of which 
reads 


$$L = \tfrac{1}{2}\left((\dot q^1 - \lambda)^2 + (\dot q^2 - \lambda)^2\right)$$  [2]


This model is inspired by electromagnetism: the 
variables $q^1$ and $q^2$ play a role somewhat similar 
to that of the spatial components of the vector 
potential, while $\lambda$ corresponds to the temporal 
component. The Lagrangian is invariant under the 
gauge transformations 

$$q^1 \to q^1 + \epsilon, \quad q^2 \to q^2 + \epsilon, \quad \lambda \to \lambda + \dot\epsilon$$  [3]
where $\epsilon$ is an arbitrary function of time. The 
conjugate momenta are 

$$p_1 = \dot q^1 - \lambda, \quad p_2 = \dot q^2 - \lambda, \quad \pi_\lambda = 0$$


One cannot invert the Legendre transformation 
since one cannot express the velocity $\dot\lambda$ in terms of 
the momenta. 

• First-order Lagrangians Fermionic fields obey 
first-order equations. Their Lagrangian is linear 
in the derivatives, so that the conjugate momenta 
$p_i$ depend on the coordinates $q^i$ only. It is then 
clearly impossible to express the velocities in 
terms of the momenta through the Legendre 
transformation. More generally, any first-order 
Lagrangian with or without gauge symmetry leads 
to a noninvertible Legendre transformation. 
A simple system that exhibits this feature is 
described by the Lagrangian 


$$L = z^2\dot z^1 - \tfrac{1}{2}(z^2)^2$$  [4]


for two bosonic degrees of freedom ($z^1$, $z^2$). This 
is in fact the canonical form of the Lagrangian for 
a free particle in one dimension ($z^2$ is the 
momentum conjugate to the position $z^1$): the 
system is already in Hamiltonian form. There is 
no gauge invariance, but because the Lagrangian 
is first order, the Legendre transformation with 
[4] as starting point, 


$$p_1 = z^2, \quad p_2 = 0$$  [5]


is noninvertible for the velocities (which do not 
even appear in the formulas for the momenta). 
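
Both examples in the list above fail the standard invertibility test: the Legendre transformation can be solved for the velocities only if the Hessian $\partial^2 L/\partial\dot q^i\partial\dot q^j$ has full rank. The following sketch checks this numerically. It assumes the model Lagrangians $L = \tfrac{1}{2}\left((\dot q^1 - \lambda)^2 + (\dot q^2 - \lambda)^2\right)$ and $L = z^2\dot z^1 - \tfrac{1}{2}(z^2)^2$; the function names and the finite-difference approach are choices made for this illustration only.

```python
def velocity_hessian(L, q, v, h=1e-3):
    """W[i][j] = d^2 L / dv_i dv_j by central differences.
    For Lagrangians at most quadratic in v this is exact up to rounding."""
    n = len(v)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                w = list(v)
                w[i] += si * h
                w[j] += sj * h
                return L(q, w)
            W[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return W

def rank(matrix, tol=1e-6):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda k: abs(m[k][c]))
        if abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(r + 1, rows):
            f = m[k][c] / m[r][c]
            m[k] = [a - f * b for a, b in zip(m[k], m[r])]
        r += 1
    return r

def lagrangian_gauge(q, v):
    """Gauge model: q = (q1, q2, lambda), v = their velocities."""
    return 0.5 * ((v[0] - q[2]) ** 2 + (v[1] - q[2]) ** 2)

def lagrangian_first_order(q, v):
    """First-order model: q = (z1, z2), linear in the velocity of z1."""
    return q[1] * v[0] - 0.5 * q[1] ** 2

if __name__ == "__main__":
    print(rank(velocity_hessian(lagrangian_gauge, [0.1, 0.2, 0.3], [1.0, -2.0, 0.5])))
    print(rank(velocity_hessian(lagrangian_first_order, [0.4, 0.7], [1.0, 2.0])))
```

A rank below the number of coordinates signals that some velocities cannot be recovered from the momenta; the deficit equals the number of independent relations among the momenta and coordinates that follow from their definition.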


Dirac showed how to develop the Hamiltonian 
formalism in the case when the Legendre transfor- 
mation is not invertible. One can still reformulate 
the equations in phase space and write them in terms 
of brackets with the Hamiltonian, but a new major 
feature emerges, namely the canonical variables are 
no longer free. Rather, the permissible phase-space 
points are constrained to lie on the so-called 
"constraint surface." For this reason, systems for 
which the Legendre transformation is not invertible 
are also called “constrained Hamiltonian systems.” 
We shall adopt this terminology here. 

The purpose of this article is to explain the main 
ideas underlying the Dirac method. To simplify the 
discussions and to focus on the features peculiar to 
the Dirac construction, we shall assume as a rule 
that all necessary smoothness conditions are fulfilled 
by the functions, surfaces, etc., appearing in the 
formalism. How to develop the analysis when some 
of the smoothness conditions are not fulfilled is of 
definite interest but goes beyond the scope of this 
review. We shall also assume, for definiteness, that 
all the variables are bosonic in order to avoid 
straightforward but somewhat cumbersome sign 
factors in the formulas. 


General Theory 
Dirac Algorithm 


Primary constraints When the Legendre transformation 
[1] cannot be inverted, the momenta $p_i$ do 
not span an $n$-dimensional space but are constrained 
by relations 


$$\phi_m(q, p) = 0, \quad m = 1, \ldots, M$$  [6]


which follow from their definition. These equations 
reduce to identities when the momenta are replaced 
by their expression [1] in terms of the coordinates 
and the velocities. They are called primary constraints. 
We shall assume that the matrix 


$$\frac{\partial(\phi_m)}{\partial(p_i, q^i)}$$

is everywhere of constant (maximum) rank $M$ on the 
phase-space surface defined by eqns [6], which is 
assumed to be smooth. This surface is of dimension 
$2n - M$. 


Canonical Hamiltonian The next step in the Dirac 
procedure is to define the canonical Hamiltonian H 
through 


$$H = \dot q^i p_i - L$$  [7]


As shown by Dirac, H can be re-expressed as a 
function H(q, p) of the momenta and the coordi- 
nates, even when the Legendre transformation is not 
invertible: the canonical Hamiltonian H depends on 
the velocities only through the p;'s. Furthermore, the 
original equations of motion in Lagrangian form are 
equivalent to the Hamiltonian equations 


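$$\dot q^i = \frac{\partial H}{\partial p_i} + u^m\frac{\partial\phi_m}{\partial p_i}$$  [8]
$$\dot p_i = -\frac{\partial H}{\partial q^i} - u^m\frac{\partial\phi_m}{\partial q^i}$$  [9]
$$\phi_m(q, p) = 0$$  [10]


where the $u^m$'s are parameters, some of which will 
be determined through the consistency algorithm to 
be discussed shortly. (In [7]-[9] and everywhere 
below, there is a summation over the repeated 
indices.) 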


Secondary constraints The equations of motion [8] 
and [9] can be rewritten as 


$$\dot F = [F, H] + u^m[F, \phi_m]$$  [11]


where $F = F(q, p)$ is any function of the canonical 
variables. Here, the Poisson bracket is defined as 
usual by 

$$[F, G] = \frac{\partial F}{\partial q^i}\frac{\partial G}{\partial p_i} - \frac{\partial F}{\partial p_i}\frac{\partial G}{\partial q^i} = -[G, F]$$  [12]

If one takes for $F$ one of the primary constraints 
$\phi_m$, one should get zero, $\dot\phi_m = 0$. This yields the 
consistency conditions 


$$[\phi_m, H] + u^{m'}[\phi_m, \phi_{m'}] = 0$$  [13]


These conditions can imply further restrictions on the 
canonical variables and/or impose conditions on the 
variables $u^m$. Any new relation $X(q, p) = 0$ on the 
canonical variables leads, in turn, to a further consistency 
condition $\dot X = [X, H] + u^m[X, \phi_m] = 0$, which 
can bring in either further restrictions on the constraint 
surface or fix more variables $u^m$. Constraints that 
follow from the consistency algorithm are called 
"secondary constraints." Finally, one is left with a 
certain number of secondary constraints, which are 
denoted by $\phi_k = 0$, $k = M + 1, \ldots, M + K$. We assume 
again that all the constraints (primary and secondary) 
define a smooth surface, called the "constraint surface," 
and fulfill the condition that $\partial(\phi_j)/\partial(q^i, p_i)$ is of 
maximum rank $J = M + K$ on the constraint surface. 
(We also assume for simplicity that there is no 
branching in the consistency algorithm.) 


Restrictions on the $u$'s Having a complete set of 
constraints 


$$\phi_j = 0, \quad j = 1, \ldots, M + K \equiv J$$  [14]


we can now investigate more precisely the restrictions 
on the variables $u^m$. These read 


$$[\phi_j, H] + u^m[\phi_j, \phi_m] \approx 0, \quad j = 1, \ldots, J$$  [15]


where the notation $\approx$ means "equal modulo the 
constraints." In [15], $m$ is summed from 1 to $M$. 
Equations [15] are a set of $J$ linear, inhomogeneous 
equations for the $u^m$'s, with coefficients that are 
functions of the canonical variables $q^i$, $p_i$. The 
general solution of this system is of the form 


$$u^m = U^m + u^a V_a{}^m$$  [16]


where $U^m$ is a particular solution and where the $V_a{}^m$ 
($a = 1, \ldots, A$) provide a complete set of independent 
solutions of the homogeneous system 


$$V_a{}^m[\phi_j, \phi_m] \approx 0$$  [17]


The coefficients $u^a$ ($a = 1, \ldots, A$) are completely 
arbitrary. 

We thus see the emergence of another new feature 
in the theory, in addition to the appearance of 
constraints. It is that the general solution of the 
equations of motion may contain arbitrary functions 
of time (when $A \neq 0$), in agreement with the 
possible presence of a gauge symmetry. 
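
The consistency algorithm can be traced by hand on the electromagnetism-inspired model with Lagrangian $L = \tfrac{1}{2}\left((\dot q^1 - \lambda)^2 + (\dot q^2 - \lambda)^2\right)$. The following is a worked sketch; the labels $\phi_1$, $\phi_2$ for the two constraints are introduced here for illustration. With $p_1 = \dot q^1 - \lambda$, $p_2 = \dot q^2 - \lambda$ and the primary constraint $\phi_1 \equiv \pi_\lambda = 0$, the canonical Hamiltonian is

$$
H = \dot q^1 p_1 + \dot q^2 p_2 - L = \tfrac{1}{2}\left(p_1^2 + p_2^2\right) + \lambda\,(p_1 + p_2) .
$$

Preservation in time of the primary constraint gives

$$
\dot\phi_1 = [\pi_\lambda, H] + u\,[\pi_\lambda, \pi_\lambda] = -\frac{\partial H}{\partial\lambda} = -(p_1 + p_2) \approx 0 ,
$$

which does not involve $u$ and is therefore a secondary constraint, $\phi_2 \equiv p_1 + p_2 = 0$. Its own consistency condition is satisfied identically,

$$
\dot\phi_2 = [p_1 + p_2, H] + u\,[p_1 + p_2, \pi_\lambda] = 0 ,
$$

so the algorithm stops with $J = 2$ constraints. No condition has been imposed on the multiplier $u$, which remains an arbitrary function of time ($A = 1$), in line with the single gauge parameter $\epsilon$ of the Lagrangian gauge symmetry.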


First- and Second-Class Constraints 


First- and second-class functions A function $F(q, p)$ 
is called a first-class function if it generates a 
canonical transformation that maps the constraint 
surface on itself. Thus, $F(q, p)$ is first class if its 
Poisson brackets with all the constraints vanish 
weakly (i.e., are zero on the constraint surface), 


$$[F, \phi_j] \approx 0$$  [18]


A function is second class otherwise, that is, if there 
is at least one constraint $\phi_j$ such that $[F, \phi_j] \neq 0$ 
(not even weakly). Second-class functions generate 
canonical transformations that do not leave the 
constraint surface invariant. Since canonical transformations 
that map the constraint surface on itself 
form a group, the Poisson bracket of two first-class 
functions is itself a first-class function. 

Because the system is constrained to lie on the 
constraint surface, the only allowed canonical 
transformations are those that are generated by 
first-class functions. The importance of the distinction 
between first-class and second-class functions 
stems from this elementary fact. Note, in particular, 
that the time evolution is generated, as it should be, 
by a first-class generator since the equations of 
motion [11] can be rewritten as 


$$\dot F \approx [F, H'] + u^a[F, V_a{}^m\phi_m]$$  [19]


with 
$$H' = H + U^m\phi_m$$  [20]
One has both $[H', \phi_j] \approx 0$ and $[V_a{}^m\phi_m, \phi_j] \approx 0$. 


Splitting of the constraints One can separate 
the constraints into first-class and second-class 
constraints. This can be achieved by considering the 
matrix $C_{jj'}$ of the Poisson brackets of the constraints, 


$$C_{jj'} = [\phi_j, \phi_{j'}]$$  [21]
One has the following theorem due to Dirac. 


Theorem 1 If $\det C_{jj'} \approx 0$, there exists at least one 
first-class constraint among the $\phi_j$'s. 


Proof Straightforward: if $\det C_{jj'} \approx 0$, one can find 
a nontrivial solution $\lambda^j$ of $\lambda^j C_{jj'} \approx 0$. The corresponding 
constraint $\lambda^j\phi_j$ is easily verified to be first 
class. 
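
The matrix of Poisson brackets of the constraints is simple to evaluate numerically for small examples. The sketch below is an illustration only: it approximates the partial derivatives in the Poisson bracket by central differences (exact up to rounding for the polynomial functions used here) and builds the matrix for two toy systems. Phase-space points are stored as flat lists $(q^1, \ldots, q^n, p_1, \ldots, p_n)$; the constraint functions and all names are assumptions made for this example.

```python
def poisson_bracket(F, G, n, h=1e-3):
    """Return a function z -> [F, G](z) for z = (q^1..q^n, p_1..p_n)."""
    def partial(f, z, k):
        zp, zm = list(z), list(z)
        zp[k] += h
        zm[k] -= h
        return (f(zp) - f(zm)) / (2 * h)

    def bracket(z):
        return sum(partial(F, z, i) * partial(G, z, n + i)
                   - partial(F, z, n + i) * partial(G, z, i) for i in range(n))
    return bracket

def constraint_matrix(constraints, n, z):
    """C[j][k] = [phi_j, phi_k] evaluated at the phase-space point z."""
    return [[poisson_bracket(a, b, n)(z) for b in constraints] for a in constraints]

# Gauge model, n = 3: z = (q1, q2, lam, p1, p2, pi_lam).
def pi_lambda(z):
    return z[5]

def gauss(z):
    return z[3] + z[4]

def hamiltonian(z):
    return 0.5 * (z[3] ** 2 + z[4] ** 2) + z[2] * (z[3] + z[4])

# First-order model, n = 2: z = (z1, z2, p1, p2).
def chi1(z):
    return z[2] - z[1]

def chi2(z):
    return z[3]

if __name__ == "__main__":
    z_gauge = [0.3, -1.2, 0.7, 1.5, -0.4, 0.0]
    print(constraint_matrix([pi_lambda, gauss], 3, z_gauge))  # vanishing matrix
    print(poisson_bracket(pi_lambda, hamiltonian, 3)(z_gauge))  # equals -(p1 + p2)
    print(constraint_matrix([chi1, chi2], 2, [0.2, 0.9, 0.9, 0.0]))  # invertible
```

For the gauge model the matrix vanishes identically, so both constraints are first class; for the first-order model the matrix is antisymmetric with unit determinant, so both constraints are second class.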


By redefining the constraints as $\phi_j \to \phi'_j = a_j{}^{j'}\phi_{j'}$ 
with $a_j{}^{j'}(q, p)$ invertible, one can bring the Poisson 
brackets of the constraints to the form 


$$[\gamma_a, \gamma_b] \approx 0, \quad [\gamma_a, \chi_\alpha] \approx 0, \quad [\chi_\alpha, \chi_\beta] \approx C_{\alpha\beta}$$  [22]


with $(\phi'_j) \equiv (\gamma_a, \chi_\alpha)$ and where the matrix $C_{\alpha\beta}$ is 
invertible. (We assume, for simplicity, throughout 
that the rank of the matrix $C_{jj'}$ is constant on the 
constraint surface ("regular case").) In this representation, 
the constraints are completely split into 
first-class constraints ($\gamma_a$) and second-class 
constraints ($\chi_\alpha$): there is no first-class constraint left 
among the $\chi_\alpha$'s, and the set $\{\gamma_a\}$ exhausts all the 
first-class constraints. Note that now the index 
a=1,...,A,A+1,...,A runs over all (primary and 
secondary) first-class constraints. 

This separation of the constraints into first-class 
and second-class constraints is quite important 
because, as already seen above, the first-class 
constraints generate admissible canonical transfor- 
mations, while the second-class constraints do not. 

For a bosonic system, the matrix $C_{\alpha\beta}$ is antisymmetric. 
As $C_{\alpha\beta}$ is invertible, this implies that the 
number of second-class constraints is even. In the 
fermionic case, $C_{\alpha\beta}$ is symmetric (in the fermionic 
sector) and, therefore, the number of second-class 
constraints can be even or odd. 


First-class constraints and gauge symmetries The 
first-class constraints not only map the constraint 
surface on itself, but generate, in fact, transforma- 
tions that do not change the physical state of the 
system, that is, gauge transformations. Indeed, the 
presence of arbitrary functions in the solutions of 
the equations of motion indicates that the q’s and 
the p’s involve some redundancy and are not all 
physically distinct. Only those phase-space functions 
whose time evolution does not depend on the 
arbitrary functions $u^a$ are observables. 

That the first-class constraints generate gauge 
transformations is rather clear in the case of the 
first-class primary constraints, since these appear 
explicitly in the generator of the time evolution 
multiplied by arbitrary functions. That it also holds 
for the first-class secondary constraints is known as 
the “Dirac conjecture.” This conjecture can be 
proved under reasonable assumptions (see, e.g., 
Henneaux et al. 1990). The reason that the 
secondary first-class constraints also correspond to 
gauge transformations is that they appear in the 
brackets of the Hamiltonian with the primary first- 
class constraints. Thus, different choices of arbitrary 
functions $u^a$ in the dynamical equations of motion 
will lead to phase-space points that differ by a 
canonical transformation whose generator involves 
the secondary first-class constraints as well. 

In any case, as noted below, one must identify the 
phase-space points in the same orbit generated by all 
the first-class constraints (primary and secondary) in 
order to get a reduced space with a symplectic 
structure (“reduced phase space”). For this reason, 
one postulates that the first-class constraints always 
generate gauge transformations, even for systems 
which are counterexamples to the Dirac conjecture 
(i.e., in that case, one defines the gauge 
transformations as being the transformations generated 
by the first-class constraints). 

The extended Hamiltonian $H_E$ is defined to be the 
sum of the first-class Hamiltonian [20] and of all the 
first-class constraints $\gamma_a$, each multiplied by an arbitrary 
Lagrange multiplier, 


$$H_E = H' + u^a\gamma_a$$  [23]


(with a summed from 1 to A). It is the generator of 
the time evolution in which the complete gauge 
symmetry is fully displayed. 


Elimination of second-class constraints — Dirac 
brackets Second-class constraints do not generate 
permissible canonical transformations, since they do 
not map the constraint surface on itself. For this 
reason, it is convenient to eliminate them. This can 
consistently be done by using the Dirac brackets 
instead of the Poisson brackets. By definition, the 
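Dirac bracket $[F,G]_D$ of two phase-space functions
F and G is given by


$$[F,G]_D = [F,G] - [F,\chi_\alpha]\, C^{\alpha\beta}\, [\chi_\beta, G] \qquad [24]$$

where $C^{\alpha\beta}$ is the inverse to $C_{\alpha\beta}$,

$$C^{\alpha\beta} C_{\beta\gamma} = \delta^{\alpha}_{\gamma}$$


(which exists since the $\chi_\alpha$'s are second class). As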
shown by Dirac, the bracket [24] is indeed a bracket 
(antisymmetry, derivation property, and Jacobi 
identity). Furthermore, it fulfills the crucial property 
that the Dirac bracket of anything with any second- 
class constraint is zero, 


$$[F, \chi_\alpha]_D = 0 \quad (F \text{ arbitrary}) \qquad [25]$$


Thus, one can consistently eliminate the second-class
constraints and replace the Poisson bracket by the
Dirac bracket. Once this is done, one has fewer
canonical variables and only first-class constraints
remain (if any). It also follows from the definition
that the Dirac bracket of two first-class functions is
equal to their Poisson bracket.
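
The property [25] can be checked in one line from the definition [24]. The following is a sketch, assuming (as is standard, though not restated in this passage) that $C_{\alpha\beta}$ denotes the matrix of Poisson brackets $[\chi_\alpha, \chi_\beta]$ of the second-class constraints:

```latex
\begin{aligned}
[F,\chi_\gamma]_D
  &= [F,\chi_\gamma] - [F,\chi_\alpha]\,C^{\alpha\beta}\,[\chi_\beta,\chi_\gamma] \\
  &= [F,\chi_\gamma] - [F,\chi_\alpha]\,C^{\alpha\beta}\,C_{\beta\gamma} \\
  &= [F,\chi_\gamma] - [F,\chi_\alpha]\,\delta^{\alpha}_{\gamma} \;=\; 0 .
\end{aligned}
```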


Gauge conditions One can push the reduction 
procedure further and eliminate the first-class constraints
by means of gauge conditions. Gauge conditions
$C_a = 0$ are conditions on the phase-space
variables which do not follow from the Lagrangian
and which have the property that they cut each gauge
orbit once and only once. Since the gauge transformations
are generated by the first-class constraints,
this requirement is (locally) equivalent to


$$[C_a, \gamma_b]\,\epsilon^b \approx 0 \;\Rightarrow\; \epsilon^b \approx 0 \qquad [26]$$


That is, the constraints $(\gamma_a, C_b)$ form together a
second-class system: there is no first-class constraint
left once the conditions $C_a = 0$ are included. One
can then eliminate all the constraints and gauge 
conditions and introduce the corresponding Dirac 
bracket. For gauge-invariant functions, this Dirac 
bracket coincides with the original Poisson bracket. 

The reduced phase space is the unconstrained 
space obtained after this reduction, equipped with 
the Dirac bracket. It has dimension $2n - s - 2A$,
where 2n is the dimension of the original phase
space, s is the number of second-class constraints,
and A is the number of first-class constraints. In the
bosonic case, this number is even (as it should be)
because s is even. One sees that “first-class constraints
strike twice” since they need gauge
conditions.
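
As an illustration of this counting, here is a sketch applied to the two examples treated in the Examples section of this article. The lists of canonical variables are read off from those examples; the name $\pi_\lambda$ for the momentum conjugate to $\lambda$ is a labelling assumption.

```latex
\begin{aligned}
&\text{First example, variables } (q^1,q^2,\lambda;\,p_1,p_2,\pi_\lambda):
  && 2n = 6,\ s = 0,\ A = 2 \ \Rightarrow\ 2n - s - 2A = 6 - 0 - 4 = 2, \\
&\text{Second example, variables } (z^1,z^2;\,p_1,p_2):
  && 2n = 4,\ s = 2,\ A = 0 \ \Rightarrow\ 2n - s - 2A = 4 - 2 - 0 = 2 .
\end{aligned}
```

In both cases the reduced phase space is two-dimensional, in agreement with the explicit reductions.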

The observables of the theory are the reduced 
phase-space functions. They form a Poisson algebra, 
the relevant reduced phase-space bracket being the 
Dirac bracket associated with all the constraints and 
gauge conditions. The symplectic structure defined 
in the reduced phase space is nondegenerate because 
one has removed all the first-class constraints. 

The definition of reduced phase space given above 
is useful in practice but has the conceptual 
drawback of relying on gauge conditions. This 
approach does not display clearly its intrinsic 
significance and, furthermore, in the case of the 
so-called Gribov problems (global obstructions to 
cutting each gauge orbit once and only once), may 
yield the incorrect expectation that the reduced 
phase space does not exist. We shall provide a more 
intrinsic definition below, which does not involve 
gauge conditions. 


Examples 

First example (see eqn [2]). There is here one
primary constraint, namely $\pi_\lambda = 0$. The canonical
Hamiltonian is $(1/2)((p_1)^2 + (p_2)^2) + \lambda(p_1 + p_2)$.
The consistency algorithm yields the secondary
constraint $p_1 + p_2 = 0$ and no condition on the u's.
The constraints are first class. They generate the
gauge transformations $q^1 \to q^1 + \epsilon$, $q^2 \to q^2 + \epsilon$,
and $\lambda \to \lambda + \eta$, which coincide with the Lagrangian
gauge transformations if one identifies $\eta$ with $\dot\epsilon$
($\epsilon$ and $\dot\epsilon$ are, of course, independent at any given
time). One can fix the gauge by means of the gauge
conditions $\lambda = 0$, $q^1 + q^2 = 0$. The reduced phase
space is two-dimensional and the observables can
be identified with the functions of the gauge-
invariant variables $(1/2)(q^1 - q^2)$ and $p_1 - p_2$,
which are conjugate. Any other gauge condition
leads to the same reduced phase space.
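
The statements of the first example can be verified directly. The following is a sketch assuming the canonical brackets $[q^i, p_j] = \delta^i_j$ and $[\lambda, \pi_\lambda] = 1$, with $\pi_\lambda$ denoting the momentum conjugate to $\lambda$ and $G$ a hypothetical name for the gauge generator:

```latex
\begin{aligned}
G &= \epsilon\,(p_1 + p_2) + \eta\,\pi_\lambda, \\
\delta q^1 &= [q^1, G] = \epsilon, \qquad
\delta q^2 = [q^2, G] = \epsilon, \qquad
\delta\lambda = [\lambda, G] = \eta, \qquad
\delta p_1 = \delta p_2 = 0, \\
\delta\big(\tfrac12 (q^1 - q^2)\big) &= \tfrac12(\epsilon - \epsilon) = 0, \qquad
\delta (p_1 - p_2) = 0, \\
\big[\tfrac12 (q^1 - q^2),\, p_1 - p_2\big]
  &= \tfrac12\big([q^1,p_1] + [q^2,p_2]\big) = 1 .
\end{aligned}
```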

Second example (see eqn [4]). The primary
constraints are $p_1 - z^2 = 0$ and $p_2 = 0$ and define a
two-dimensional plane in the four-dimensional
phase space $(z^1, z^2, p_1, p_2)$. The consistency
algorithm forces $u^1 = z^2$ and $u^2 = 0$ and does not bring
any further constraint. The constraints are second
class since $[p_2, p_1 - z^2] = 1$. One can eliminate $p_1$
and $p_2$ through the constraints. The Dirac brackets
of the remaining variables vanish, except
$[z^1, z^2]_D = 1$. The reduced phase space is the space of the
z's, with $z^2$ conjugate to $z^1$. The Hamiltonian is the
free-particle Hamiltonian, $H = (1/2)(z^2)^2$. Thus, one
recovers the original description which was already
in Hamiltonian form. (The recognition that a system
is already in first-order form often enables one to
shortcut some aspects of the Dirac procedure by not
introducing the unnecessary momenta which would
in any case be eliminated in the end.)
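
The Dirac brackets quoted in the second example can be reproduced with a few lines of code. This is a minimal sketch, not a general implementation: it assumes the convention $[q, p] = 1$, handles only linear phase-space functions (stored as constant gradient vectors) and exactly two second-class constraints, and all function names are hypothetical.

```python
from fractions import Fraction

# Phase-space coordinates are ordered as (z1, z2, p1, p2).
# A linear function is represented by its (constant) gradient vector.
Z1 = (1, 0, 0, 0)
Z2 = (0, 1, 0, 0)
P1 = (0, 0, 1, 0)
P2 = (0, 0, 0, 1)
CHI1 = (0, -1, 1, 0)   # chi_1 = p1 - z2
CHI2 = (0, 0, 0, 1)    # chi_2 = p2


def poisson(f, g):
    """Poisson bracket of two linear functions, with [q, p] = 1."""
    n = len(f) // 2
    return sum(f[i] * g[i + n] - f[i + n] * g[i] for i in range(n))


def dirac(f, g, chis):
    """Dirac bracket [f, g]_D for a pair of second-class constraints."""
    c12 = Fraction(poisson(chis[0], chis[1]))
    # C = [[0, c12], [-c12, 0]]  =>  C^{-1} = [[0, -1/c12], [1/c12, 0]]
    c_inv = [[0, -1 / c12], [1 / c12, 0]]
    correction = sum(
        poisson(f, chis[a]) * c_inv[a][b] * poisson(chis[b], g)
        for a in range(2)
        for b in range(2)
    )
    return poisson(f, g) - correction


if __name__ == "__main__":
    chis = (CHI1, CHI2)
    print("[p2, p1 - z2]   =", poisson(P2, CHI1))
    print("[z1, z2]_D      =", dirac(Z1, Z2, chis))
    print("[z1, chi_1]_D   =", dirac(Z1, CHI1, chis))
```

The last line illustrates property [25]: the Dirac bracket of any function with a second-class constraint vanishes.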


Quantization 


The phase space of physical interest is the reduced 
phase space and the physical algebra is the algebra 
of the observables. The quantization of the theory 
then amounts to quantizing the algebra of the 
observables. This can be achieved along two 
different lines: 


1. Reduce then quantize: In this direct approach, 
one represents as quantum operators only the 
reduced phase-space functions. There is no 
operator associated with non-gauge-invariant 
functions. 

2. Quantize then reduce: In this approach, one 
represents as quantum operators the bigger alge- 
bra of functions of all the phase-space variables. 
One must then take into account the constraints. 
The second-class constraints are enforced as
operator equations, which is consistent with the
correspondence rule that the commutator in the
quantum theory is $i\hbar$ times the Dirac bracket,


$$AB - BA = i\hbar\,[A, B]_D \qquad [27]$$


(plus higher-order terms in $\hbar$). The first-class
constraints are implemented in a more subtle
way. It would be inconsistent to impose them as
operator equations since in general $[\gamma_a, F]_D \neq 0$
(even in the Dirac bracket). What one does is to
impose them as conditions on the physical states:
these are defined as the states annihilated by the
first-class constraints,


$$\gamma_a |\psi\rangle = 0 \qquad [28]$$


For simple systems, it is easy to verify that the two 
procedures are equivalent. There is yet another 
approach, in which one extends the system rather
than reducing it. This is the Becchi-Rouet-Stora-
Tyutin (BRST) approach, in which the new variables 
are called ghosts. 


Geometric Description 


We defined above first-class and second-class 
constraints through algebraic means. It turns out 
that these definitions also have a geometrical 
interpretation, which sheds considerable light
on their nature.

The phase-space symplectic 2-form $\sigma$ induces, by
pullback, a 2-form $\sigma_\Sigma$ on the constraint surface $\Sigma$.
While $\sigma$ is of maximal rank, this may not be the case
for the induced $\sigma_\Sigma$, which may be degenerate. In
fact, the rank of $\sigma_\Sigma$ fails to be equal to the
maximum rank $2n - J$ (where J is the total number
of constraints) by precisely the number A of first-
class constraints.

Indeed, the Hamiltonian vector fields $X_{\gamma_a}$ associated
with the first-class constraints are tangent to the
constraint surface $\Sigma$ and are null eigenvectors of $\sigma_\Sigma$,


$$\sigma_\Sigma(X_{\gamma_a}, Y) = 0 \quad \forall\, Y \text{ tangent to } \Sigma \qquad [29]$$


as an immediate consequence of the first-class
property. Here, all first-class constraints (primary
and secondary) yield a null eigenvector. The integral
surfaces of the vector fields $X_{\gamma_a}$ are the gauge orbits.
The reduced phase space is nothing else but the 
quotient space of the constraint surface by the gauge 
orbits. The 2-form induced in the quotient space is 
invertible because one has removed all degeneracy 
directions (including the ones associated with sec- 
ondary first-class constraints). Reaching the reduced 
phase space falls under the scope of Hamiltonian 
reduction. The observables are the functions on the 
reduced phase space. 

Thus, the reduced phase space is obtained through
a two-step procedure. First, one restricts the functions
to functions on the constraint surface $\Sigma$. One may
view the algebra $C^\infty(\Sigma)$ of smooth functions on $\Sigma$ as
the quotient algebra $C^\infty(P)/\mathcal{N}$ of the algebra $C^\infty(P)$
of smooth phase-space functions by the ideal $\mathcal{N}$ of
phase-space functions that vanish on the constraint
surface $\Sigma$. The second step in the reduction procedure
is to impose the gauge-invariance condition on the
functions in $C^\infty(\Sigma)$, that is, to impose that they are
constant along the gauge orbits $\mathcal{O}$. Assuming all
necessary smoothness and regularity conditions to be
fulfilled (i.e., that the orbits fiber $\Sigma$, which is, for
instance, the case if the gauge orbits are the orbits
of a free and proper group action), one may denote
the algebra of observables as $C^\infty(\Sigma/\mathcal{O})$. This algebra
is a Poisson algebra because the induced 2-form on
the quotient space $\Sigma/\mathcal{O}$ is nondegenerate. The
algebraic description of the observables underlies the
BRST construction.

It is interesting to note that in the covariant 
approach to phase space, a similar two-step reduc- 
tion procedure occurs. What plays the role of the 
constraint surface is the stationary surface in the
space of all histories $q^i(t)$ of the dynamical variables.
The gauge symmetry acts on this space and the 
reduced phase space is just the quotient space. One 
can establish the equivalence of the two descriptions 
(Barnich et al. 1991). 


See also: Batalin-Vilkovisky Quantization; BRST 
Quantization; Canonical General Relativity; Operads; 
Perturbative Renormalization Theory and BRST; 
Quantum Dynamics in Loop Quantum Gravity; Quantum 
Field Theory: A Brief Introduction. 

Euclidean Quantum Fields 


The construction of a relativistic quantum field is
still an open problem for fields in spacetime
dimension $d \ge 4$. The conceptual difficulty that
sometimes led to fears of an incompatibility between
nontrivial quantum systems and special relativity
has, however, been solved in the case of dimension
d = 2, 3 although, so far, this has not influenced the
corresponding debate on the foundations of quantum
mechanics, still much alive.

It began in the early 1960s with Wightman's work
on the axioms and the attempts at understanding the
mathematical aspects of renormalization theory and
with Hepp's renormalization theory for scalar fields.
The breakthrough idea was, perhaps, Nelson’s
realization that the problem could really be studied
in Euclidean form. A solution in dimensions d = 2, 3
was obtained in the 1960s and 1970s through a
remarkable series of papers by Nelson, Glimm,
Jaffe, and Guerra. While the works of Nelson and
Guerra relied on the “Euclidean approach” (see
below) and on d = 2, the early works of Glimm and
Jaffe dealt with d = 3 making use of the “Minkowskian
approach” (based on second quantization), but
already making use of a “multiscale analysis”
technique. The latter received great impetus and
systematization from the adoption of Wilson’s views
and methods on renormalization: in physics terminology,
renormalization group methods; a point of
view taken here following the Euclidean approach.
The solution dealt initially with scalar fields but it
has subsequently been considerably extended.

The Euclidean approach studies quantum fields 
through the following problems: 


1. existence of the functional integrals defining the 
generating functions (see below) of the probabil- 
ity distribution of the interacting fields in finite 
volume: the “ultraviolet stability problem,” 

2. existence of the infinite-volume limit of the 
generating functions: the “infrared problem,” 
and 

3. check that the infinite volume generating 
functions satisfy the axioms needed to pass 
from the Euclidean, probabilistic, formulation 
to a Minkowskian formulation guaranteeing 
the existence of the Hamiltonian operator, 
relativistic covariance, Ruelle-Haag scattering 
theory: the “reconstruction problem.” 


The characteristic problem for the construction of 
quantum fields is (1) and here attention will be 
confined to it with the further restriction to the 
paradigmatic massive scalar fields cases. The dimen- 
sion d of the spacetime will be d=2,3 unless 
specified otherwise. 

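Given a cube $\Lambda$ of side L, $\Lambda \subset R^d$, consider the
following functional integral on the space of the fields on
$\Lambda$, that is, on functions $\varphi^{(\le N)}_\xi$ defined for $\xi \in \Lambda$,


$$Z_N(\Lambda, f) = \int \exp\Big(-\int_\Lambda \big(\lambda_N \varphi^{(\le N)\,4}_\xi + \mu_N \varphi^{(\le N)\,2}_\xi + \nu_N + f_\xi \varphi^{(\le N)}_\xi\big)\, d\xi\Big)\, P_N(d\varphi^{(\le N)}) \qquad [1]$$


The fields $\varphi^{(\le N)}_\xi$ are called “Euclidean” fields with
ultraviolet cutoff N > 0, $f_\xi$ is a smooth function with
compact support bounded by $|f_\xi| \le 1$ (for definiteness),
the constants $\lambda_N > 0$, $\mu_N$, $\nu_N$ are called “bare
couplings,” and $P_N$ is a Gaussian probability distribution
defining the free-field distribution with mass m and
ultraviolet cutoff N; the probability distribution $P_N$
is determined by its “covariance”
$C^{(\le N)}_{\xi,\eta} \overset{def}{=} \int \varphi^{(\le N)}_\xi \varphi^{(\le N)}_\eta\, dP_N$,
which in the physics literature is called a
“propagator,” given by

$$C^{(\le N)}_{\xi,\eta} = \frac{1}{(2\pi)^d} \sum_{n \in Z^d} \int \frac{e^{i p \cdot (\xi - \eta + nL)}}{p^2 + m^2}\, \chi_N(|p|)\, d^d p \qquad [2]$$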

The sum over the integers $n \in Z^d$ is introduced so that
the field $\varphi^{(\le N)}_\xi$ is periodic over the box $\Lambda$: this is not
really necessary as in the limit $L \to \infty$ either translation
invariance would be recovered or lack of it properly
understood, but it makes the problem more symmetric
and generates a few technical simplifications; here
$\chi_N(\cdot)$ is a “regularizer” and a standard choice is


$$\chi_N(|p|) = \frac{m^2 (\gamma^{2N} - 1)}{p^2 + \gamma^{2N} m^2}$$


with $\gamma > 1$, which is such that


$$\frac{\chi_N(|p|)}{p^2 + m^2} \equiv \frac{1}{p^2 + m^2} - \frac{1}{p^2 + \gamma^{2N} m^2} \equiv \sum_{h=1}^{N} \Big( \frac{1}{p^2 + \gamma^{2(h-1)} m^2} - \frac{1}{p^2 + \gamma^{2h} m^2} \Big) \qquad [3]$$


here $\gamma > 1$ can be chosen arbitrarily: for instance, $\gamma = 2$. If
d > 3, the above regularization will not be sufficient
and a $\chi_N$ decaying faster than $p^{-2}$ would be needed.
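
The identities in [3] are elementary and easy to check numerically. The following is a minimal sketch; the function names and the sample values of $p$, $N$, $m$, and $\gamma$ are illustrative choices, not taken from the text.

```python
import math


def chi(p, n_cut, m=1.0, gamma=2.0):
    """Regularizer chi_N(|p|) = m^2 (gamma^{2N} - 1) / (p^2 + gamma^{2N} m^2)."""
    g2n = gamma ** (2 * n_cut)
    return m * m * (g2n - 1.0) / (p * p + g2n * m * m)


def regularized_kernel(p, n_cut, m=1.0, gamma=2.0):
    """Left-hand side of [3]: chi_N(|p|) / (p^2 + m^2)."""
    return chi(p, n_cut, m, gamma) / (p * p + m * m)


def two_term_difference(p, n_cut, m=1.0, gamma=2.0):
    """Middle expression of [3]."""
    g2n = gamma ** (2 * n_cut)
    return 1.0 / (p * p + m * m) - 1.0 / (p * p + g2n * m * m)


def scale_sum(p, n_cut, m=1.0, gamma=2.0):
    """Right-hand side of [3]: telescoping sum over the scales h = 1..N."""
    total = 0.0
    for h in range(1, n_cut + 1):
        low = gamma ** (2 * (h - 1)) * m * m
        high = gamma ** (2 * h) * m * m
        total += 1.0 / (p * p + low) - 1.0 / (p * p + high)
    return total


if __name__ == "__main__":
    for p in (0.0, 0.5, 3.0, 50.0):
        for n_cut in (1, 4, 10):
            a = regularized_kernel(p, n_cut)
            b = two_term_difference(p, n_cut)
            c = scale_sum(p, n_cut)
            print(p, n_cut, math.isclose(a, b, rel_tol=1e-9),
                  math.isclose(b, c, rel_tol=1e-9))
```

Each term of the sum is concentrated on momenta of order $\gamma^h m$, which is what makes the decomposition into scales possible.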
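
A simple estimate yields, if $\varepsilon \in (0, 1)$ is fixed and c
is suitably chosen,


$$|C^{(\le N)}_{\xi,\eta}| \le c\, \gamma^{(d-2)N} e^{-m|\xi - \eta|}$$

$$|C^{(\le N)}_{\xi,\eta} - C^{(\le N)}_{\xi,\eta'}| \le c\, \gamma^{(d-2)N} \big(\gamma^N m |\eta - \eta'|\big)^{\varepsilon} \qquad [4]$$

with $\gamma^{(d-2)N}$ interpreted as N if d = 2.

The functional $Z_N(\Lambda, f)$ defines a “generating function”
of a probability distribution $P_{int}$ over the fields on
$\Lambda$ which will be called the “distribution with
$\varphi^4$-interaction” regularized on $\Lambda$ and at length
scale $m^{-1}\gamma^{-N}$: the integral, in [1],


$$V_N(\varphi^{(\le N)}) = \int_\Lambda \big(\lambda_N \varphi^{(\le N)\,4}_\xi + \mu_N \varphi^{(\le N)\,2}_\xi + \nu_N + f_\xi \varphi^{(\le N)}_\xi\big)\, d\xi \qquad [5]$$


will be called the “interaction potential” with
external field f. The regularization is introduced to
guarantee that the integral [1], $\int e^{-V_N}\, dP_N$, is well
defined if $\lambda_N > 0$. The moments of $P_{int}$ are the
functional derivatives of the generating function with
respect to f: they are called “Schwinger functions.”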

The problem (1) can now be made precise: it is to
show the existence of $\lambda_N, \mu_N, \nu_N$ so that the limit


$$\lim_{N \to \infty} \frac{Z_N(\Lambda, f)}{Z_N(\Lambda, 0)}$$


exists for all f and is not Gaussian, that is, it is not
the exponential of a quadratic form in f, which
would be the case if $\lambda_N, \mu_N \to 0$ fast enough: the last
requirement is of course essential because the
Gaussian case describes, in the physical interpretation,
free fields and noninteracting particles, that is,
it is trivial. Note that $\nu_N$ does not play a role: its
introduction is useful to be able to study separately
the numerator and the denominator of the fraction
$Z_N(\Lambda, f)/Z_N(\Lambda, 0)$.

For more details, the reader is referred to Wightman 
and Garding (1965), Streater and Wightman (1964), 
Nelson (1966), Guerra (1972), Osterwalder and 
Schrader (1973), and Simon (1974). 


The Regularized Free Field 


Since the propagator, see [4], decays exponentially
over a scale $m^{-1}$ and is smooth over a scale $m^{-1}\gamma^{-N}$,
the fields $\varphi^{(\le N)}_\xi$ sampled with distribution $P_N$
are rather singular objects. Their properties cannot be
described by a single length scale: they are extremely
large for large N, take independent values only beyond
distances of order $m^{-1}$ but, at the same time, they look
smooth only on the much smaller scale $m^{-1}\gamma^{-N}$. Their
essential feature is that, fixed $\varepsilon < 1$, for example,
$\varepsilon = 1/2$, with $P_N$-probability 1 there is B > 0 such
that (interpreting $\gamma^{(d-2)N/2}$ as N if d = 2)


$$|\varphi^{(\le N)}_\xi| \le B\, \gamma^{N(d-2)/2}$$

$$|\varphi^{(\le N)}_\xi - \varphi^{(\le N)}_\eta| \le B\, \gamma^{N(d-2)/2} \big(\gamma^N m |\xi - \eta|\big)^{\varepsilon} \qquad [6]$$


and furthermore the probability of the relations in
[6] will be N-independent, that is, $\varphi^{(\le N)}_\xi$ are
bounded and roughly of size $\gamma^{N(d-2)/2}$ as $N \to \infty$
and, on a very small length scale $m^{-1}\gamma^{-N}$, almost
constant.

Substantial control on the field $\varphi^{(\le N)}_\xi$ statistically
sampled with distribution $P_N$ can be obtained
by decomposing it, through [3], into “components 
of various scales”: that is, as a sum of statistically 
mutually independent fields whose properties 
are entirely characterized by a single scale of length. 
This means that they have size of order 1 and 
are independent and smooth on the same length 
scale. 

Assuming the side of $\Lambda$ to be an integer multiple
of $m^{-1}$, let $Q_h$ be a pavement of $\Lambda$ into boxes of
side $m^{-1}\gamma^{-h}$, imagined hierarchically arranged so
that the boxes of $Q_h$ are exactly paved by those of
$Q_{h+1}$.

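Define $z^{(h)}$ to be the random field with propagator
$C^{(h)}_{\xi,\eta}$ with Fourier representation


$$C^{(h)}_{\xi,\eta} = \frac{1}{(2\pi)^d} \sum_{n \in Z^d} \int \Big( \frac{1}{\gamma^{-2} m^2 + p^2} - \frac{1}{p^2 + m^2} \Big)\, e^{i p \cdot (\xi - \eta + \gamma^h n L)}\, d^d p$$


so that $\varphi^{(\le N)}_\xi$ and its propagator $C^{(\le N)}_{\xi,\eta}$ can be
represented, see [2], [3], as


$$\varphi^{(\le N)}_\xi = \sum_{h=1}^{N} \gamma^{h(d-2)/2}\, z^{(h)}_{\gamma^h \xi}, \qquad C^{(\le N)}_{\xi,\eta} = \sum_{h=1}^{N} \gamma^{h(d-2)}\, C^{(h)}_{\gamma^h \xi, \gamma^h \eta} \qquad [7]$$
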

where the fields $z^{(h)}$ are independently distributed
Gaussian fields. Note that the fields $z^{(h)}$ are also
almost identically distributed because their propagator
is obtained by periodizing over the period $\gamma^h L$
the same function


$$\bar C^{(0)}_{\xi,\eta} \overset{def}{=} \frac{1}{(2\pi)^d} \int \Big( \frac{1}{\gamma^{-2} m^2 + p^2} - \frac{1}{p^2 + m^2} \Big)\, e^{i p \cdot (\xi - \eta)}\, d^d p$$


that is, their propagator is

$$C^{(h)}_{\xi,\eta} = \sum_{n \in Z^d} \bar C^{(0)}_{\xi,\, \eta + \gamma^h n L}$$

The reason why they are not exactly equally
distributed is that the field $z^{(h)}$ is periodic with
period $\gamma^h L$ rather than L. But proceeding with care
the sum over n in the above expressions can be
essentially ignored: this is a small price to pay if one
wants translation invariance built into the analysis
from the beginning.
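
The passage from [3] to the multiscale representation [7] is a change of variables in the momentum integral. The following sketch spells it out for the $h$-th term of the sum in [3], inserted in [2], with the substitution $p = \gamma^h k$ (the integration variable $k$ is introduced here only for illustration):

```latex
\begin{aligned}
\frac{1}{p^2 + \gamma^{2(h-1)} m^2} - \frac{1}{p^2 + \gamma^{2h} m^2}
  &= \gamma^{-2h} \Big( \frac{1}{k^2 + \gamma^{-2} m^2} - \frac{1}{k^2 + m^2} \Big),
  \qquad d^d p = \gamma^{hd}\, d^d k, \\
e^{i p \cdot (\xi - \eta + nL)}
  &= e^{i k \cdot (\gamma^h \xi - \gamma^h \eta + \gamma^h n L)} .
\end{aligned}
```

The $h$-th contribution to $C^{(\le N)}_{\xi,\eta}$ is therefore $\gamma^{(d-2)h}$ times a propagator with a fixed, $h$-independent kernel, evaluated at the rescaled points $\gamma^h \xi, \gamma^h \eta$ and periodized over $\gamma^h L$, which is the second relation in [7].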

The representation [7] defines a “multiscale
representation” of the field $\varphi^{(\le N)}$. Smoothness
properties for the field $\varphi^{(\le N)}$ can be read from
those of its “components” $z^{(h)}$. Define, for $\Delta \in Q_0$,


| zh) — z” 
[| = max Cy + TE i Ua [8] 
A EeA,neh [a m 7 
-n| <m 


and r will be chosen r = 0 or r = 1 as needed (in
practice r = 0 if d = 2 and r = 1 if d = 3): r = 1 will
allow us to discuss some smoothness properties of
the fields which will be necessary (e.g., if d = 3).
Then the size $\|z^{(h)}\|_\Delta$ of any field $z^{(h)}$, for all $h \ge 1$, is
estimated by


$$P\Big( \max_{\Delta \in Q_0} \|z\|_\Delta \le B \Big) \ge e^{-c\, e^{-c' B^2} |\Lambda|}$$

$$P\big( \|z\|_\Delta \ge B_\Delta,\ \forall\, \Delta \in D \big) \le \prod_{\Delta \in D} c\, e^{-c' B_\Delta^2} \qquad [9]$$


where P is the Gaussian probability distribution of
z, D is any collection of boxes $\Delta \in Q_0$ and c, c' > 0
are suitable constants. The estimates [9] imply in particular
[6]. The estimates [9] follow from the Markovian
nature of the Gaussian field $z^{(h)}$, that is, from the
fact that the propagator is the Green's function of an
elliptic operator (of fourth order, see the first of [3]),
with constant coefficients, which implies also the
inequalities (fixing $\varepsilon \in (0, 1)$)

$$|C^{(h)}_{\xi,\eta}| = \Big| \int z_\xi z_\eta\, P(dz) \Big| \le c\, e^{-m|\xi - \eta|}, \qquad |C^{(h)}_{\xi,\eta} - C^{(h)}_{\xi,\eta'}| \le c\, \big(m |\eta - \eta'|\big)^{\varepsilon} \qquad [10]$$


where $|\xi - \eta|$ is reinterpreted as the distance
between $\xi$, $\eta$ measured over the periodic box $\gamma^h \Lambda$
(hence $|\xi - \eta|$ differs from the ordinary distance
only if the latter is of the order of $\gamma^h L$). The
interpretation of [10] is that $z^{(h)}_\xi$ are essentially
bounded variables which, on scale $\sim m^{-1}$, are
essentially constant and furthermore beyond length
$\sim m^{-1}$ are essentially independently distributed.

For more details, the reader is referred to Wilson
(1970, 1972) and Gallavotti (1981, 1985).


Perturbation Theory 


The naive approach to the problem is to fix $\lambda_N \equiv \lambda > 0$
and to develop $Z_N(\Lambda, f)$ or, more conveniently
and equivalently, $(1/|\Lambda|) \log Z_N(\Lambda, f)$ in powers of $\lambda$.
If one fixes a priori $\mu_N, \nu_N$ independent of N,
however, even a formal power series is not possible:
this is trivially due to the divergence of the
coefficients of the power series, already to second
order, for generic f in the limit $N \to \infty$. Nevertheless
it is possible to determine $\mu_N(\lambda), \nu_N(\lambda)$ as functions
of N and $\lambda$ so that a formal power series exists (to
all orders in $\lambda$): this is the key result of renormalization
theory.

To find the perturbative expansion, the simplest is
to use a graphical representation of the coefficients of
the power expansion in $\lambda$, $\mu_N$, $\nu_N$, f and the Gaussian
integration rules which yield (after a classical
computation) that the coefficient of $\lambda^n \mu_N^p$ times a
product of r external fields f is
obtained by considering the graph elements shown in
Figure 1, where the segments will be called half-lines
and the graph elements will be called, respectively,
“coupling” or “$\varphi^4$-vertex,” “mass vertex,” “vacuum
vertex,” and “external vertex.”

The half-lines of the graph elements are considered
distinct (i.e., imagine a label attached to
distinguish them). Then consider all possible connected
graphs G obtained by first drawing, respectively,
n, p, r graph elements in Figure 1, which are
not vacuum vertices, with their nodes marked by
points in $\Lambda$ named $\xi_1, \ldots, \xi_n, \xi_{n+1}, \ldots, \xi_{n+p+r}$ and
form all possible graphs obtained by attaching pairs
of half-lines emerging from the vertices of the graph
elements. These are the “nontrivial graphs.”
Furthermore, consider also the single “trivial”
graph formed just by the third graph element and
consisting of a single point. All graphs obtained in
this way are particular Feynman graphs.

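Given a nontrivial graph G (there are many of
them) we define its value to be the product


$$W_G(\xi_1, \ldots, \xi_{n+p+r}) = (-1)^{n+p+r}\, \frac{\lambda^n \mu_N^p}{n!\, p!\, r!} \prod_{j=1}^{r} f_{\xi_{n+p+j}} \prod_{\ell} C^{(\le N)}_{\xi_\ell, \eta_\ell} \qquad [11]$$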


where the last product runs over all pairs $\ell = (\xi_\ell, \eta_\ell)$
of half-lines of G that are joined and connect two
vertices labeled by points $\xi_\ell, \eta_\ell$: call “line of G” any
such pair. If the graph consists of the single vacuum
vertex its value will be $\nu_N$. The series for
$(1/|\Lambda|) \log Z_N(\Lambda, f)$ is then


$$\frac{1}{|\Lambda|} \sum_G \int W_G(\xi_1, \ldots, \xi_{n+p+r}) \prod_j d\xi_j \qquad [12]$$


and the integral will be called the integrated graph
value.

Figure 1 The graph elements representing $\varphi^4_\xi$, $\varphi^2_\xi$,
a constant, and $f_\xi \varphi_\xi$.

Suppose first that $\mu_N = \nu_N = 0$. Then if a graph G
contains subgraphs like in Figure 2, the corresponding
respective contribution to the integral in [12]
(considering only the integrals over $\eta$ and suitably
taking care of the combinatorial factors) is a factor
obtained by integrating over $\xi$ the quantities


(<N) 7A(<N) N) 
Puts j*- = 


[13] 
or 4 r6 oi Gere (SN)3 


which if d = 3 diverge as $N \to \infty$ as $\gamma^N$ or,
respectively, as N; the second factor does not diverge in
dimension d = 2 while the first still diverges as N. The
divergences arise from the fact that as $\xi - \eta \to 0$ the
propagator behaves as $|\xi - \eta|^{-1}$ if d = 3 or as
$-\log |\xi - \eta|$ if d = 2, all the way until saturation
occurs at distance $|\xi - \eta| \simeq m^{-1}\gamma^{-N}$: for this reason
the latter divergences are called “ultraviolet
divergences.”
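
The growth rates quoted for the coinciding-point propagator can be made explicit. The following is a sketch under a simplifying assumption not made in the text: only the $n = 0$ term of the periodization in [2] is kept (infinite-volume propagator), so that [3] gives a difference of two free propagators with masses $m$ and $\gamma^N m$.

```latex
\begin{aligned}
d = 3:\quad C^{(\le N)}_{\xi\xi}
  &= \int \frac{d^3 p}{(2\pi)^3} \Big( \frac{1}{p^2 + m^2} - \frac{1}{p^2 + \gamma^{2N} m^2} \Big)
   = \frac{m\,(\gamma^{N} - 1)}{4\pi}, \\
d = 2:\quad C^{(\le N)}_{\xi\xi}
  &= \int \frac{d^2 p}{(2\pi)^2} \Big( \frac{1}{p^2 + m^2} - \frac{1}{p^2 + \gamma^{2N} m^2} \Big)
   = \frac{N \log \gamma}{2\pi} .
\end{aligned}
```

The first grows as $\gamma^N$ and the second as $N$, matching the divergences stated above for the subgraph in which a line closes on a single vertex.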

However, if we set $\mu_N \neq 0$, then for every graph
containing a subgraph like those in Figure 2 there
is another one identical except that the points
$\alpha$, $\beta$ are connected via a mass vertex, see Figure 1,
with the vertex in $\xi$, by a line $\alpha\xi$ and a line $\xi\beta$;
the new graph value receives a contribution from
the mass vertex inserted in $\xi$ between $\alpha$ and $\beta$
simply given by a factor $-\mu_N$. Therefore if we fix,
for d = 3,


$$\mu_N = -6\lambda\, C^{(\le N)}_{\xi\xi} + \frac{4^2\, 3!}{2!}\, \lambda^2 \int C^{(\le N)\,3}_{\xi\eta}\, d\eta \overset{def}{=} -6\lambda\, C^{(\le N)}_{\xi\xi} + \delta\mu_N \qquad [14]$$


we can simply consider graphs which do not contain
any mass graph element and in which there are no
subgraphs like the first in Figure 2 while the subgraphs
like the second in Figure 2 do not contribute a factor
$\int C^{(\le N)}_{\alpha\xi}\, C^{(\le N)\,3}_{\xi\eta}\, C^{(\le N)}_{\eta\beta}\, d\eta$ but a renormalized factor
$\int C^{(\le N)}_{\alpha\xi}\, C^{(\le N)\,3}_{\xi\eta}\, \big(C^{(\le N)}_{\eta\beta} - C^{(\le N)}_{\xi\beta}\big)\, d\eta$. If d = 2, we only
need to define $\mu_N$ as the first term on the right-hand
side (RHS) of [14] and we can leave the subgraphs like
the second in Figure 2 as they are (without any
renormalization).

Figure 2 Divergent subgraphs, if d = 3. If d = 2 only the first
diverges.
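
The coefficient $-6\lambda C$ in the first term of [14] is the one familiar from Wick ordering of a quartic monomial. The sketch below checks this in a single-variable toy model, which is an assumption made here for illustration and not the field-theoretic computation of the text: $\varphi$ is one centered Gaussian variable of variance $C$, and the constant $3\lambda C^2$ is the Wick-ordering constant, used as a stand-in for the lowest-order vacuum subtraction. All function names are hypothetical.

```python
from fractions import Fraction


def gaussian_moment(power, variance):
    """E[phi^power] for a centered Gaussian: (power - 1)!! * variance^(power/2)."""
    if power % 2 == 1:
        return Fraction(0)
    result = Fraction(1)
    for odd in range(1, power, 2):
        result *= odd
    return result * Fraction(variance) ** (power // 2)


def expectation(poly, variance):
    """Expectation of a polynomial given as {power: coefficient}."""
    return sum(coef * gaussian_moment(p, variance) for p, coef in poly.items())


def times_power(poly, q):
    """Multiply a polynomial {power: coefficient} by phi^q."""
    return {p + q: coef for p, coef in poly.items()}


def subtracted_quartic(lam, variance):
    """lam*phi^4 + mu*phi^2 + nu with mu = -6*lam*C and nu = 3*lam*C^2 (toy choice)."""
    lam = Fraction(lam)
    c = Fraction(variance)
    return {4: lam, 2: -6 * lam * c, 0: 3 * lam * c * c}


if __name__ == "__main__":
    c = Fraction(7, 3)
    v = subtracted_quartic(1, c)
    print("E[V]         =", expectation(v, c))
    print("E[V phi^2]   =", expectation(times_power(v, 2), c))
    print("E[V phi^4]   =", expectation(times_power(v, 4), c), "vs 24 C^4 =", 24 * c ** 4)
```

The vanishing of the first two expectations is the toy-model counterpart of the statement that, once $\mu_N$ contains $-6\lambda C^{(\le N)}_{\xi\xi}$, graphs in which a line closes on a single vertex no longer need to be considered.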

Graphs without external lines are called vacuum 
graphs and there are a few such graphs which are 
divergent. Namely, if d=3, they are the first three 
drawn in Figure 3; furthermore, if $\mu_N$ is set to the
above nonzero value a new vacuum graph, the
fourth in Figure 3, can be formed. Such graphs 
contribute to the graph value, respectively, the terms 
in the sum 


E 
“BCE StS De dé, — L JM 


(<N)2 -(<N)2 
" / Cre CE 


Cie edb — p ous 
and diverge, respectively, as $\gamma^{2N}$, $\gamma^N$, N, $\gamma^{2N}$ if d = 3
while, if d = 2, only the first and the last (see [14])
diverge, like $N^2$.

Therefore, if we fix $\nu_N$ as minus the quantity in
[15] we can disregard graphs like those in Figure 3;
if d = 2, $\nu_N$ can be defined to be the sum of the first
and last terms in [15].

The formal series in $\lambda$ and f thus obtained is called
the “renormalized series” for the field $\varphi^4$ in
dimension d = 2 or, respectively, d = 3. Note that
with the given definitions and choices of $\mu_N$, $\nu_N$ the
only graphs G that need to be considered to
construct the expansion in $\lambda$ and f are formed by
the first and last graph elements in Figure 1, paying
attention that the graphs in Figure 3 do not
contribute and, if d = 3, the graphs with subgraphs
like the second in Figure 2 have to be computed with
the modification described.

In the next section, it will be shown that the 
above are the only sources of divergences as N — oo 
and therefore the problem of studying [1] is solved 
at the level of formal power series by the subtraction 
in [14]. This also shows that giving a meaning to the 
series thus obtained is likely to be much easier if 
d=2 than if d=3. 

The coefficients of order k of the expansion in $\lambda$
of $(1/|\Lambda|) \log Z_N(\Lambda, f)$ can be ordered by the number
2n of vertices representing external fields, and have
the form $\int S^{(k)}_{2n}(\xi_1, \ldots, \xi_{2n}) \prod_{i=1}^{2n} (f_{\xi_i}\, d\xi_i)$: the kernels
$S^{(k)}_{2n}$ are the Schwinger functions of order 2n, see the
section “Euclidean quantum fields.”


Figure 3 Divergent vacuum graphs.


Remark If d = 4, the regularization at cutoff N in
[2] is not sufficient, as in the subtraction procedure
smoothness of the first derivatives of the field
$\varphi^{(\le N)}$ is necessary, while the regularization [2] does
not even imply [6], that is, not even Hölder
continuity. A higher regularization (i.e., using a
$\chi_N$ like the square of the $\chi_N$ in [3]) is needed. Furthermore,
the subtractions discussed in the case d = 3 are not
sufficient to generate a formal power series and
many more subtractions are needed: for instance,
graphs with a subgraph like the one in Figure 4
would give a contribution to the graph value which
is a factor


$$\lambda^2 \int C^{(\le N)\,2}_{\xi\eta}\, d\eta$$


also divergent as $N \to \infty$ proportionally to N.
Although this divergence could be canceled by
changing $\lambda$ into $\lambda_N = \lambda + \lambda^2 \ell_N$, the previously
discussed cancelations would be affected and a change
in the value of $\mu_N$ would become necessary;
furthermore, the subtraction in [14] will not be
sufficient to make finite the graphs, not even to
second order in $\lambda$, unless a new term
-an J ( (0 MR dë with ay=(1/2)) f OnCz, 
(€ —)* is Sdded | in the exponential in [1]. 

But all this will not be enough and still new
divergences, proportional to $\lambda^3$, will appear.

And so on indefinitely, the consequence being that
it will be necessary to define $\lambda_N, \mu_N, \alpha_N, \nu_N$ as
formal power series in $\lambda$ (with coefficients diverging
as $N \to \infty$) in order to obtain a formal power series
in $\lambda$ for [1] in which all coefficients have a finite
limit as $N \to \infty$. Thus, the interpretation of the
formal renormalized series in the case d = 4 is
substantially different and naturally harder than
the cases d = 2, 3. Beyond formal perturbation
expansions, the case d = 4 is still an open problem:
the most widespread conjecture is that the series
cannot be given a meaning other than setting to 0 all
coefficients of $\lambda^j$, j > 0. In other words, the
conjecture claims, there should be no nontrivial solution
to the ultraviolet problem for scalar $\varphi^4$ fields in
d = 4. But this is far from being proved, even at a
heuristic level. The situation is simpler if $d \ge 5$: in
such cases, it is impossible to find formal power
series in $\lambda$ for $(1/|\Lambda|) \log Z_N(\Lambda, f)$, even allowing
$\lambda_N, \mu_N, \alpha_N, \nu_N$ to be formal power series in $\lambda$ with
divergent coefficients.

Figure 4 The simplest new divergent subgraph in d = 4.

The distinctions between the cases d = 2, 3, 4, > 4
explain the terminology given to the $\varphi^4$-scalar field
theories calling them super-renormalizable if
d = 2, 3, renormalizable if d = 4 and nonrenormalizable
if d > 4. Since the (divergent) coefficients in the
formal power series defining $\lambda_N, \mu_N, \alpha_N, \nu_N$ are
called counter-terms, the $\varphi^4$-scalar fields require
finitely many counter-terms (see [14]) in the super-
renormalizable cases and infinitely many in the
renormalizable case. The nonrenormalizable cases
(d > 4) cannot be treated in a way analogous to the
renormalizable ones.

For more details, the reader is referred
to Gallavotti (1985), Aizenman (1982), and
Fröhlich (1982).


Finiteness of the Renormalized Series, 
d=2,3: “Power Counting” 


Checking that the renormalized series is well defined
to all orders is a simple dimensional estimate
characteristic of many multiscale arguments that in
physics have become familiar under the name of
“renormalization group arguments.”

Consider a graph G with n + r vertices built over n
graph elements with vertices $\xi_1, \ldots, \xi_n$, each with four
half-lines, and r graph elements with vertices
$\xi_{n+1}, \ldots, \xi_{n+r}$, representing the external fields: as
remarked in the previous section, these are the only
graphs to be considered to form the renormalized series.

Develop each propagator into a sum of propagators
as in [7]. The graph G value will, as a
consequence, be represented as a sum of values of
new graphs obtained from G by adding scale labels
on its lines and the value of the graph will
be computed as a product of factors in which a
line joining $\xi$, $\eta$ and bearing a scale label h
will contribute with the h-th term of [7],
$\gamma^{(d-2)h}\, C^{(h)}_{\gamma^h \xi, \gamma^h \eta}$, replacing $C^{(\le N)}_{\xi,\eta}$. To avoid
proliferation of symbols, we shall call the
graphs obtained in this way, i.e., with the scale
labels attached to each line, still G: no confusion
should arise as we shall, henceforth, only consider
graphs G with each line carrying also a scale label.

The scale labels added on the lines of the graph G
allow us to organize the vertices of G into
“clusters”: a cluster of scale h consists of a maximal
set of vertices (of the graph elements in the graph)
connected by lines of scale $h' \ge h$ among which one
at least has scale h.

It is convenient to consider the vertices of the
graph elements as “trivial” clusters of highest scale:
conventionally call them clusters of scale N + 1.

The clusters can be of “first generation” if they
contain only trivial clusters, of “second generation”
if they contain only clusters which are trivial or of
the first generation, and so on.
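
The definition of clusters is algorithmic and can be sketched in code. The following minimal example assumes a graph given as a list of vertices and a list of lines `(u, v, h)` carrying a scale label `h`; the function name `clusters` and the sample graph are hypothetical. For each scale h it builds the connected components of the subgraph formed by the lines of scale at least h, and keeps those components containing at least one line of scale exactly h. Trivial clusters (single vertices, conventionally of scale N + 1) are not listed.

```python
def clusters(vertices, lines):
    """Return the nontrivial clusters as a list of (scale, frozenset_of_vertices).

    vertices: iterable of hashable vertex names
    lines:    list of (u, v, h) with h the scale label of the line joining u and v
    """
    vertices = list(vertices)
    found = []
    for h in sorted({scale for _, _, scale in lines}):
        # Union-find over the subgraph made of the lines of scale >= h.
        parent = {v: v for v in vertices}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for u, v, scale in lines:
            if scale >= h:
                parent[find(u)] = find(v)

        # A component is a cluster of scale h only if it contains a line of scale h.
        roots = {find(u) for u, _, scale in lines if scale == h}
        for root in roots:
            members = frozenset(v for v in vertices if find(v) == root)
            found.append((h, members))
    return found


if __name__ == "__main__":
    sample_lines = [(1, 2, 5), (3, 4, 5), (2, 3, 2)]
    for scale, members in clusters([1, 2, 3, 4], sample_lines):
        print("scale", scale, "cluster", sorted(members))
```

In the sample graph the two lines of scale 5 give two first-generation clusters, {1, 2} and {3, 4}, which are joined by the line of scale 2 into a single second-generation cluster containing all four vertices.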

Imagine to enclose in a box the vertices of graph 
elements inside a cluster of the first generation and 
then into a larger box the vertices of the clusters of 
the second generation and so on: the set of boxes 
ordered by inclusion can then be represented by a 
rooted tree graph whose nodes correspond to the 
clusters and whose “top points" are nodes represent- 
ing the trivial clusters (i.e., the vertices of the graph). 

If the maximum number of nodes that have to be 
crossed to reach a top point of the tree starting from 
a node $v$ is $n_v$ ($v$ included and the top nodes 
included), then the node $v$ represents a cluster of the 
$n_v$th generation. The first node before the root is a 
cluster containing all vertices of G; the root of 
the tree will not be considered a node and it can 
conventionally bear the scale label 0: it represents 
symbolically the value of the graph. 

For instance, in Figure 5 a tree $\theta$ is drawn: its 
nodes correspond to clusters whose scale is indicated 
next to them; in the second part of the drawing, the 
trivial clusters as well as the clusters of the first 
generation are enclosed into boxes. 

Then consider the next generation clusters, that is, 
the clusters which only contain clusters of the first 
generation or trivial ones, and draw boxes enclosing 
all the graph vertices that can be reached from each 
of them by descending the tree, etc. Figure 6 
represents all boxes (of any generation) corresponding 
to the nodes of the tree in Figure 5. The 
representations of the clusters of a graph G by a tree 
or by hierarchically ordered boxes (see Figures 5 and 
6) are completely equivalent provided that inside each 
box not representing a top point of the tree the scale 
$h_v$ of the corresponding cluster $v$ is marked. For 
instance, in the case of Figure 6 one gets Figure 7. 
By construction, if two top points $\xi$ and $\eta$ are inside 
the same box $b_v$ of scale $h_v$, but not in inner boxes, 
then there is a path of graph lines joining $\xi$ and $\eta$, 
all of which have scales $\ge h_v$ and at least one of 
which has scale $h_v$. 

Figure 5 A tree and its clusters of generation 1 and 2. 

Figure 6 All clusters of any generation for the tree in Figure 5. 

Figure 7 The clusters in Figure 6 after affixing the scale labels. 

Given a graph G, fix one of its points, $\xi_1$ (say), and 
integrate the absolute value of the graph over the 
positions of the remaining points. The exponential 
decay of the propagators implies that if a point $\xi$ is 
linked to a point $\eta$ by a line of scale $h$, the 
integration over the position of $\eta$ is essentially 
constrained to extend only over a distance $\gamma^{-h} m^{-1}$. 
Furthermore, the maximum size of the propagator 
associated with a line of scale $h$ is bounded 
proportionally to $\gamma^{(d-2)h}$. Therefore, recalling that 
$|f|$ is supposed bounded by 1, the mentioned integral 
can be immediately bounded by 


A" T AnC"T or » F 
{ots = I[^? 2)/2b, II^ db, (s, —1) [16] 
where, C being a suitable constant, the first product 
is over the half-lines $\ell$ composing the graph lines and 
the second is over the tree nodes (i.e., over the 
clusters of the graph G); $s_v$ is the number of 
subclusters contained in the cluster $v$ but not in 
inner clusters; and in [16] the scale of a half-line $\ell$ is 
$h_l$ if $\ell$ is paired with another half-line to form a line 
$l$ (in the graph G) of scale label $h_l$. 

Denoting by $v'$ the cluster immediately containing 
$v$ in G, by $n^{\rm inner}_v$ the number of half-lines in the 
cluster $v$, by $n_v$, $r_v$ the numbers of graph elements of 
the first type or of the fourth type in Figure 1 with 
vertices in the cluster $v$, and denoting by $n^e_v$ the 
number of lines which are not in the cluster $v$ but 
have one extreme on a vertex in $v$ (“lines external to 
$v$”), the identities ($k = 0$) 


>, (bv = k)s, — 1) 


vroot 


= » (b, — 


v root 


Y, (kh) = Y^ (p, bye [07] 


v>root v>root 


with 


hy)(my Tf 1) 


i def 
inner dc! e 
Hn, —4n,Ttf,—1, 


hold, so that the estimate [16] can be elaborated into 


I« ~py(by—hy) 

d [18] 

d+2 d-2, 

i 5 à * 
where $h_{v'} = k = 0$ if $v$ is the first nontrivial node (i.e., 
$v' =$ root), and an estimate of the integral of the 
absolute value of the graphs G with given tree 
structure but different scale labels is proportional to 
the sum of the bound [18] over the scale labels, which 
is finite if (and only if) $\rho_v > 0$ for all $v$. 
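
The role of the sign of the exponent in the sum over scale labels can be illustrated numerically. The sketch below assumes that each cluster contributes a factor $\gamma^{-\rho n}$, where $n \ge 1$ stands for the gap between the scale of the cluster and the scale of the cluster immediately containing it; the function name and the choice $\gamma = 2$ are illustrative only.

```python
def scale_sum(rho, gamma=2.0, max_gap=50):
    """Partial sum over the scale gap n = 1..max_gap of gamma**(-rho * n).

    Assumption: one cluster contributes gamma**(-rho * n) to the bound,
    n being the difference between its scale and that of the cluster
    immediately containing it.
    """
    return sum(gamma ** (-rho * n) for n in range(1, max_gap + 1))


# rho > 0: the partial sums stay bounded as the cutoff max_gap grows.
bounded = scale_sum(1.0, gamma=2.0, max_gap=200)

# rho = 0: every term equals 1, so the sum grows linearly with the cutoff.
growing = scale_sum(0.0, gamma=2.0, max_gap=50)
```

With a positive exponent the geometric series converges uniformly in the cutoff, while a vanishing exponent produces a sum that grows with the number of available scales, which is why clusters with vanishing exponent need a separate treatment.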

But there may be clusters $v$ with only two 
external lines, $n^e_v = 2$, and two graph vertices inside, 
for which $\rho_v = 0$. However, this can happen only if 
$d = 3$ and in only one case: namely if the graph G 
contains a subgraph of the second type in Figure 2 
and the three intermediate lines form a cluster $v$ of 
scale $h_v$, while the other two lines are external to it, 
hence on scale $h' < h_v$. In this case, one has to 
remember that the subtraction in the previous section 
has led to a modification of the contribution of such a 
subgraph to the value of the graph (integrated over 
the position labels of the vertices). As discussed in the 
previous section, the change amounts to replacing the 
propagator ifi by rp 一 c à 

This improves, in [18], the estimate of the contribu- 
tion of the line joining 7 to B from being proportional 
to f Gace ‘dn to being proportional to 
ice" or 一 oe dy; and this changes the con- 
tribution of the line mB from 44-2" to fe len 
(lé — gl) ^ dg because C? is regular on scale 
47" m^, see [10] with e = 1/2. 

Since $\xi$, $\eta$ are in a cluster of higher scale $h_v$, this 
means that the estimate is improved by $\gamma^{-\frac12(h_v - h_{v'})}$. 
In terms of the final estimate, this means that $\rho_v$ in 
[18] can be improved to $\bar\rho_v = \rho_v + 1/2$ for the 
clusters for which $\rho_v = 0$. Hence, the integrated 
value of the graph G (after taking also into account 
the integration over the initially selected vertex $\xi_1$, 
trivially giving a further factor $|\Lambda|$ by translation 
invariance), and summed over the possible scale 
labels, is bounded proportionally to $|\Lambda|$ times a 
finite sum over the scale labels, once the estimate 
[18] is improved as described. 

Note that the graphs contributing to the perturbation 
series for $(1/|\Lambda|)\log Z_N(\Lambda,f)$ to order $\lambda^n$ are finitely 
many because the number $r$ of external vertices is $r \le 
2n + 2$ (since graphs must be connected). Hence, the 
perturbation series is finite to all orders in $\lambda$. 

The above is the renormalizability proof of the 
scalar $\varphi^4$-fields in dimension $d = 2, 3$. The theory is 
renormalizable even if $d = 4$ as mentioned in the 
remark at the end of the previous section. The 
analysis would be very similar to the above: it is just 
a slightly more involved power-counting argument. 

The exponent appearing in [18] is 

$$\rho_v \overset{def}{=} -d + (4-d)\,n_v + \frac{d+2}{2}\,r_v + \frac{d-2}{2}\,n^e_v .$$ 

For more details, the reader is referred to Hepp 
(1966), Gallavotti (1985), sections 8 and 16. 


Asymptotic Freedom (d = 2, 3). 
Heuristic Analysis 


Finiteness to all orders of the perturbation expansions 
is by no means sufficient to prove the existence 
of the ultraviolet limit for $Z_N(\Lambda,f)$ or for $(1/|\Lambda|) 
\log Z_N(\Lambda,f)$: and a priori it might not even be 
necessary. For this purpose, the first step is to check 
uniform (upper and lower) boundedness of $Z_N(\Lambda,f)$ 
as $N \to \infty$. 

The reason behind the validity of a bound 
$e^{|\Lambda| E_-(\lambda,f)} \le Z_N(\Lambda,f) \le e^{|\Lambda| E_+(\lambda,f)}$ with $E_\pm(\lambda,f)$ cutoff 
independent has been made very clear after the 
introduction of the renormalization group methods 
in field theory. The approach studies the integral 
$Z_N(\Lambda,f)$ recursively, decomposing the field $\varphi^{(\le N)}$ 
into its regular components $z^{(h)}$, see [7], and 
integrating first over $z^{(N)}$, then over $z^{(N-1)}$, and so on. 

The idea emerges naturally if the potential $V_N$ in 
[1] and [4] is written in terms of the “normalized” 
variables $X^{(N)}_\xi \overset{def}{=} \gamma^{-N(d-2)/2}\,\varphi^{(\le N)}_\xi$, see [6]; here, if $d = 2$, 
the factor $\gamma^{N(d-2)/2}$ is interpreted as $N^{1/2}$. 

The key remark is that, as far as the integration 
over the small-scale component $z^{(N)}$ is concerned, the 
field $X^{(N)}$ is a sum of two fields of size of order 1 
(statistically), 

$$X^{(N)}_\xi = \gamma^{-(d-2)/2}\,X^{(N-1)}_\xi + z^{(N)}_\xi ;$$ 

if $d = 2$ this becomes 

$$X^{(N)}_\xi = \frac{(N-1)^{1/2}}{N^{1/2}}\,X^{(N-1)}_\xi + \frac{1}{N^{1/2}}\,z^{(N)}_\xi ,$$ 

and it can be considered to be smooth on scale $\gamma^{-N} m^{-1}$ 
(also statistically). Hence, it is approximately constant 
and of size of order $O(1)$ on the small cubes $\Delta$ of 
volume $\gamma^{-dN} m^{-d}$ of the pavement $Q_N$ introduced 
before [7]; at the same time it can be considered to 
take (statistically) independent values on different cubes 
of $Q_N$. This is suggested by the inequalities [8]-[10]. 
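Therefore, it is natural to decompose the potential 
$V_N$, see [5], as a sum over the small cubes $\Delta$ of volume 
$\gamma^{-dN} m^{-d}$ of the pavement $Q_N$ as (see [14] for the 
definition of $\mu_N$, $\nu_N$), taking henceforth $m = 1$, 

$$V_N(X^{(N)}) \overset{def}{=} -\sum_{\Delta\in Q_N}\int_\Delta \Big(\lambda\,\gamma^{2(d-2)N}\,X^{(N)4}_\xi + \mu_N\,\gamma^{(d-2)N}\,X^{(N)2}_\xi + \nu_N + f_\xi\,\gamma^{\frac{d-2}{2}N}\,X^{(N)}_\xi\Big)\,d\xi \qquad [19]$$ 

where $\gamma^{(d-2)N}$ is interpreted as $N$ if $d = 2$. Hence, if 
$d = 3$ it is 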
Vu (z) 
= -y eri (ax + gy xg? 
Ace On A 
3 d 

UN + fey 位 x) 号 [20] 

where 


Tiy = (-6Acn + NNy Nd), 
DN E 3Ad, + NY Noy + ANY Nb 


and cn, cy, bn, bn, computable from [15] and [14], 
admit a limit as N — oc. While if d — 2 it is 


Vu (2%?) 


号 一 NWON J (axt + gy X2"? 
AEQN 3 
; 4x") = 
十 ZN + feN X: Al [21] 
where $\bar\mu_N = -6\lambda c_N$ and $\bar\nu_N = 3\lambda c_N^2$; and $c_N$, computable 
from [13], admits a limit as $N \to \infty$. 

The fields $z^{(N)}$ and $X^{(N-1)}$ can be considered 
constant over boxes $\Delta \in Q_N$: $z^{(N)}_\xi = s_\Delta$, $X^{(N-1)}_\xi = x_\Delta$ 
for $\xi \in \Delta$, and the $s_\Delta$ can be considered statistically 
independent on the scale of the lattice $Q_N$. 

Therefore, [20] and [21] show that integration over 
$z^{(N)}$ in the integral defining $Z_N(\Lambda,f)$ is not too 
different from the computation of a partition function 
of a lattice continuous spin model in which the 
“spins” are $s_\Delta$ and, most important, interact extremely 
weakly if $N$ is large. In fact, the coupling 
constants are of order of a power of $|X^{(N-1)}|$ times 
$O(\gamma^{-N})$ if $d = 3$ ($O(N^2\gamma^{-2N})$ if $d = 2$), or of order 
$O(\gamma^{-N(d+2)/2}\max|f_\xi|)$, no matter how large $\lambda$ and $f$ are. 
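
The orders of magnitude just quoted can be tabulated with a small helper. This is only an illustration of the scaling, with all constants set to 1 and with the arbitrary choice $\gamma = 2$; the function name is hypothetical.

```python
def quartic_coupling(lam, N, d, gamma=2.0):
    """Order of magnitude of the quartic coupling on scale N.

    Assumption: constants are dropped, so the value is lam * gamma**(-N)
    for d = 3 and lam * N**2 * gamma**(-2 * N) for d = 2, as quoted in
    the text above.
    """
    if d == 3:
        return lam * gamma ** (-N)
    if d == 2:
        return lam * N**2 * gamma ** (-2 * N)
    raise ValueError("only d = 2 and d = 3 are discussed here")


# Even a large bare coupling becomes tiny on small length scales.
for N in (10, 20, 40):
    print(N, quartic_coupling(1.0e6, N, d=3), quartic_coupling(1.0e6, N, d=2))
```

The loop shows that, for any fixed $\lambda$, the effective coupling seen by the smallest-scale field decays exponentially in $N$.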

This says that the smallest scale fields are 
extremely weakly coupled. The fields $X^{(N-1)}$ can be 
regarded as external fields of size that will be called 
$B_{N-1}$, of order 1 or even allowed to grow with a 
power of $N$, see [6]. Their presence in $V_N$ does not 
affect the size of the couplings, as far as the analysis 
of the integral over $z^{(N)}$ is concerned, because the 
couplings remain exponentially small in $N$, see [20] 
and [21], being at worst multiplied by a power of 
$B_{N-1}$, i.e., changed by a factor which is a power of $N$. 

The smallness of the coupling at small scale is a 
property called “asymptotic freedom.” Once fields 
and coordinates are “correctly scaled,” the real size 
of the coupling becomes manifest, that is, it is 
extremely small and the addends in $V_N$ proportional 
to the “counter-terms” $\mu_N$, $\nu_N$, which looked 
divergent when the fields were not properly scaled, 
are in fact of the same order or much smaller than 
the main $\varphi^4$-term. 

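Therefore, the integration over $z^{(N)}$ can be, 
heuristically, performed by techniques well established 
in statistical mechanics (i.e., by straightforward 
perturbation expansions): at least if the field 
$X^{(N-1)}$ is smooth and bounded, as prescribed 
by [6], with $B = B_{N-1}$ growing as a power of $N$. 
In this case, denoting symbolically the integration 
over $z^{(N)}$ by $P$ or by $\langle\cdot\rangle$, it can be expected that it 
should give 

$$\int e^{V_N}\,dP(z^{(N)}) = e^{V_{j;N-1} + R(j,N)|\Lambda|} \qquad [22]$$ 

where $V_{j;N-1}$ is the Taylor expansion of 
$\log\int e^{V_N}\,dP(z^{(N)})$ in powers of $\lambda$ (hence essentially 
in the very small parameter $\lambda\gamma^{-(4-d)N}$) truncated at 
order $j$, that is, 

$$V_{1;N-1} = \big[\langle V_N\rangle\big]^{\le 1},$$ 
$$V_{2;N-1} = \Big[\langle V_N\rangle + \frac{\langle V_N^2\rangle - \langle V_N\rangle^2}{2!}\Big]^{\le 2},$$ 
$$V_{3;N-1} = \Big[\langle V_N\rangle + \frac{\langle V_N^2\rangle - \langle V_N\rangle^2}{2!} + \frac{\langle (V_N - \langle V_N\rangle)^3\rangle}{3!}\Big]^{\le 3}, \qquad [23]$$ 

where $[\cdot]^{\le j}$ denotes truncation to order $j$ in $\lambda$, 
and $R(j,N)$ is a remainder (depending on $X^{(N-1)}$) 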

which can be expected to be estimated, for d = 2, 3, by 
IRG, N)| € R(j, N) 
E C BË (A N? y ANY AN [24] 


for suitable constants $C_j$, that is, a remainder 
estimated by the $(j+1)$th power of the coupling 
times the number of boxes of scale $N$ in $\Lambda$. The 
relations [22]-[24] result from a naive Taylor 
expansion (in $\lambda$) of $\log\int e^{V_N}\,dP(z^{(N)})$, taking into 
account that, in $V_N$ as a function of $z^{(N)}$, the $z^{(N)}$'s 
appear multiplied by quantities at most of size 
$\lambda\gamma^{-(4-d)N}$ times a power of $B_{N-1}$, by [20] and [21], 
if $|X^{(N-1)}| \le B_{N-1}$. 
In a statistical mechanics model for a lattice spin 
system, such a calculation of $Z_N$ would lead to a 
mean-field equation of state once the remainder was 
neglected. 

The peculiarity of field theory is that a relation like 
[22] and [24] has to be applied again to $V_{j;N-1}$ to 
perform the integration over $z^{(N-1)}$ and define $V_{j;N-2}$ 
and, then, again to $V_{j;N-2}$, .... Therefore, it will be 
essential to perform the integral in [22] to an order 
(in $\lambda$) high enough so that the bound $\bar R(j,N)$ can be 
summed over $N$: this requires (see [24]) an explicit 
calculation of [23] pushed at least to order $j = 1$ if 
$d = 2$, or to order $j = 3$ if $d = 3$; furthermore, it is also 
necessary to check that the resulting $V_{j;N-1}$ can still 
be interpreted as a low-coupling spin model so that 
[22] can be iterated with $N - 1$ replacing $N$ and then 
with $N - 2$ replacing $N - 1$, .... 
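
The minimal orders quoted above can be recovered by a simple exponent count. The sketch assumes, following the verbal description of [24], that the remainder bound behaves like the $(j+1)$th power of a coupling of size $\gamma^{-(4-d)N}$ times a number of boxes of order $\gamma^{dN}$, with powers of $N$ and $B$ ignored; the function names are hypothetical.

```python
def remainder_exponent(j, d):
    """Exponent e such that the remainder bound scales like gamma**(e * N).

    Assumption: bound ~ (gamma**(-(4 - d) * N))**(j + 1) * gamma**(d * N),
    i.e. (coupling)**(j + 1) times the number of boxes of scale N;
    powers of N and B are ignored.
    """
    return d - (4 - d) * (j + 1)


def minimal_order(d, j_max=20):
    """Smallest j for which the remainder bound decays exponentially in N."""
    for j in range(j_max + 1):
        if remainder_exponent(j, d) < 0:
            return j
    raise ValueError("no order up to j_max makes the bound summable")


print(minimal_order(2), minimal_order(3))
```

The bound is summable over $N$ exactly when $(4-d)(j+1) > d$, which reproduces the thresholds stated in the text for $d = 2$ and $d = 3$; in this crude counting no finite order works for $d = 4$.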

The first necessary check towards a proof of the 
discussed heuristic “expectations” is that, defining 
recursively $V_{j;h}$ from $V_{j;h+1}$ for $h = N-1, \ldots, 1, 0$ 
by [23] with $V_N$ replaced by $V_{j;h+1}$ and $V_{j;N-1}$ 
replaced by $V_{j;h}$, the couplings between the variables 
$z^{(h)}$ do not become “worse” than those discussed in 
the case $h = N$. Furthermore, the fields have a 
high probability of satisfying [6], but fluctuations 
are possible: hence the $R$-estimate has to be 
combined with another one dealing with the large 
fluctuations of the fields, which has to be shown to be 
“not worse.” 

For more details, the reader is referred to Gallavotti 
(1978, 1985) and Benfatto and Gallavotti (1995). 


Effective Potentials and Their 
Scale (In)Dependence 


To analyze the first problem mentioned at the end of 
the previous section, define $V_{j;h}$ by [23] with $V_N$ 
replaced by $V_{j;h+1}$ for $h = N-1, N-2, \ldots, 0$. The 
quantities $V_{j;h}$, which are called “effective potentials” 
on scale $h$ (and order $j$), turn out to be in a 
natural sense scale independent: this is a consequence 
of renormalizability, realized by Wilson as a 
much more general property, which can be checked, 
in the very special cases considered here with 
$d = 2, 3$, at fixed $j$ by induction; in the 
super-renormalizable models considered here it requires 
only an elementary computation of a few Gaussian 
integrals, as the case $j = 3$ (or even $j = 1$ if $d = 2$) is 
already sufficient for our purposes. 

It can also be (more easily) proved for general $j$ by 
a dimensional argument parallel to the one presented 
earlier to check finiteness of the renormalized 
series. The derivation is elementary but it should be 
stressed that, again, it is possible only because of the 
special choice of the counter-terms $\mu_N$, $\nu_N$. If $d = 3$, 
the boundedness and smoothness of the fields $X^{(h)}$ 
and $z^{(h)}$ expressed by the second of [6] and of [10] is 
essential; while if $d = 2$ the smoothness is not 
necessary. 

The structure of $V_{j;h}$ is conveniently expressed 
in terms of the fields $X^{(h)}$ as a sum of three terms: 
$V^{(rel)}_{j;h}$ (standing for “relevant” part), $V^{(irr)}_{j;h}$ (standing 
for “irrelevant” part), and a “field independent” 
part $E(j,h)|\Lambda|$. 

The relevant part in $d = 2$ is simply of the form 
[21] with $h$ replacing $N$: call it $V^{(rel,1)}_h$; if $d = 3$, it is 
given by [20] with $h$ replacing $N$ plus, for $h < N$, a 
second “nonlocal” term 


2 31 
(rel.2) def 4^ 3! 5. [ {Ah)3 ASN)3 


2 
x (vi? — fF) dnar 


which is conveniently expressed in terms of a 
“nonlocal” field 


(<h) (<h) 

(h) def Pn, ^ Pr 
m H 
(in — m» 


4s ye d 2 ra ye wi 


.2)def $9 Sh (b)2 ,(b) 
Vis Ma | yi? A 
b RS Axe "E um 


gin- dndm 
xe c^ n= | 25 
[A |A" e) 


where 


AU 
des Cong am) d 
(y n — n'l) N 


with $a, a', c' > 0$ and the subscript $N$ means that the 
expression in parenthesis “saturates at scale $N$”, i.e., 
its denominator becomes $\gamma^{-\frac12(h-N)}$ as $|\eta - \eta'| \to 0$. 

The expression [25] is not the full part of the 
potential $V_{j;h}$ which is of second order in the fields: 
there are several other contributions which are 
collected below as “irrelevant.” 

It should be stressed that “irrelevant” is a 
traditional technical term: by no means should it 
suggest “negligibility.” On the contrary, it could be 
maintained that the whole purpose of the theory is 
to study the irrelevant terms. The irrelevant part of 
the potential can be better designated as the “driven 
part,” as its behavior is “controlled” by the relevant 
part: although initially $V_{j;h}$, $h = N$, contains 
no irrelevant terms, it eventually contains them for 
$h < N$ and they keep getting generated as $h$ 
diminishes. Furthermore, the part of the irrelevant 
terms generated at scale $h_0 < N$ becomes very small 
at scales $h < h_0$ so that the irrelevant part of $V_{j;h}$ at 
small $h$ (e.g., at $h = 0$, i.e., on the “physical scale” of 
the observer) only depends on the relevant terms in a 
few scales near $h$. 

It also turns out that the Schwinger functions are 
simply related to the irrelevant terms. 

The irrelevant part of the effective potential can 
be expressed as a finite sum of integrals of 
monomials in the fields $X^{(h)}$ if $d = 2$, or in the fields 
$X^{(h)}$ and $Y^{(h)}$ if $d = 3$, which can be written as 


x W(E 
[AL AZ 


p 
"TA ohai 


with the integral extended to products A, x <- x 
Ap X X (AL x A?) of boxes AEQ, and 
die... al. ) 46 the length of the shortest tree 
graph that connects all the p+2q>0 points, the 
exponents n,t are >2, and ż is >3 if 4»0; 
the kernel W depends on all ae ie Sys alf 
and it is bounded above by C; |], _ , Anum, for some 

Cj; the sums »7n, + » my decim, exceed 4j. The 
test functions f do not appear in [26] because by 
assumption they are bounded by 1: but W depends 
on the f's as well. 

The field-independent part is simply the value 
of $\log Z_N(\Lambda,f)$ computed by the perturbation 
analysis in the section “Perturbation theory” up to 
order $j$ in $\lambda$ but using as propagator $(C^{(\le N)} - C^{(\le h)})$; 
thus, $E(j,h)$ is a constant depending on $N$ but 
uniformly bounded as $N \to \infty$ (because of the 
renormalizability proved in the section “Perturbation 
theory”). 

If $d = 2$, there is no need to introduce the nonlocal 
fields $Y$ and in [26] one can simply take $q = 0$, 
and the relevant part also can be expressed by 
omitting the term $V^{(rel,2)}_h$ in [25]: unlike the $d = 3$ 
case, the estimate on the kernels $W$ by an 
$N$-independent $C_j$ holds uniformly in $h$ without 
having to introduce $Y$. For $d = 2$, it will therefore be 
supposed that $V^{(rel,2)}_h \equiv 0$ in [25] and $q = 0$ in [26]. 

It is not necessary to have more information on 
the structure of $V_{j;h}$, even though one can find simple 
graphical rules, closely related to the ones in the 
section “Perturbation theory,” to construct the 
coefficients $W$ in full detail. The $W$ depend, of 
course, on $h$ but the uniformity of the bound on $W$ 
is the only relevant property and in this sense the 
effective potentials are said to be (almost) “scale 
independent.” 

The above bounds on the irrelevant part can 
be checked by an elementary direct computation if 
$j \le 3$: in spite of its “elementary character,” the 
uniformity in $h \le N$ is a result ultimately playing an 
essential role in the theory together with the 
dominance of the relevant part over the irrelevant 
one which, once the fields are properly scaled, is 
“much smaller” (by a factor of order $\gamma^{-h}$, see [26]), 
at least if $h$ is large. 


Remarks 


(i) Checking scale independence for $j = 1$ is just 
checking that $\int P(dz^{(h)})\,V_{1;h} = V_{1;h-1}$. Note that 
the quartic term of $V_N$ appears, together with the 
counter-terms, in the combination 

$$-\lambda\int_\Lambda \big(\varphi_\xi^4 - 6\,C_{\xi\xi}\,\varphi_\xi^2 + 3\,C_{\xi\xi}^2\big)\,d\xi , \qquad \varphi = \varphi^{(\le N)},\ C = C^{(\le N)};$$ 

hence, calling $:\!\varphi_\xi^4\!:$ the polynomial in the integral 
(Wick's monomial of order 4), we have here an 
elementary Gaussian integral (“martingale property 
of Wick monomials”). Note the essential role of the 
counter-terms. For $j > 1$, the computation is similar 
but it involves higher-order polynomials (up to $4j$) 
and the distinction between $d = 2$ and $d = 3$ 
becomes important. 

(ii) $V_{j;0}$ contains only the field-independent part 
$E(j,0)|\Lambda|$ (see above) which is just a number (as 
there are no fields of scale 0): by the above 
definitions, it is identical to the perturbative 
expansion truncated to $j$th order in $\lambda$ of 
$\log Z_N(\Lambda,f)$, well defined as discussed earlier. 
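
The “martingale property” invoked in remark (i) can be checked exactly in a one-variable toy version. The sketch assumes a single real variable $x$ in place of the field at a point, a Gaussian fluctuation $z$ of variance `c_top` in place of the highest-scale component, and the Wick monomial $x^4 - 6cx^2 + 3c^2$ taken with respect to the total covariance; all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def gaussian_moment(m, c):
    """E[z**m] for a centered Gaussian z of variance c (exact arithmetic)."""
    if m % 2:
        return Fraction(0)
    double_factorial = 1  # (m - 1)!!, equal to 1 for m = 0
    for k in range(m - 1, 0, -2):
        double_factorial *= k
    return double_factorial * Fraction(c) ** (m // 2)


def wick4(x, c):
    """Wick monomial of order 4 with respect to the covariance c."""
    return x**4 - 6 * c * x**2 + 3 * c**2


def averaged_wick4(x, c_low, c_top):
    """Average of wick4(x + z, c_low + c_top) over z of variance c_top."""
    c = c_low + c_top

    def mean_power(k):
        # E[(x + z)**k] by binomial expansion and Gaussian moments.
        return sum(comb(k, m) * x ** (k - m) * gaussian_moment(m, c_top)
                   for m in range(k + 1))

    return mean_power(4) - 6 * c * mean_power(2) + 3 * c**2


x, c_low, c_top = Fraction(7, 3), Fraction(2, 5), Fraction(1, 4)
print(averaged_wick4(x, c_low, c_top) == wick4(x, c_low))
```

Integrating out the fluctuation turns the Wick monomial for the total covariance into the Wick monomial for the remaining covariance, which is the single-variable counterpart of the identity checked in remark (i). Without the two subtracted terms the average would produce extra contributions proportional to `c_top`.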


Nonperturbative Renormalization: 
Small Fields 


Having introduced the notion of effective potential 
$V_{j;h}$ of order $j$ and scale $h$, satisfying the bounds 
(described after [26]) on the kernels $W$ representing 
it, the problem is to estimate the remainder in [22] 
and find its relation with the value [24] given by the 
heuristic Taylor expansion. Assume $\lambda < 1$ to avoid 
distinguishing this case from that with $\lambda > 1$, which 
would lead to very similar estimates but to a different 
$\lambda$-dependence of some constants. 

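Define $\chi_B(z^{(h)}) = 1$ if $\|z^{(h)}\|_\Delta \le Bh^2$ for all $\Delta \in Q_h$, 
see [8], and 0 otherwise; then the following lemma 
holds: 

Lemma 1 Let $\|X\|_\Delta$ be defined as [8] with $z$ 
replaced by $X$ and suppose $\|X\|_\Delta \le Bh^4$ for all $\Delta$; 
then, for all $j \ge 1$, it is 

$$\int e^{V_{j;h+1}}\,\chi_B(z^{(h+1)})\,P(dz^{(h+1)}) = e^{V_{j;h} + R_-(j,h+1)|\Lambda|} \qquad [27]$$ 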
with, for suitable constants c_,c_, 
IR. (5b +1)| < R- (5 b +1) 
de tb 4 1)-Ec- e C Pony 


and $\bar R(j,h+1)$ given by [24] with $h+1$ in place 
of $N$. 

Since $Z_N(\Lambda,f) \ge \int e^{V_N}\prod_h \chi_B(z^{(h)})\,P(dz^{(h)})$, this 
immediately gives a lower bound on $E = (1/|\Lambda|) 
\log Z_N(\Lambda,f)$: in fact, if $\chi_B(z^{(h')}) = 1$ for 
$h' = 1, \ldots, h$, then $\|X\|_\Delta \le cBh^4$ for some $c$ so 
that, by recursive application of Lemma 1, 
$Z_N(\Lambda,f) \ge e^{V_{j;0} - \sum_{h=1}^{N} \bar R_-(j,h)|\Lambda|}$. By the remark at the 
end of the previous section, given $j$ the lower bound 
on $E$ just described agrees with the perturbation 
expansion of $E = (1/|\Lambda|)\log Z_N(\Lambda,f)$ truncated to 
order $j$ (in $\lambda$) up to an error bounded by 
$\sum_{h=1}^{N} \bar R_-(j,h)$. 


Remark The problem solved by Lemma 1 is 
usually referred to as the small-field problem, to 
contrast it with the large-field problem discussed 
later. The proof of the lemma is a simple Taylor 
expansion in $\lambda\gamma^{-h}$ if $d = 3$ or in $\lambda h^2\gamma^{-2h}$ if $d = 2$ to 
order $j$ (in $\lambda$). The constraint on $z^{(h+1)}$ makes the 
integrations over $z^{(h+1)}$, necessary to compute $V_{j;h}$ 
from $V_{j;h+1}$, not Gaussian. But the tail estimates [9], 
together with the Markov property of the distribution 
of $z^{(h+1)}$, can be used to estimate the difference 
with respect to the Gaussian unconstrained integrations 
over $z^{(h+1)}$; and the result is the addition of the 
small “tail error” changing $\bar R$ into $\bar R_-$ in [27]. The 
estimate of the main part of the remainder $\bar R$ would 
be obvious if the fields $z$ were independent on 
boxes of scale $\gamma^{-h}$: they are not independent but 
they are Markovian and the estimate can be done by 
taking into account the Markov property. 


For more details, the reader is referred to Wilson 
(1970, 1972), Gallavotti (1978, 1981, 1985), and 
Benfatto et al. (1978). 


Nonperturbative Renormalization: Large 
Fields, Ultraviolet Stability 


The small-field estimates are not sufficient to obtain 
ultraviolet stability: to control the cases in which 
$|X^{(h)}_\xi| > Bh^4$ for some $\xi$ or some $h$, or $|Y^{(h)}_{\xi\eta}| > Bh^4$ for 
some $|\xi - \eta| < \gamma^{-h}$, a further idea is necessary and it 
rests on making use of the assumption that $\lambda > 0$ 
which, in a sense to be determined, should suppress 
the contribution to the integral defining $Z_N(\Lambda,f)$ 
coming from very large values of the field. Assume 
also $\lambda < 1$ for the same reasons advanced in the 
section “Effective potentials and their scale 
(in)dependence.” 

Consider first $d = 2$. Let $D_N$ be the “large-field 
region” where $|X^{(N)}_\xi| > BN^4$ and let $V_N(\Lambda/D_N)$ be 
the integral defining the potential in [21] extended 
to the region $\Lambda/D_N$, complement of $D_N$. This region 
is typically very irregular (and random as $X$ itself is 
random with distribution $P_N$). 

An upper bound on the integral defining $Z_N(\Lambda,f)$ 
is obtained by simply replacing $e^{V_N}$ by $e^{V_N(\Lambda/D_N)}$ 
because in $D_N$ the first term in the integrand in [21] 
is $\le -\lambda N^2\gamma^{-2N}(BN^4)^4 < 0$ and it overwhelmingly 
dominates the remaining terms, whose value is 
bounded by a similar expression with a smaller 
power of $N$. Then, if $\mathcal{E}^c \overset{def}{=} \Lambda/\mathcal{E}$ denotes the 
complement in $\Lambda$ of a set $\mathcal{E} \subset \Lambda$: 


Lemma 2 Let d —2. Define V,(D;) to be given by 
the expression [22] with tbe integrals extending over 
A;/Dy, and define R(j, b + 1) by [24]. Then 


je vil a) dP(z (b+1) ) EU 


where |Ry(j,h+1|< Ry (ib +1 de qus b +1) + 
c, e € PH with suitable c}, e 


Di)+Ri(jb+1)|A| [28] 


T^ 


Remark Lemma 2 is genuinely nonperturbative 
and makes essential use of the positivity of $\lambda$. 
Below, the analysis of the proof of the lemma, which 
consists essentially in its reduction to Lemma 1, is 
described in detail. It is perhaps the most interesting 
part and the core of the theory: it leads to the proof that 
truncating the expansion in $\lambda$ of $(1/|\Lambda|)\log Z_N(\Lambda,f)$ 
to order $j$ gives as a result an estimate exact to order 
$\lambda^{j+1}$ of $(1/|\Lambda|)\log Z_N(\Lambda,f)$. 


Let $R_N$ be the cubes $\Delta \in Q_N$ in which there is at 
least one point $\xi$ where $|z^{(N)}_\xi| > BN^2$. By definition, 
the region $D_N/D_{N-1}$ is covered by $R_N$. 

Remark that in the region $D_{N-1}/R_N$ the field 
$X^{(N-1)}$ is large but $z^{(N)}$ is not large, so that $X^{(N)}$ is still 
very large: this is so because the bounds set to define 
the regions $D$ and $R$ are quite different, being $BN^4$ 
and $BN^2$, respectively. Hence, if a point is in $D_{N-1}$ 
and not in $R_N$, then the field $X^{(N)}$ must be of 
order $\gg BN^2$. Therefore, by positivity of the quartic 
term (which dominates all other terms, so that 
the integrand of $V_N$ is negative for $\xi \in D_N \cup (D_{N-1}/R_N)$) we can 
replace $V_N(D_N^c)$ by $V_N((D_N \cup (D_{N-1}/R_N))^c)$, for the 
purpose of obtaining an upper bound. 

Furthermore, modulo a suitable correction, it is 
possible to replace V((Dn U(Dn_1/Rn))‘). by 
V((Dn-1 URn)*): because the integrand in VN is 
bounded below by 


-by ^N? 
if d=2 (by —bày™ if d=3), for some b, so that the 
points in a can at most lower V((DNU 
(Dx-41/ RN))) by —bAN? y~@-ON (Ry) if #Rn is 


the number of boxes of D in Ry and V(we) is 
bounded below by its minimum: thus, 


V((Du..1 U Rn)*°) + PAN? 40-79N 4 ( Ry) 


is an upper bound to V((Dy U (DN_1/RN))'). 

In the complement of $D_{N-1} \cup R_N$, all fields are 
“small”; if $X^{(N-1)}$ and $R_N$ are fixed this region is not 
random (as a function of $z^{(N)}$) any more. Therefore, 
if $X^{(N-1)}$, $R_N$ are fixed the integration over $z^{(N)}$, 
conditioned to having $z^{(N)}$ fixed (and large) in the 
region $R_N$, is performed by means of the same 
argument necessary to prove Lemma 1 (essentially a 
Taylor expansion in $\lambda\gamma^{-(4-d)N}$). The large size of 
$z^{(N)}$ in $R_N$ does not affect too much the result 
because on the boundary of $R_N$ the field $z^{(N)}$ is 
$\le BN^2$ (recalling that $z^{(N)}$ is continuous) and since 
the variable $z^{(N)}$ is Markovian, the boundary effect 
decays exponentially from the boundary $\partial R_N$: it 
adds a quantity that can be shown to be bounded by 
the number of boxes in $R_N$ on the boundary of $R_N$, 
hence by Z: Rx, times b'(N — 1) 4- 4-9 (B(N — 1)*)* 
for some b’. 

The result of the integration over z" of 
eVN(DNU(DN-i/RND Conditioned to the large-field 
values of z/ in RN leads to an upper bound on 
fe’ P(dz'%)) as 


> eViN-ICPN_D+RON)IAI 
RN 
where $c$, $c'$, $c''$ are suitable constants: this is 
explained as follows. 


1. Taylor expansion (in $\lambda$) of the integral of 
$e^{V_N((D_{N-1}\cup R_N)^c) + b\lambda N^2\gamma^{-(4-d)N}\,\#(R_N)}$ (which, by 
construction, is an upper bound on $e^{V_N(X^{(N)})}$) with 
respect to the field $z^{(N)}$, conditioned to be fixed 
and large in $R_N$, would lead to an upper bound as 

$$e^{V_{j;N-1}((D_{N-1}\cup R_N)^c) + \bar R'(N)|\Lambda| + b''\lambda (BN^4)^4\gamma^{-(4-d)N}\,\#(R_N)}$$ 

with $\bar R'$ equal to [24] possibly with some $C'_j$ 
replacing $C_j$. The second exponential on the RHS 
of [29] arises partly from the above correction 
$b''\lambda (BN^4)^4\gamma^{-(4-d)N}\,\#(R_N)$ and partly from a 
contribution of similar form explained in (3) 
below. 

2. Integration over the large conditioning fields 
fixed in $R_N$ is controlled by the second estimate 
in [9] (the tail estimate): the first factor in 
parentheses in [29] is the tail estimate just 
mentioned, i.e., the probability that $z^{(N)}$ is large 
in the region $R_N$. The second factor is only partly 
explained in (1) above. 

3. Without further estimates, the bound [29] would 
contain $V_{j;N-1}((D_{N-1}\cup R_N)^c)$ rather than 
$V_{j;N-1}(D_{N-1}^c)$. Hence, there is the need to change 
the potential $V_{j;N-1}((D_{N-1}\cup R_N)^c)$ by “reintroducing” 
the contribution due to the fields in 
$R_N/D_{N-1}$ in order to reconstruct $V_{j;N-1}(D_{N-1}^c)$. 
Reintroducing this part of the potential costs a 
quantity like $b'\lambda N^2\gamma^{-(4-d)N}(BN^4)^4\,\#(R_N)$ (because 
the reintroduction occurs in the region $R_N/D_{N-1}$, 
which is covered by $R_N$, and in such points the field 
$X^{(N-1)}$ is not large, being bounded by $B(N-1)^4$); 
so that their contribution to the effective potential 
is still dominated by the quartic term and therefore by 
$\gamma^{-(4-d)N}$ times a power of $BN^4$ times the volume of 
$R_N$ (in units $\gamma^{-N}$, i.e., $\#R_N$). All this is taken care 
of by suitably fixing $c''$. 


Note that the sum over Ry of [29] is 
! n2 NI4 Wy. +4—d)N NT2 4\4 N 
(1 tice EN ote N?(BN*) jr |A| 


(because A contains [AJAN cubes of Qn); hence, it is 
bounded above by ecte 7^ for suitably defined 
Eis GL 

The same argument can be repeated for $V_{j;h}(D_h^c)$ 
with any $h$ if $V_{j;h}(D_h^c)$ is defined by the sum over the $\Delta$'s 
in $Q_h$ of the same integrals as those in [25] and [26] 
with $\Delta_i/D_h$ replacing $\Delta_i$ in the integration domains. 

Applying Lemma 1 recursively with $j \ge 1$ (if 
$d = 3$ it would become necessary to take $j \ge 3$), it 
follows that there exist $N$-independent upper and 
lower bounds $E_\pm|\Lambda|$ on $\log Z_N(\Lambda,f)$ of the form 
Vio 3: er (Rij, b) + cre cB y A] for C+, os >0 
suitably chosen and A-independent for A< 1. 
By the remark at the end of Sec. 6, given $j$, the 
bounds just described agree with the perturbation 
expansion $E(j,0)|\Lambda| = V_{j;0}$ of $\log Z(\Lambda,f)$ truncated 
to order $j$ (in $\lambda$) up to the sum over $h$ of the 
remainders $\bar R_\pm(j,h)$. Hence, if $B$ is chosen proportional 
to $\log_+\lambda^{-1} \overset{def}{=} \log(e + \lambda^{-1})$, the upper and lower 
bounds coincide to order $j$ in $\lambda$ with the value 
obtained by truncating to order $j$ the perturbative 
series. 

The latter remark is important as it implies 
not only that the bounds are finite (by the 
section “Perturbation theory”) but also that the 
function $(1/|\Lambda|)\log Z(\Lambda,f)$ is not quadratic in $f$: 
already to order 1 in $\lambda$ it is quartic in $f$ (containing a 
term equal to —A( f Cz. ofedé)*). 

The latter property is important as it excludes 
that the result is a “Gaussian” generating function. 
Thus, the outline of the proof of Lemma 2, which 
together with Lemma 1 forms the core of the 
analysis of the ultraviolet stability for $d = 2$, is 
completed. 

If $d = 3$, more care is needed because (very mild) 
smoothness, like the considered Hölder continuity 
with exponent 1/4, of $z$, $X$ is necessary to obtain the 
key scale independence property discussed earlier: 
therefore, the natural measure of the size of $z^{(h)}$ and 
$X^{(h)}$ in a box $\Delta \in Q_h$ is no longer the maximum of 
$|z^{(h)}_\xi|$ or of $|X^{(h)}_\xi|$. The region $D_h$ becomes more 
involved as it has to consist of the points $\xi$ 
where $|X^{(h)}_\xi| > Bh^4$ and of the pairs $\eta$, $\eta'$ where 
$|Y^{(h)}_{\eta\eta'}| > Bh^4$, 
i.e., it is not just a subset of $\Lambda$. 

However, if $d = 3$, the relevant part also contains 
the negative term $V^{(rel,2)}$, see [25]: and since it 
dominates over all other terms which contain a 
$Y$-field (because their couplings are smaller by 
about $\gamma^{-h}$), the argument given for $d = 2$ can be 
adapted to the new situation. Two regions $D^1_h$, $D^2_h$ 
will be defined: the first consists of all the points $\xi$ 
where $|X^{(h)}_\xi| > Bh^4$ and the second of all the pairs 
$\eta$, $\eta'$ where $|Y^{(h)}_{\eta\eta'}| > Bh^4$. The region $R_h$ will be 
the collection of all A € Q,, where ||z/? ||, > Bh?, 
see [8] with 7 —0. Then V(D;) will be defined as the 
sum of the integrals in [25] and [26] with the integrals 
over ; further restricted to č; ¢ D} and those over the 
pairs 7];, 77; are further restricted to (N; 77) € D. With 
the new settings, Lemma 2 can be proved also for 
$d = 3$ along the same lines as in the $d = 2$ case. 

For more details, the reader is referred to Wilson 
(1970, 1972), Benfatto et al. (1978), and Gallavotti 
(1981). 


Ultraviolet Limit, Infrared Behavior, and 
Other Applications 


The results on the ultraviolet stability are 
nonperturbative, as no assumption is made on the size of $\lambda$ 
(the assumption $\lambda < 1$ has been imposed in the last 
two sections only to obtain simpler expressions for 
the $\lambda$-dependence of various constants): nevertheless 
the multiscale analysis has allowed us to use 
perturbative techniques (i.e., the Taylor expansion 
in Lemmata 1, 2) to find the solution. The latter 
procedure is the essence of the renormalization 
group methods: they aim at reducing a difficult 
multiscale problem to a sequence of simple 
single-scale problems. Of course, in most cases, it is 
difficult to implement the approach and the scalar 
quantum fields in dimensions 2, 3 are among the 
simplest examples. The analysis of the beta function 
and of the running couplings, which appear in 
essentially all renormalization group applications, 
does not play a role here (or, better, their role is so 
inessential that it has even been possible to avoid 
mentioning them). This makes the models somewhat 
special from the renormalization group viewpoint: 
the running couplings at length scale $h$, if 
introduced, would tend exponentially to 0 as $h \to \infty$; 
unlike what happens in the most interesting 
renormalization group applications in which they 
either tend to zero only as powers of $h$ or do not 
tend to zero at all. 

The multiscale analysis method, i.e., the renorma- 
lization group method, in a form close to the one 
discussed here has been applied very often since its 
introduction in physics and it has led to the solution 
of several important problems. The following is not 
an exhaustive list and includes a few open questions. 


1. The arguments just discussed imply, with minor 
extra work, that $Z_N(\Lambda, f)$ not only admits 
uniform upper and lower bounds as $N \to \infty$, but also that the 
limit as $N \to \infty$ actually exists and is a $C^\infty$ function 
of $\lambda, f$. Its $\lambda$- and $f$-derivatives at $\lambda = 0$ and $f = 0$ are 
given by the formal perturbation calculation. In some 
cases, it is even possible to show that the formal series 
for $Z_N(\Lambda, f)$ in powers of $\lambda$ is Borel summable. 

2. The problem of removing the infrared cutoff (i.e., 
$\Lambda \to \infty$) is in a sense more a problem of statistical 
mechanics. In fact, it can be solved for $d = 2, 3$ by a 
typical technique used in statistical mechanics, the 
"cluster expansion." This is not intended to mean 
that it is technically an easy task: understanding its 
connection with the low-density expansions and 
the possibility of using such techniques has been a 
major achievement that is not discussed here. 

3. The third problem mentioned in the introduction, 
that is, checking the axioms so that the theory can 
be interpreted as a quantum field theory, is a difficult 
problem which required important efforts to control 
and which is not analyzed here. An introduction 
to it is provided by its analysis in the $d = 2$ case. 

4. Also the problem of keeping the ultraviolet cutoff 
and removing the infrared cutoff while the para- 
meter 7 in the propagator approaches 0 is a very 
interesting problem related to many questions in 
statistical mechanics at the critical point. 

5. Field theory methods can be applied to various 
statistical mechanics problems away from criti- 
cality: particularly interesting is the theory of the 
neutral Coulomb gas and of the dipole gas in two 
dimensions. 

6. The methods can be applied to Fermi systems in 
field theory as well as in equilibrium statistical 
mechanics. The understanding of the ground state 
in not exactly soluble models of spinless fermions 
in one dimension at small coupling is one of the 
results. And via the transfer matrix theory it has 
led to the understanding of nontrivial critical 
behavior in two-dimensional models that are not 
exactly soluble (like Ising next-nearest-neighbor or 
Ashkin-Teller model). Fermi systems are of 
particular interest also because in their analysis 
the large-fields problem is absent, but this great 
technical advantage is somewhat offset by the 
anticommutation properties of the fermionic 
fields, which do not allow us to employ 
probabilistic techniques in the estimates. 

7. An outstanding open problem is whether the scalar 
$\varphi^4$-theory is possible and nontrivial in dimension 
$d = 4$: this is a case of a renormalizable, not 
asymptotically free theory. The conjecture that 
many support is that the theory is necessarily trivial 
(i.e., the function $Z_N(\Lambda, f)$ necessarily becomes a 
Gaussian in the limit $N \to \infty$). One of the main 
problems is the choice of the ultraviolet cutoff: 
unlike the $d = 2, 3$ cases, in which the choice is a 
matter of convenience, it does not seem that the 
issue of triviality can be settled without a careful 
analysis of the choice and of the role of the 
ultraviolet cutoff. 


8. Very interesting problems can be found in the 
study of highly symmetric quantum fields: gauge 
invariance presents serious difficulties to be 
studied (rigorously or even heuristically) because 
in its naive forms it is incompatible with 
regularizations. Rigorous treatments have been 
possible in some cases, and in a few cases it has been 
shown that the naive treatment is not only not 
rigorous but leads to incorrect results. 


9. In connection with item (8), an outstanding problem 
is to understand relativistic pure gauge Higgs fields 
in dimension $d = 4$: the latter have been shown to be 
ultraviolet stable but the result has not been 
followed by the study of the infrared limit. 

10. The classical gauge theory problem is quantum 
electrodynamics, QED, in dimension 4: it is a 
renormalizable theory (taking into account gauge 
invariance) and its perturbative series truncated 
after the first few orders give results that can be 
directly confronted with experience, giving very 
accurate predictions. Nevertheless, the model is 
widely believed to be incomplete: in the sense that, 
if treated rigorously, the result would be a field 
describing free noninteracting assemblies of 
photons and electrons. It is believed that QED 
can make sense only if embedded in a model with 
more fields, representing other particles (e.g., the 
standard model), which would influence the 
behavior of the electromagnetic field by providing 
an effective ultraviolet cutoff high enough for not 
altering the predictions on the observations on the 
time and energy scales on which present (and, 
possibly, future over a long time span) experiments 
are performed. In dimension $d = 3$, QED is 
super-renormalizable, once the gauge symmetry is 
properly taken into account, and it can be studied 
with the techniques described above for the scalar 
fields in the corresponding dimension. 


In general, constructive quantum field theory 
seems to be in a deep crisis: the few solutions that 
have been found concern very special problems and 
are very demanding technically; the results obtained 
have often not been considered to contribute 
appreciably to any “progress.” And many consider 
that the work dedicated to the subject is not worth 
the results that one can even hope to obtain. 
Therefore, in recent years, attempts have been 
made to follow other paths: an attitude that in the 
past did not, in general, lead to great 
achievements but that is always tempting and 
worth pursuing, because the rare major advances 
made in physics resulted precisely from such changes 
of attitude, leaving aside developments requiring 
work which was too technical and possibly hopeless. 
To mention one important case, one can recall 
quantum mechanics, which disposed of all attempts 
at understanding the observed quantization of atomic 
levels on the basis of refined developments of 
classical electromagnetism. 

For more details, the reader is referred to Nelson 
(1966), Guerra (1972), Glimm et al. (1973), Glimm 
and Jaffe (1981), Simon (1974), Benfatto et al. 
(1978, 2003), Aizenman (1982), Gawedzki and 
Kupiainen (1983, 1985a, b), Balaban (1983), and 
Giuliani and Mastropietro (2005). 


See also: Algebraic Approach to Quantum Field Theory; 
Axiomatic Quantum Field Theory; Euclidean Field 
Theory; Integrability and Quantum Field Theory; 
Perturbation Theory and its Techniques; Quantum Field 
Theory: A Brief Introduction; Scattering, Asymptotic 
Completeness and Bound States. 

Introduction 


Contact geometry has been seen to underlie many 
physical phenomena and is related to many other 
mathematical structures. Contact structures first 
appeared in the work of Sophus Lie on partial 
differential equations. They reappeared in Gibbs' 
work on thermodynamics, Huygens' work on 
geometric optics, and in Hamiltonian dynamics. 
More recently, contact structures have been seen to 
have relations with fluid mechanics, Riemannian 
geometry, and low-dimensional topology, and these 
structures provide an interesting class of subelliptic 
operators. 

After summarizing the basic definitions, exam- 
ples, and facts concerning contact geometry, this 
article discusses the connections between contact 
geometry and symplectic geometry, Riemannian 
geometry, complex geometry, analysis, and 
dynamics. The article ends by discussing two of 
the most-studied connections with physics: Hamil- 
tonian dynamics and geometric optics. References 
for other important topics in contact geometry 
(e.g., thermodynamics, fluid dynamics, holomorphic 
curves, and open book decompositions) 
are provided in the "Further reading" section. 


Basic Definitions and Examples 


A hyperplane field $\xi$ on a manifold $M$ is a codimension-1 
sub-bundle of the tangent bundle $TM$. Locally, 
a hyperplane field can always be described as the 
kernel of a 1-form. In other words, for every point in 
$M$ there is a neighborhood $U$ and a 1-form $\alpha$ defined 
on $U$ such that the kernel of the linear map 
$\alpha_x : T_xM \to \mathbb{R}$ is $\xi_x$ for all $x$ in $U$. The form $\alpha$ is called 
a local defining form for $\xi$. A contact structure on a 
$(2n+1)$-dimensional manifold $M$ is a "maximally 
nonintegrable hyperplane field" $\xi$. The hyperplane 
field $\xi$ is maximally nonintegrable if for any (and hence 
every) locally defining 1-form $\alpha$ for $\xi$ the following 
equation holds: 


$$\alpha \wedge (d\alpha)^n \neq 0 \qquad [1]$$


(this means that the form is, pointwise, never equal 
to 0). Geometrically, the nonintegrability of $\xi$ means 
that no hypersurface in $M$ can be tangent to $\xi$ along 
an open subset of the hypersurface. Intuitively, this 
means that the hyperplanes "twist too much" to be 
tangent to hypersurfaces (Figure 1). The pair $(M, \xi)$ 
is called a contact manifold and any locally defining 
form $\alpha$ for $\xi$ is called a contact form for $\xi$. 

Figure 1 The standard contact structure on $\mathbb{R}^3$ given as the 
kernel of $dz - y\,dx$. Courtesy of Stephan Schönenberger. 


Example 1 The most basic example of a contact 
structure can be seen on $\mathbb{R}^{2n+1}$ as the kernel of the 
1-form $\alpha = dz - \sum_{i=1}^{n} y_i\,dx_i$, where the coordinates 
on $\mathbb{R}^{2n+1}$ are $(x_1, y_1, \ldots, x_n, y_n, z)$. This example is 
shown in Figure 1 when $n = 1$. 
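
As a quick check of condition [1] in the simplest case, the following worked computation (a sketch for $n = 1$ only, using nothing beyond the form of Example 1) verifies that $dz - y\,dx$ is a contact form on $\mathbb{R}^3$:

```latex
\begin{aligned}
\alpha &= dz - y\,dx, \\
d\alpha &= -\,dy \wedge dx = dx \wedge dy, \\
\alpha \wedge d\alpha &= (dz - y\,dx) \wedge dx \wedge dy
  = dz \wedge dx \wedge dy = dx \wedge dy \wedge dz \neq 0 .
\end{aligned}
```

The term $y\,dx \wedge dx \wedge dy$ vanishes because $dx \wedge dx = 0$, so the result is the standard volume form at every point. By contrast, the closed form $dz$ has $d(dz) = 0$, so its kernel (the horizontal planes) is integrable and is not a contact structure.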


Example 2 Recall that on the cotangent space of 
any $n$-manifold $M$, there is a canonical 1-form $\lambda$, 
called the Liouville form. If $(q_1, \ldots, q_n)$ are local 
coordinates on $M$, then any 1-form can be expressed 
as $\sum_{i=1}^{n} p_i\,dq_i$, so $(q_1, p_1, \ldots, q_n, p_n)$ are local 
coordinates on $T^*M$. In these coordinates, 

$$\lambda = \sum_{i=1}^{n} p_i\,\pi^* dq_i \qquad [2]$$

where $\pi : T^*M \to M$ is the natural projection 
map. The 1-jet space of $M$ is the manifold 
$J^1(M) = T^*M \times \mathbb{R}$ and can be considered as a bundle 
over $M$. The 1-jet space has a natural contact 
structure given as the kernel of $\alpha = dz - \lambda$, where $z$ 
is the coordinate on $\mathbb{R}$. Note that if $M = \mathbb{R}^n$ then we 
recover the previous example. 


Example 3 The (oriented) projectivized cotangent 
space of a manifold $M$ is the set $P^*M$ of nonzero 
covectors in $T^*M$ where two covectors are identified 
if they differ by a positive real number, that is, 

$$P^*M = (T^*M \setminus \{0\})/\mathbb{R}_+ \qquad [3]$$

where $\{0\}$ is the zero section of $T^*M$ and $\mathbb{R}_+$ denotes 
the positive real numbers. If $M$ has a metric then $P^*M$ 
can be easily identified with the space of unit 
covectors. Considering $P^*M$ as unit covectors, we can 
restrict the canonical 1-form $\lambda$ to $P^*M$ to get a 1-form 
$\alpha$ whose kernel defines a contact structure $\xi$ on $P^*M$. 
(Although there is no canonical contact form on $P^*M$, 
the contact structure $\xi$ is still well defined.) Note that if 
$M$ is compact then so is $P^*M$; so this gives examples of 
contact structures on compact manifolds. 


If $\alpha$ and $\alpha'$ are two locally defining 1-forms for $\xi$, then 
there is a nonzero function $f$ such that $\alpha' = f\alpha$. Thus, 
$\alpha' \wedge (d\alpha')^n = f^{n+1}\alpha \wedge (d\alpha)^n$ is a nonzero top-dimensional 
form on $M$, and if $n$ is odd then the orientation 
defined by the local defining form is independent of the 
actual form. Hence, when $n$ is odd, a contact structure 
defines an orientation on $M$ (this is independent of 
whether or not $\xi$ is orientable!). If $M$ had a preassigned 
orientation (and $n$ is odd), then the contact structure is 
called "positive" if it induces the given orientation and 
"negative" otherwise. One should be careful when 
reading the literature, as some authors build 
positive into their definition of contact structure, 
especially when $n = 1$. If there is a globally defined 
1-form $\alpha$ whose kernel defines $\xi$, then $\xi$ is called 
transversely orientable or co-orientable. This is 
equivalent to the bundle $\xi$ being orientable when $n$ 
is odd, or when $n$ is even and $M$ is orientable. In 
this article the discussion is restricted to transversely 
orientable contact structures. 

Suppose that $\alpha$ is a contact form for $\xi$; then eqn [1] 
implies that $d\alpha|_\xi$ is a symplectic form on $\xi$. This 
is one sense in which a contact structure is like an 
odd-dimensional analog of a symplectic structure. 

A submanifold $L$ of a contact manifold $(M,\xi)$ is 
called Legendrian if $\dim M = 2\dim L + 1$ and $T_pL \subset \xi_p$. 


Example 4 A fiber in the unit cotangent bundle 
with the contact structure from Example 3 is a 
Legendrian sphere. 


Example 5 Let $f : M \to \mathbb{R}$ be a function. Then 
$j^1(f)(q) = (q, df_q, f(q))$ is a section of the 1-jet space 
$J^1(M)$ of $M$; it is called the 1-jet of $f$. If $s$ is any 
section of the 1-jet space, then it is Legendrian if and 
only if it is the 1-jet of a function. 


This observation is the basis for Lie's study of 
partial differential equations. More specifically, a 
first-order partial differential equation on M can be 
considered as giving an algebraic equation on $J^1(M)$. 
Then, a section of $J^1(M)$ satisfying this algebraic 
equation corresponds to the 1-jet of a solution to the 
original partial differential equation if and only if it 
is Legendrian. 

Recently, Legendrian submanifolds have been 
much studied. There are various classification results 
in three dimensions and several striking existence 
results in higher dimensions. 


Local Theory 


The natural equivalence between contact structures 
is contactomorphism. Two contact structures $\xi_0$ and 
$\xi_1$ on manifolds $M_0$ and $M_1$, respectively, are 
contactomorphic if there is a diffeomorphism 
$f : M_0 \to M_1$ such that $f_*(\xi_0) = \xi_1$. All contact structures 
are locally contactomorphic. In particular, we 
have the following theorem. 


Theorem 1 (Darboux's Theorem). Suppose $\xi_i$ is a 
contact structure on the manifold $M_i$, $i = 0, 1$, and 
$M_0$ and $M_1$ have the same dimension. Given any 
points $p_0$ and $p_1$ in $M_0$ and $M_1$, respectively, there 
are neighborhoods $N_i$ of $p_i$ in $M_i$ and a contactomorphism 
from $(N_0, \xi_0|_{N_0})$ to $(N_1, \xi_1|_{N_1})$. Moreover, 
if $\alpha_i$ is a contact form for $\xi_i$ near $p_i$, then the 
contactomorphism can be chosen to pull $\alpha_1$ back to $\alpha_0$. 


Thus, locally all contact structures (and contact 
forms!) look like the one given in Example 1 above. 

Furthermore, contact structures are “local in 
time." That is, compact deformations of contact 
structures do not produce new contact structures. 


Theorem 2 (Gray's theorem). Let $M$ be an oriented 
$(2n+1)$-dimensional manifold and $\xi_t$, $t \in [0,1]$, a 
family of contact structures on $M$ that agree off of 
some compact subset of $M$. Then there is a family of 
diffeomorphisms $\phi_t : M \to M$ such that $(\phi_t)_*\xi_t = \xi_0$. 


In particular, on a compact manifold, all 
deformations of contact structures come from 
diffeomorphisms of the underlying manifold. The 
theorem is not true if the contact structures do not 
agree off of a compact set. For example, there is a 
one-parameter family of noncontactomorphic 
contact structures on $S^1 \times \mathbb{R}^2$. 


Existence and Classification 


Establishing the existence of contact structures on closed 
odd-dimensional manifolds is quite difficult. However, 
Gromov has shown that contact structures on 
open manifolds obey an h-principle. To explain 
this, we note that if $(M^{2n+1}, \xi)$ is a co-oriented 
contact manifold then the tangent bundle of $M$ can 
be written as $\xi \oplus \mathbb{R}$ and thus the structure group 
of $TM$ can be reduced to $U(n)$ (since $\xi$ has 
a conformal symplectic structure on it). Such 
a reduction of the structure group is called an 
almost contact structure on $M$. Clearly, a contact 
structure on $M$ induces an almost contact structure. 
If $M$ is an open manifold, Gromov proved 
that the inclusion of the space of co-oriented 
contact structures on $M$ into the space of almost 
contact structures on $M$ is a weak homotopy 
equivalence. In particular, if an open manifold 
meets the necessary algebraic condition for the 
existence of an almost contact structure, then the 
manifold has a co-oriented contact structure. 

Lutz and Martinet proved a similar, but weaker, 
result for oriented closed 3-manifolds. More 
specifically, every closed oriented 3-manifold admits 
a co-oriented contact structure and in fact has at least 
one for every homotopy class of plane field. There has 
been much progress on classifying contact structures 
on 3-manifolds and here an interesting dichotomy has 
appeared. Contact structures break into one of two 
types: tight or overtwisted. Overtwisted contact 
structures obey an h-principle and are in general easy 
to understand. Tight contact structures have a more 
subtle, geometric nature. In higher dimensions there is 
much less known about the existence (or classification) 
of contact structures. 


Relations with Symplectic Geometry 


Let $(X,\omega)$ be a symplectic manifold. A vector field $v$ 
satisfying 

$$L_v\omega = \omega \qquad [4]$$

(where $L_v\omega$ is the Lie derivative of $\omega$ in the direction 
of $v$) is called a symplectic dilation. A compact 
hypersurface $M$ in $(X,\omega)$ is said to have "contact 
type" if there exists a symplectic dilation $v$ in a 
neighborhood of $M$ that is transverse to $M$. Given a 
hypersurface $M$ in $(X,\omega)$, the characteristic line field 
$LM$ in the tangent bundle of $M$ is the symplectic 
complement of $TM$ in $TX$. (Since $M$ is codimension 1, 
it is coisotropic; thus, the symplectic complement lies 
in $TM$ and is one dimensional.) 


Theorem 3 Let $M$ be a compact hypersurface in a 
symplectic manifold $(X,\omega)$ and denote the inclusion 
map $i : M \to X$. Then $M$ has contact type if and only 
if there exists a 1-form $\alpha$ on $M$ such that $d\alpha = i^*\omega$ 
and the form $\alpha$ is never zero on the characteristic 
line field. 


If $M$ is a hypersurface of contact type, then the 
1-form $\alpha$ is obtained by contracting the symplectic 
dilation $v$ into the symplectic form: $\alpha = \iota_v\omega$. It is 
easy to verify that the 1-form $\alpha$ is a contact form 
on $M$. Thus, a hypersurface of contact type in a 
symplectic manifold inherits a co-oriented contact 
structure. 

Given a co-orientable contact manifold $(M, \xi)$, its 
symplectization $\mathrm{Symp}(M, \xi) = (X,\omega)$ is constructed 
as follows. The manifold is $X = M \times (0,\infty)$, and given 
a global contact form $\alpha$ for $\xi$ the symplectic 
form is $\omega = d(t\alpha)$, where $t$ is the coordinate on $(0,\infty)$. 
(The symplectization is also equivalently defined as 
$(M \times \mathbb{R}, d(e^t\alpha))$.) 


Example 6 The symplectization of the standard 
contact structure on the unit cotangent bundle 
(see Example 3) is the standard symplectic structure 
on the complement of the zero section in the 
cotangent bundle. 


The symplectization is independent of the choice 
of contact form $\alpha$. To see this, fix a co-orientation 
for $\xi$ and note that the manifold $X$ can be 
identified (in many ways) with the sub-bundle of 
$T^*M$ whose fiber over $x \in M$ is 

$$\{\beta \in T^*_xM : \beta(\xi_x) = 0 \text{ and } \beta > 0 \text{ on vectors positively transverse to } \xi_x\} \qquad [5]$$

and restricting $d\lambda$ to this subspace yields a symplectic 
form $\omega$, where $\lambda$ is the Liouville form on $T^*M$ 
defined in Example 2. A choice of contact form $\alpha$ 
fixes an identification of $X$ with the sub-bundle of 
$T^*M$ under which $d(t\alpha)$ is taken to $d\lambda$. 

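The vector field $v = t\,\partial/\partial t$ on $(X,\omega)$ is a symplectic 
dilation that is transverse to $M \times \{1\} \subset X$: indeed, 
$\iota_v\omega = t\alpha$ and hence $L_v\omega = d(t\alpha) = \omega$. Clearly, 
$\iota_v\omega|_{M \times \{1\}} = \alpha$. Thus, we see that any co-orientable 
contact manifold can be realized as a hypersurface 
of contact type in a symplectic manifold. In 
summary, we have the following theorem. 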


Theorem 4 If $(M,\xi)$ is a co-oriented contact 
manifold, then there is a symplectic manifold 
$\mathrm{Symp}(M,\xi)$ in which $M$ sits as a hypersurface of 
contact type. Moreover, any contact form $\alpha$ for $\xi$ 
gives an embedding of $M$ into $\mathrm{Symp}(M,\xi)$ that 
realizes $M$ as a hypersurface of contact type. 


We also note that all the hypersurfaces of contact 
type in $(X,\omega)$ look locally, in $X$, like a contact 
manifold sitting inside its symplectization. 


Theorem 5 Given a compact hypersurface $M$ of 
contact type in a symplectic manifold $(X,\omega)$ with the 
symplectic dilation given by $v$, there is a neighborhood 
of $M$ in $X$ symplectomorphic to a neighborhood 
of $M \times \{1\}$ in $\mathrm{Symp}(M,\xi)$, where the 
symplectization is identified with $M \times (0,\infty)$ using 
the contact form $\alpha = \iota_v\omega|_M$ and $\xi = \ker\alpha$. 


The Reeb Vector Field and Riemannian 
Geometry 


Let $(M,\xi)$ be a contact manifold. Associated to a 
contact form $\alpha$ for $\xi$ is the Reeb vector field $v_\alpha$. 
This is the unique vector field satisfying 

$$\alpha(v_\alpha) = 1 \quad \text{and} \quad \iota_{v_\alpha} d\alpha = 0 \qquad [6]$$

One may readily check that $v_\alpha$ is transverse to the 
contact hyperplanes and the flow of $v_\alpha$ preserves $\xi$ 
(in fact, it preserves $\alpha$). These two conditions 
characterize Reeb vector fields; that is, a vector 
field $v$ is the Reeb vector field for some contact form 
for $\xi$ if and only if it is transverse to $\xi$ and its flow 
preserves $\xi$. 
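
To make the two defining conditions concrete, here is a short worked check (a sketch for the standard form of Example 1 only) that the Reeb vector field of $\alpha = dz - \sum_i y_i\,dx_i$ on $\mathbb{R}^{2n+1}$ is $\partial/\partial z$:

```latex
\begin{aligned}
d\alpha &= \sum_{i=1}^{n} dx_i \wedge dy_i, \\
\alpha\!\left(\tfrac{\partial}{\partial z}\right)
  &= dz\!\left(\tfrac{\partial}{\partial z}\right)
   - \sum_{i=1}^{n} y_i\, dx_i\!\left(\tfrac{\partial}{\partial z}\right) = 1, \\
\iota_{\partial/\partial z}\, d\alpha
  &= \sum_{i=1}^{n} \Big( dx_i\!\left(\tfrac{\partial}{\partial z}\right) dy_i
   - dy_i\!\left(\tfrac{\partial}{\partial z}\right) dx_i \Big) = 0 .
\end{aligned}
```

Both conditions hold, so by uniqueness $v_\alpha = \partial/\partial z$. Its flow is translation in the $z$-direction, which visibly preserves $\alpha$ and is transverse to the contact planes; in particular this Reeb flow on the noncompact manifold $\mathbb{R}^{2n+1}$ has no periodic orbits.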

The fundamental question concerning a Reeb vector 
field asks whether its flow has a (contractible) periodic 
orbit. A paraphrase of the Weinstein conjecture 
asserts a positive answer to this question. Most 
progress on this conjecture has been made in 
dimension 3, where H. Hofer has proved the 
existence of periodic orbits for all Reeb fields on $S^3$ 
and on 3-manifolds with essential spheres 
(i.e., embedded $S^2$'s that do not bound a 3-ball in 
the manifold). Relations with Hamiltonian dynamics 
are discussed below. 

Recall, from Example 3, that a Riemannian metric 
$g$ on a manifold $M$ provides an identification of the 
(oriented) projectivized cotangent bundle $P^*M$ with 
the unit cotangent bundle. Considered as a subset of 
$T^*M$, $P^*M$ inherits not only a contact structure but 
also a contact form $\alpha$ (by restricting the Liouville 
form). Let $v_\alpha$ be the associated Reeb vector field. 
The metric $g$ also provides an identification of the 
tangent and cotangent bundles of $M$. Thus, $P^*M$ 
may be considered as the unit tangent bundle of $M$. 
Let $w_g$ be the vector field on the unit tangent bundle 
generating the geodesic flow on $M$. 


Theorem 6 The Reeb vector field $v_\alpha$ is identified 
with the geodesic flow field $w_g$ when $P^*M$ is identified 
with the unit tangent bundle using the metric $g$. 


Relations with Complex Geometry 
and Analysis 


Let $X$ be a complex manifold with boundary and 
denote the induced complex structure on $TX$ by $J$. 
The complex tangencies $\xi$ to $M = \partial X$ are described 
by the equation $d\phi \circ J = 0$, where $\phi$ is a function 
defined in a neighborhood of the boundary such that 
0 is a regular value and $\phi^{-1}(0) = M$. The form 
$L(v,w) = -d(d\phi \circ J)(v, Jw)$, for $v, w \in \xi$, is called 
the Levi form, and when $L(v,w)$ is positive 
(negative) definite, then $X$ is said to have strictly 
pseudoconvex (pseudoconcave) boundary. The 
hyperplane field $\xi$ will be a contact structure if and 
only if $d(d\phi \circ J)$ is a nondegenerate 2-form on $\xi$ (if 
and only if $L(v, w)$ is definite). A well-studied source 
of examples comes from Stein manifolds. 


Example 7 Let $X$ be a complex manifold and 
again let $J$ denote the induced complex structure 
on $TX$. From a function $\phi : X \to \mathbb{R}$, we can define a 
2-form $\omega = -d(d\phi \circ J)$ and a symmetric form 
$g(v, w) = \omega(v, Jw)$. If this symmetric form is positive 
definite, the function $\phi$ is called "strictly 
plurisubharmonic." The manifold $X$ is a Stein manifold if $X$ 
admits a proper strictly plurisubharmonic function 
$\phi : X \to \mathbb{R}$. An important result says that $X$ is Stein 
if and only if it can be realized as a closed complex 
submanifold of $\mathbb{C}^n$. Clearly any noncritical level set 
of $\phi$ gives a contact manifold. 


Contact manifolds also give rise to an interesting 
class of differential operators. Specifically, a contact 
structure $\xi$ on $M$ defines a symbol-filtered algebra of 
pseudodifferential operators $\Psi^*_\xi(M)$, called the 
"Heisenberg calculus." Operators in this algebra 
are modeled on smooth families of convolution 
operators on the Heisenberg group. An important 
class of operators of this type are the 
"sum-of-squares" operators. Locally, the highest-order part 
of such an operator takes the form 

$$L = \sum_{j=1}^{2n} v_j^2 + i a\, v_\alpha \qquad [7]$$

where $\{v_1, \ldots, v_{2n}\}$ is a local framing for the contact 
field and $v_\alpha$ is a Reeb vector field. This operator 
belongs to $\Psi^*_\xi(M)$ and is subelliptic for $a$ outside a 
discrete set. 


Hamiltonian Dynamics 


Given a symplectic manifold $(X,\omega)$, a function 
$H : X \to \mathbb{R}$ will be called a Hamiltonian. (Only 
autonomous Hamiltonians are discussed here.) The 
unique vector field $v_H$ satisfying 

$$\iota_{v_H}\omega = -dH$$

is called the Hamiltonian vector field associated to 
$H$. Many problems in classical mechanics can be 
formulated in terms of studying the flow of $v_H$ for 
various $H$. 


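Example 8 If $(X,\omega) = (\mathbb{R}^{2n}, d\lambda)$, where $\lambda$ is from 
Example 2, then the flow of the Hamiltonian vector 
field is given by Hamilton's equations 

$$\dot q_i = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = -\frac{\partial H}{\partial q_i}$$

A standard fact says that the flow of $v_H$ preserves 
the level sets of $H$. 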
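
The preservation of level sets can be illustrated numerically. The sketch below is only an illustration under assumptions that are not in the text: it takes the harmonic oscillator $H = (p^2 + q^2)/2$ on $\mathbb{R}^2$ as a sample Hamiltonian and uses the leapfrog scheme as the integrator. The function names are hypothetical.

```python
def hamiltonian(q, p):
    """Sample Hamiltonian (an assumption for illustration): H = (p^2 + q^2) / 2."""
    return 0.5 * (p * p + q * q)


def leapfrog(q, p, dt, steps):
    """Integrate q' = dH/dp = p, p' = -dH/dq = -q with the leapfrog scheme."""
    for _ in range(steps):
        p -= 0.5 * dt * q   # half step in momentum
        q += dt * p         # full step in position
        p -= 0.5 * dt * q   # second half step in momentum
    return q, p


if __name__ == "__main__":
    q0, p0 = 1.0, 0.0
    q1, p1 = leapfrog(q0, p0, dt=0.01, steps=10000)
    # The numerical trajectory stays close to the level set H = H(q0, p0).
    print(hamiltonian(q0, p0), hamiltonian(q1, p1))
```

The exact flow keeps $H$ constant; a symplectic integrator such as leapfrog keeps it constant only up to a small bounded error, which is why the two printed values agree closely rather than exactly.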


Theorem 7 If $M$ is a level set of $H$ corresponding 
to a regular value and $M$ is a hypersurface of contact 
type, then the trajectories of $v_H$ and of the Reeb 
vector field (associated to $M$ in Theorem 3) agree. 


Thus, under suitable hypotheses, Hamiltonian 
dynamics is a reparametrization of Reeb dynamics. 
In particular, searching for periodic orbits in such a 
Hamiltonian system is equivalent to searching for 
periodic orbits in a Reeb flow. Thus, in this context, 
Weinstein's conjecture asserts a positive answer to 
the question: Does the Hamiltonian flow along a 
regular level set of contact type have a periodic 
orbit? Viterbo proved that the answer is yes if the 
hypersurface is compact and in $(\mathbb{R}^{2n}, \omega = d\lambda)$. Other 
progress has been made by studying Reeb dynamics. 


Geometric Optics 


In this section, we study the propagation of light (or 
various other disturbances) in a medium (for the 
moment, we do not specify the properties of this 
medium). The medium will be given by a 
three-dimensional manifold $M$. Given a point $p$ in $M$ and 
$t > 0$, let $I_p(t)$ be the set of all points to which light 
can travel in time $\le t$. The wave front of $p$ at time $t$ 
is the boundary of this set and is denoted as 
$\Phi_p(t) = \partial I_p(t)$. 


Theorem 8 (Huygens' principle). $\Phi_p(t + t')$ is the 
envelope of the wave fronts $\Phi_q(t')$ for all $q \in \Phi_p(t)$. 


This is best understood in terms of contact 
geometry. Let $\pi : (T^*M \setminus \{0\}) \to P^*M$ be the natural 
projection (see Example 3) and let $S$ be any smooth 
sub-bundle of $T^*M \setminus \{0\}$ that is transverse to the radial 
vector field in each fiber and for which $\pi|_S : S \to P^*M$ 
is a diffeomorphism. The restriction of the Liouville 
form to $S$ gives a contact form $\alpha$ and a corresponding 
Reeb vector field $v$. Given a subset $F$ of $M$ with a 
well-defined tangent space at every point, set 

$$L_F = \{p \in S : \pi(p) \in F \text{ and } p(w) = 0 \text{ for all } w \in T_{\pi(p)}F\} \qquad [8]$$

The set $L_F$ is a Legendrian submanifold of $S$ and is 
called the "Legendrian lift" of $F$. If $L$ is a generic 
Legendrian submanifold in $S$, then $\pi(L)$ is called the 
front projection of $L$ and $L_{\pi(L)} = L$. Given a Legendrian 
submanifold $L$, let $\Psi_t(L)$ be the Legendrian submanifold 
obtained from $L$ by flowing along $v$ for time $t$. 


Example 9 Given a metric $g$ on $M$, Fermat's 
principle says that light travels along geodesics. 
Thus, if $S$ is the unit cotangent bundle, then using $g$ 
to identify the geodesic flow with the Reeb flow 
one sees that light will travel along trajectories 
of the Reeb vector field. Given a point $p$ in $M$, 
the Legendrian submanifold $L_p$ is a sphere sitting 
in $T^*_pM$. The Huygens principle follows from the 
observation that $\Phi_p(t) = \pi(\Psi_t(L_p))$. 


Using the more general $S$ discussed above, one can 
generalize this example to light traveling in a medium 
that is nonhomogeneous (i.e., the speed differs from 
point to point in M) and anisotropic (i.e., the speed 
differs depending on the direction of travel). 

See also: Hamiltonian Fluid Dynamics; Integrable Systems 
and Recursion Operators on Symplectic and Jacobi 
Manifolds; Minimax Principle in the Calculus of Variations. 

Introduction 


Control Theory is an interdisciplinary research area, 
bridging mathematics and engineering, dealing with 
physical systems which can be “controlled,” that is, 
whose evolution can be influenced by some external 
agent. A general model can be written as 


$$y(t) = A(t, y(0), u(\cdot)) \qquad [1]$$


where $y$ describes the state variables, $y(0)$ the initial 
condition, and $u(\cdot)$ the control function. Thus, eqn 
[1] means that the state at time $t$ depends on the 
initial condition but also on some parameters $u$ 
which can be chosen as a function of time. To be 
precise, there are some control problems which are 
not of evolutionary type; however, in this presenta- 
tion we restrict ourselves to this case. 

One has to distinguish between the control set $U$, where 
the control function can take values, $u(t) \in U$, and the 
space of control functions $\mathcal{U}$, to which each control 
function should belong, $u(\cdot) \in \mathcal{U}$. Thus, for example, 
we may have $U = \mathbb{R}^m$ and $\mathcal{U} = L^\infty([0, T], \mathbb{R}^m)$. 


There are various problems one can formulate 
regarding systems of type [1], among which: 


Controllability Given any two states $y_0$ and $y_1$, 
determine a control function $u(\cdot)$ such that for 
some time $t > 0$ we have $y_1 = A(t, y_0, u(\cdot))$. 

Optimal control Consider a cost function $J(y(\cdot), 
u(\cdot))$ depending both on the evolutions of $y$ and $u$ 
and determine a control function $\bar u(\cdot)$ and a 
trajectory $\bar y(t) = A(t, y_0, \bar u(\cdot))$ such that $\bar u(\cdot)$ steers 
the system from $y_0$ to $y_1$, as before, and the cost $J$ 
is minimized (or maximized). 

Stabilization We say that $\bar y$ is an equilibrium if 
there exists $\bar u \in U$ such that $A(t, \bar y, \bar u) = \bar y$ for every 
$t > 0$ (here $\bar u$ indicates also the constant in time 
control function). Determine the control $u$ as 
function of the state $y$ so that $\bar y$ is a (Lyapunov) 
stable equilibrium for the uncontrolled dynamical 
system $y(t) = A(t, y(0), u(y(\cdot)))$. 

Observability Assume that we can observe not the 
state $y$, but a function $\phi(y)$ of the state. Determine 
conditions on $\phi$ so that the state $y$ can be 
reconstructed from the evolution of $\phi(y)$ choosing 
$u(\cdot)$ suitably. 


For the sake of simplicity, we restrict ourselves 
mainly to the first two problems and just mention 


some facts about the others. Also, we focus on two 
cases: 


Control of ordinary differential equations (ODEs) In 
this case $t \in \mathbb{R}$, $y \in \mathbb{R}^n$, $U$ is a set, typically 
$U \subset \mathbb{R}^m$, and $A$ is determined by a controlled ODE 

$$\dot y = f(t, y, u) \qquad [2]$$


A typical example in mathematical physics is the 
control of mechanical systems (Bloch 2003, Bullo 
and Lewis 2005). 


Control of partial differential equations (PDEs) In 
this case $t \in \mathbb{R}$, $x \in \mathbb{R}^n$, $y(x)$ belongs to a Banach 
functional space, for example, $H^s(\mathbb{R}^n, \mathbb{R})$, $\mathcal{U}$ is a 
functional space, and $A$ is determined by a 
controlled PDE, 

$$F(t, x, y, y_t, y_{x_1}, \ldots, y_{x_n}, \ldots, u) = 0 \qquad [3]$$


A typical example in mathematical physics is the 
control of the wave equation using boundary 
conditions; see below. 


There are various other possible situations we do 
not treat here: "stochastic control," when $y$ is a random 
variable and $A$ is defined by a (controlled) stochastic 
differential equation; "discrete time control," 
where $t \in \mathbb{N}$; "hybrid control," where $t$ and $y$ may have 
both discrete and continuous components, and so on. 

As shown above, the control law can be assigned 
in (at least) two basically different ways. In 
open-loop form, as a function of time: $t \mapsto u(t)$, and in 
closed-loop form or feedback, as a function of the 
state: $y \mapsto u(y)$. For example, in optimal control we 
look for a control $u(t)$ in open-loop form, while in 
stabilization we search for a feedback control $u(y)$. 
The open-loop control depends on $y(0)$, while a 
feedback control can stabilize regardless of the 
initial condition. 


Example 1 A point with unit mass moves along a
straight line; if a controller is able to apply an
external force $u$, then, calling $y_1(t)$, $y_2(t)$, respectively,
the position and the velocity of the point at
time $t$, the motion is described by the control system


$(\dot y_1, \dot y_2) = (y_2, u)$ [4]

It is easy to check that the feedback control
$u(y_1, y_2) = -y_1 - y_2$ stabilizes the system asymptotically
to the origin, that is, for every initial data
$(y_1, y_2)$, the solution of the corresponding Cauchy
problem satisfies $\lim_{t \to \infty} (y_1, y_2)(t) = (0, 0)$.
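The stabilization claim of Example 1 can be checked numerically. The following is a minimal sketch, not part of the original text: it integrates the closed-loop system $\dot y_1 = y_2$, $\dot y_2 = -y_1 - y_2$ with a classical fourth-order Runge-Kutta scheme. The function name `simulate_feedback`, the step size and the final time are illustrative assumptions.

```python
def simulate_feedback(y1, y2, dt=0.01, t_end=40.0):
    """Integrate y1' = y2, y2' = u with the feedback u = -y1 - y2 (RK4).

    Hypothetical helper for illustration; dt and t_end are arbitrary choices.
    """
    def rhs(a, b):
        u = -a - b          # feedback law of Example 1
        return b, u

    steps = int(round(t_end / dt))
    for _ in range(steps):
        k1 = rhs(y1, y2)
        k2 = rhs(y1 + 0.5 * dt * k1[0], y2 + 0.5 * dt * k1[1])
        k3 = rhs(y1 + 0.5 * dt * k2[0], y2 + 0.5 * dt * k2[1])
        k4 = rhs(y1 + dt * k3[0], y2 + dt * k3[1])
        y1 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y2 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return y1, y2


if __name__ == "__main__":
    # Any initial datum should be driven close to the origin.
    print(simulate_feedback(3.0, -2.0))
```

The closed-loop matrix has eigenvalues $-1/2 \pm i\sqrt{3}/2$, so the state decays like $e^{-t/2}$ whatever the initial datum, which is what the simulation illustrates.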

Another simple problem consists in driving the
point to the origin with zero velocity in minimum
time from given initial data, the force being bounded by $|u| \le 1$. It is quite easy to see
that the optimal strategy is to accelerate towards the
origin with maximum force on some interval $[0, \tau]$
and then to decelerate with maximum force to reach
the origin at velocity zero. The set of optimal
trajectories is depicted in Figure 1a: they can
be obtained using the following discontinuous
feedback, see Figure 1b. Define the curves
$C^\pm = \{(y_1, y_2) : \mp y_2 > 0,\ y_1 = \pm \tfrac{1}{2} y_2^2\}$ and let $\zeta$ be
defined as the union $C^+ \cup C^- \cup \{0\}$. We define $A^+$ to be
the region below $\zeta$ and $A^-$ the one above. Then the
feedback is given by


$u(y_1, y_2) = \begin{cases} +1 & \text{if } (y_1, y_2) \in A^+ \cup C^+ \\ -1 & \text{if } (y_1, y_2) \in A^- \cup C^- \\ 0 & \text{if } (y_1, y_2) = (0, 0) \end{cases}$

Figure 1 Example 1. The simplest example of (a) optimal
synthesis and (b) corresponding feedback.
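The discontinuous feedback can be made concrete with a short sketch. It assumes the force bound $|u| \le 1$ and unit mass, so that the switching curve is $y_1 = -\tfrac{1}{2} y_2 |y_2|$. The names `bang_bang_feedback`, `minimum_time` and `simulate_to_origin` are hypothetical, the closed-form time follows from integrating the two parabolic arcs, and the explicit Euler loop is only a rough check.

```python
import math


def bang_bang_feedback(y1, y2):
    """Time-optimal feedback for y1' = y2, y2' = u, |u| <= 1 (assumed bound)."""
    s = y1 + 0.5 * y2 * abs(y2)   # s = 0 exactly on the switching curve
    if s > 0:
        return -1.0               # region above the curve
    if s < 0:
        return 1.0                # region below the curve
    if y2 > 0:
        return -1.0               # on the branch reaching 0 with u = -1
    if y2 < 0:
        return 1.0                # on the branch reaching 0 with u = +1
    return 0.0                    # at the origin


def minimum_time(y1, y2):
    """Closed-form minimum time obtained by integrating the two arcs."""
    s = y1 + 0.5 * y2 * abs(y2)
    if s > 0 or (s == 0 and y2 > 0):
        return y2 + 2.0 * math.sqrt(y1 + 0.5 * y2 * y2)
    if s < 0 or (s == 0 and y2 < 0):
        return -y2 + 2.0 * math.sqrt(-y1 + 0.5 * y2 * y2)
    return 0.0


def simulate_to_origin(y1, y2, dt=1e-4, tol=1e-2, t_max=20.0):
    """Explicit Euler run of the feedback; returns the time to enter a tol-ball."""
    t = 0.0
    while math.hypot(y1, y2) > tol and t < t_max:
        u = bang_bang_feedback(y1, y2)
        y1, y2 = y1 + dt * y2, y2 + dt * u
        t += dt
    return t


if __name__ == "__main__":
    print(minimum_time(1.0, 0.0), simulate_to_origin(1.0, 0.0))
```

Starting at rest from position 1, the formula gives a transfer time of 2: one unit of time accelerating towards the origin and one unit decelerating.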


Example 2 Consider a (one-dimensional) vibrating
string of unit length with a fixed endpoint. The
model for the motion of the displacement of the
string with respect to the rest position is given by


$y_{tt} - \Delta y = 0$, $y(t, 0) = 0$ [5]

with initial data

$y(0, \cdot) = y_0$, $y_t(0, \cdot) = y_1$ [6]


Assume that we can control the position of the
second endpoint; then,


$y(t, 1) = u(t)$ [7]

for some control function $u(\cdot) \in \mathcal{U}$.


Let us introduce another key concept: the reachable
set at time $t$ from $y$ is the set


$R(t; y) = \{A(t, y, u(\cdot)) : u(\cdot) \in \mathcal{U}\}$


Various problems can be formulated in terms of
reachable sets, for example, controllability requires
that for every $y$ the union of all $R(t; y)$ as $t \to \infty$
includes the entire space. The dependence of $R(t; y)$
on time $t$ and on the set of controls $\mathcal{U}$ is also a
subject of investigation: one may ask whether the
same points in $R(t; y)$ can be reached by using
controls which are piecewise constant, or take
values within some subsets of $U$.
Control of ODEs


For most proofs we refer to Agrachev and Sachkov 
(2004) and Sontag (1998). 


Controllability 
Consider first the case of a linear system:


$\dot y = Ay + Bu$, $u \in U$, $y(0) = y_0$ [8]


where $y, y_0 \in \mathbb{R}^n$, $U \subset \mathbb{R}^m$, $A$ is an $n \times n$ matrix and
$B$ an $n \times m$ matrix. We have the following property
of reachable sets:


Theorem 1 If U is compact convex then the 
reachable set R(t) for [8] is compact and convex. 


A control system [8] is controllable if, taking
$U = \mathbb{R}^m$, we have $R(t) = \mathbb{R}^n$ for every $t > 0$. By
linearity, this is equivalent to requiring the reachable
set to be a neighborhood of the origin in case of
bounded controls. Define the controllability matrix
to be the $n \times nm$ matrix


$C(A, B) = (B, AB, \dots, A^{n-1} B)$


Controllability is characterized by the following: 


Theorem 2 (Kalman controllability theorem). The 
linear system [8] is controllable if and only if 
rank(C(A, B)) =n. 
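The rank test of Theorem 2 is easy to carry out by hand or by machine. The sketch below is an illustration under our own naming (`controllability_matrix`, `rank`, `is_controllable` are hypothetical helpers): it builds $C(A, B)$ from nested lists and computes its rank by Gaussian elimination in exact rational arithmetic.

```python
from fractions import Fraction


def mat_mul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def controllability_matrix(A, B):
    """Return (B, AB, ..., A^(n-1) B) as an n x (n*m) list of rows."""
    n = len(A)
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(mat_mul(A, blocks[-1]))
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


def rank(M):
    """Rank by Gaussian elimination over the rationals (no rounding error)."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def is_controllable(A, B):
    """Kalman rank condition: rank C(A, B) == n."""
    return rank(controllability_matrix(A, B)) == len(A)


if __name__ == "__main__":
    A = [[0, 1], [0, 0]]                    # point mass of Example 1
    print(is_controllable(A, [[0], [1]]))   # force acts on the velocity
    print(is_controllable(A, [[1], [0]]))   # input acts on the position only
```

For the point mass of Example 1 the test succeeds when the control enters the velocity equation and fails when it enters only the position equation, since the velocity is then unaffected.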


For linear systems, there exists a duality between 
controllability and observability in the sense of the 
following theorem: 


Theorem 3 Consider the linear control system [8]
and assume we observe the variable $z(y) = Cy$ for
some $p \times n$ matrix $C$. Then, observability holds if
and only if the linear system $\dot y = A^T y + C^T v$ is
controllable.


For nonlinear systems there is no characterization of controllability
comparable to the one for linear systems, but we have
the linearization result:


Theorem 4 A nonlinear system is locally controllable
if its linearization is. The converse is false.


There are many results for the important class of
control-affine systems


$\dot y = f_0(y) + \sum_{i=1}^{m} u_i f_i(y)$ [9]

where $f_0, \dots, f_m$ are smooth vector fields on $\mathbb{R}^n$ and
$U = \mathbb{R}^m$. In general, there exists no explicit representation
for the trajectories of [9] in terms of integrals
of the control, as happens for linear systems. Still, a
rich mathematical theory has been developed applying
techniques and ideas from differential geometry:
the so-called geometric control theory. The main idea
is that controllability (and properties of optimal
trajectories) is determined by the Lie algebra generated
by the vector fields $f_i$. For example:


Theorem 5 (Lie-algebraic rank condition). Let $\mathcal{L}$
be the Lie algebra generated by the vector fields
$f_i$, $i = 1, \dots, m$, and assume $f_0 = 0$. If $\mathcal{L}(y)$ is of
dimension $n$ at every point $y$, then the system is
controllable.


We refer to Agrachev and Sachkov (2004) 
and Jurdjevic (1997) for general presentation of 
geometric control theory and give a simple example 
to show how Lie brackets characterize reachable 
directions. 


Example 3 Consider the Brockett integrator


$\dot y_1 = u_1$, $\dot y_2 = u_2$, $\dot y_3 = u_1 y_2 - u_2 y_1$


Starting from the origin, using constant controls, we
can move along curves tangent to the $y_1 y_2$ plane.
However, let $f_1 = (1, 0, y_2)$ and $f_2 = (0, 1, -y_1)$ (fields
corresponding to constant controls); then their Lie
bracket is given by


$[f_1, f_2](0) = (Df_2 \cdot f_1 - Df_1 \cdot f_2)(0) = (0, 0, -2)$


Moving for time $\tau$ first along the integral curve of $f_1$,
then of $f_2$, then of $-f_1$, and finally of $-f_2$, we reach
a point $\tau^2 [f_1, f_2](0) + o(\tau^2)$ along the vertical direction
$y_3$. This amounts to saying that the system
satisfies LARC.
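The four-leg manoeuvre of Example 3 can be reproduced with a few lines of code. This is only a sketch with hypothetical names (`flow`, `commutator_endpoint`): each leg uses a constant control and an explicit Euler scheme, which happens to be exact for this system up to rounding, because on each leg the right-hand side of the third equation is constant.

```python
def flow(state, u1, u2, duration, steps=1000):
    """Follow y1' = u1, y2' = u2, y3' = u1*y2 - u2*y1 with constant controls."""
    y1, y2, y3 = state
    dt = duration / steps
    for _ in range(steps):
        # The right-hand side is evaluated at the old state (explicit Euler).
        y1, y2, y3 = y1 + dt * u1, y2 + dt * u2, y3 + dt * (u1 * y2 - u2 * y1)
    return (y1, y2, y3)


def commutator_endpoint(tau):
    """Move for time tau along f1, f2, -f1, -f2, starting from the origin."""
    state = (0.0, 0.0, 0.0)
    for u1, u2 in [(1, 0), (0, 1), (-1, 0), (0, -1)]:
        state = flow(state, u1, u2, tau)
    return state


if __name__ == "__main__":
    tau = 0.1
    print(commutator_endpoint(tau))   # close to (0, 0, -2 * tau**2)
```

The horizontal coordinates return to zero while the third coordinate moves by $-2\tau^2$, in agreement with $\tau^2 [f_1, f_2](0)$ and $[f_1, f_2](0) = (0, 0, -2)$; for this particular system the remainder $o(\tau^2)$ vanishes.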


Optimal Control 


The theory of optimal control has developed in three 
main directions: 

Existence of optimal controls, under various 
assumptions on L,f,U. When the sets F(t,y) are 
convex, optimal solutions can be constructed follow- 
ing the direct method of Tonelli for the calculus of 
variations, that is, as limits of minimizing sequences: 
the two main ingredients are compactness and lower-semicontinuity.
If convexity does not hold, existence
is not guaranteed in general, but only in special cases.

Necessary conditions for the optimality of a 
control u(-). The major result in this direction is 
the celebrated “Pontryagin maximum principle” 
(PMP) which extends the Euler-Lagrange equation 
to control systems, and the Weierstrass necessary 
conditions for a strong local minimum in the 
calculus of variations. Various extensions and other 
necessary conditions are now available (Agrachev 
and Sachkov 2004). 

Sufficient conditions for optimality. The standard
procedure resorts to embedding the optimal control
problem in a family of problems, obtained by
varying the initial conditions. One defines the value
function $V$ by


$V(t, \bar y) = \inf J(y(\cdot), u(\cdot))$


where the inf is taken over the set of trajectories and
controls satisfying $y(t) = \bar y$. Under suitable assumptions,
$V$ is the solution to a first-order Hamilton-Jacobi
PDE. The lack of regularity of the value function $V$ has
long provided a major obstacle to a rigorous mathematical
analysis, solved by the theory of viscosity solutions
(Bardi and Capuzzo Dolcetta 1997). Another method
consists in building an optimal synthesis, that is, a
collection of trajectory-control pairs.


Pontryagin maximum principle Consider a general
autonomous control system:


$\dot y = f(y, u)$ [10]

where $y \in \mathbb{R}^n$ and $u \in U$, a compact subset of $\mathbb{R}^m$. We
assume that $f$ is regular enough to guarantee existence
and uniqueness of trajectories for every $u(\cdot) \in \mathcal{U}$. For
a fixed $T > 0$, an optimal control problem in Mayer
form is given by


$\min_{u \in \mathcal{U}} \psi(y(T, u))$, $y(0) = \bar y$ [11]


where $\psi$ is the final cost and $\bar y$ the initial condition.
More generally, one can consider also the Lagrangian
cost $\int L(y, u) \, dt$ and reduce to this case by
adding a variable $y_0$ with $y_0(0) = 0$ and $\dot y_0 = L$.

The well-known PMP provides, under suitable 
assumptions, a necessary condition for optimality in 
terms of a lift of the candidate optimal trajectory to 
the cotangent bundle. For problems as [11], PMP 
can be stated as follows: 


Theorem 6 Let $u^*(\cdot)$ be a (bounded) admissible
control whose corresponding trajectory $y^*(\cdot) = y(\cdot, u^*)$
is optimal. Call $p : [0, T] \to \mathbb{R}^n$ the solution of the
adjoint linear equation

$\dot p(t) = -p(t) \cdot D_y f(y^*(t), u^*(t))$, $p(T) = \nabla \psi(y^*(T))$ [12]

Then the maximality condition


$p(t) \cdot f(y^*(t), u^*(t)) = \max_{\omega \in U} p(t) \cdot f(y^*(t), \omega)$ [13]


holds for almost every time $t \in [0, T]$.


Notice that the conclusion of the theorem can be
interpreted by saying that the pair $(y, p)$ satisfies the
system:


$\dot y = \frac{\partial H}{\partial p}(y, p, u)$, $\dot p = -\frac{\partial H}{\partial y}(y, p, u)$


where $H(y, p, u) = \langle p, f(y, u) \rangle$. This is a pseudo-Hamiltonian
system, since $H$ also depends on $u$.
Alternatively, one can define the maximized
Hamiltonian

$\mathcal{H}(y, p) = \max_{u \in U} \langle p, f(y, u) \rangle$


but $\mathcal{H}$ may fail to be smooth. Another difficulty lies
in the fact that an initial condition is given for $y$ and
a final condition is given for $p$.

The proof of PMP relies on a special type of
variations, called needle variations, of a reference
trajectory. Given a candidate optimal control $u^*$ and
corresponding trajectory $y^*$, a time $\tau$ of approximate
continuity for $f(y^*(\cdot), u^*(\cdot))$ and $\omega \in U$, a needle
variation is a family of controls $u_\varepsilon$ obtained
by replacing $u^*$ with $\omega$ on the interval $[\tau - \varepsilon, \tau]$.
A needle variation gives rise to a variation $v$ of the
trajectory satisfying the variational equation


$\dot v(t) = D_y f(y^*(t), u^*(t)) \cdot v(t)$ [14]


in classical sense only after time $\tau$. Recently Piccoli
and Sussmann (2000) introduced a setting in which
needle and other variations happen to be
differentiable.

One may also consider some final (or initial)
constraint:


$(T, y(T)) \in S$ [15]


where $S \subset \mathbb{R} \times \mathbb{R}^n$ (and $T$ is not fixed). In this case, both the
final condition for $p$ and the proof of PMP are more complicated.
It is interesting to note the many
connections between PMP and the framework of classical
mechanics, well illustrated by Bloch (2003) and
Jurdjevic (1997).


Value function and HJB equation In this section
we consider the minimization problem


$\inf \psi(T, y(T))$ [16]


for the control system

$\dot y = f(t, y, u)$, $u(t) \in U$ a.e. [17]


subject to the terminal constraints [15], where
$S \subset \mathbb{R}^{n+1}$ is a closed target set.


Theorem 7 (PDE of dynamic programming).
Assume that the value function $V$, for [15]-[17],
is $C^1$ on some open set $\Omega \subset \mathbb{R} \times \mathbb{R}^n$, not intersecting
the target set $S$. Then $V$ satisfies the Hamilton-Jacobi
equation


$V_s(s, y) + \min_{\omega \in U} V_y(s, y) \cdot f(s, y, \omega) = 0$, $\forall (s, y) \in \Omega$ [18]

Equation [18] is called the Hamilton-Jacobi-Bellman
(HJB) equation, after Richard Bellman. In general,
however, $V$ fails to be differentiable: this is the case for
Example 1 along the lines $C^\pm$. To isolate $V$ as the
unique solution of the HJB equation, one has to resort
to the concept of viscosity solution. The dynamic
programming and HJB equation apparatus applies
also to stochastic problems for which the equation
happens to be parabolic, because of the Ito formula.


Optimal syntheses Roughly speaking, an optimal 
synthesis is a collection of optimal trajectories, one 
for each initial condition y. Geometric techniques 
provide a systematic method to construct syntheses: 


Step 1 Study the properties of optimal trajectories 
via PMP and other necessary conditions. 

Step 2 Determine a (finite-dimensional) sufficient 
family for optimality, that is, a class of trajectories 
(satisfying PMP) containing all possible optimal ones. 

Step 3 Construct a synthesis selecting one trajec- 
tory for every initial condition in such a way as to 
cover the state space in a regular fashion. 

Step 4 Prove that the synthesis of Step 3 is indeed 
optimal. 


One of the main problems in step 2 is the possible 
presence of optimal controls with an infinite number 
of discontinuities, known as Fuller phenomenon. The 
key concept of regular synthesis, of step 3, was 
introduced by Boltianskii and recently refined by 
Piccoli and Sussmann (2000) to include Fuller phe- 
nomena. The above strategy works only in some 
special cases, for example for two-dimensional 
minimum-time problems (Boscain and Piccoli 2004): 
we report below an example. 


Example 4 Consider the problem of orienting in
minimum time a satellite with two orthogonal rotors:
the speed of one rotor is controlled, while the second
rotor has constant speed. This problem is modelled by
a left-invariant control system on $SO(3)$:


$\dot y = y(F + uG)$, $y \in SO(3)$, $|u| \le 1$


where $F$ and $G$ are two matrices of $so(3)$, the Lie
algebra of $SO(3)$. Using the isomorphism of Lie
algebras $(so(3), [\cdot, \cdot]) \simeq (\mathbb{R}^3, \times)$, the condition that
the rotors are orthogonal reads: $\mathrm{trace}(F \cdot G) = 0$.
If we are interested in orienting only a fixed semi-axis,
then we project the system on the sphere $S^2$:


$\dot y = y(F + uG)$, $y \in S^2$, $|u| \le 1$


In this case, $F + G$ and $F - G$ are rotations around
two fixed axes and, if the angle between these two
axes is less than $\pi/2$, every optimal trajectory is a
finite concatenation of arcs corresponding to constant
control $+1$ or $-1$. The "optimal synthesis" can
be obtained by the feedback shown in Figure 2.


Figure 2 Optimal feedback for Example 4. 


Control of PDEs 


The theory for control of models governed by PDEs
is, as expected, much more ramified and much less
complete. An exhaustive summary of the available
results is not possible in a short space, thus we focus
on Example 2 and a few others to illustrate some
techniques to treat control problems and give
various references (see also Fursikov and Imanuvilov
(1996), Komornik (1994), and Lasiecka and Triggiani
(2000), and references therein).

Besides the variety of control problems illustrated 
in the Introduction, for PDE models one can consider 
different ways of applying the control, for example: 

Boundary control One considers the system [3]
(with $F$ independent of $u$) and imposes the condition
$y(t, x) = u(t, x)$ for every time $t$ and every $x$ in
some region. Usually, we assume $y(t)$ to be defined on a
bounded region $\Omega$ and the control acts on some set
$\Gamma \subset \partial\Omega$. Obviously, Neumann conditions are also
natural, such as $\partial_\nu y = u$, where $\nu$ is the exterior normal to $\Omega$.

Internal control One considers the system [3]
with $F$ depending on $u$. Thus, the control acts on the
equation directly.

Other controls There are various other control
problems one may consider, such as Galerkin-type
approximation and control of some finite family of
modes. An interesting example is given by Coron
(2002), where the position of a tank is controlled to
regulate the water level inside.


Control of a Vibrating String 


We consider Example 2, but various results hold for
hyperbolic linear systems in general. First consider
the uncontrolled system


$z_{tt} = \Delta z$, $z(0, t) = z(1, t) = 0$ [19]


A first integral is the energy given by


$E(t) = \frac{1}{2} \int_0^1 \left( |z_t|^2 + |z_x|^2 \right) dx$


Then we say that the system [19] is observable at
time $T$ if there exists $C(T)$ such that


$E(0) \le C(T) \int_0^T |z_x(1, t)|^2 \, dt$


which means that if we observe zero displacement
on the right end for time $T$ then the solution has
zero energy and hence vanishes. In this case, the
system is observable for every time $T > 2$: this is
precisely the time taken by a wave to travel from the
right end point to the left one and backward.

Thanks to a duality as for the finite-dimensional
case, observability of [19] is equivalent to null
controllability for [5]-[7], that is, to the property
that for every initial conditions $y_0, y_1$ there exists a
control $u(\cdot)$ such that the corresponding solution
verifies $y(x, T) = y_t(x, T) = 0$. More precisely, the
desired control is given by $u(t) = z_x(1, t)$, where $z$ is
the solution of [19] minimizing the functional (over
$L^2 \times H^{-1}$)


$J(z(\cdot, 0), z_t(\cdot, 0)) = \frac{1}{2} \int_0^T |z_x(1, t)|^2 \, dt + \int_0^1 y_0 \, z_t(\cdot, 0) \, dx - \int_0^1 y_1 \, z(\cdot, 0) \, dx$


One can check that this functional is continuous and
convex, and the coercivity is granted by the
observability of [19]; thus, a minimum exists by
the direct method of Tonelli. This is an example of
the method known as the Hilbert uniqueness method,
introduced by Lions (1988).

In the multidimensional case, controllability can
be characterized by imposing a condition on the
region $\Gamma \subset \partial\Omega$ on which the control acts. More
precisely, rays of geometric optics in $\Omega$ should
intersect $\Gamma$ (Zuazua 2005).

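If we consider the infinite-time horizon $T = +\infty$ and
introduce the functional


$J(u) = \int_0^{+\infty} \int_\Omega |y|^2 \, dx \, dt + N \int_0^{+\infty} \int_\Gamma u^2 \, d\sigma \, dt$


then the optimal control is determined as follows.
If $(y, p)$ is a solution of the optimality system:
[5]-[6] with $y = 0$ outside $\Gamma$ and

$p_{tt} - \Delta p + y = 0$, $\partial_\nu p + N y = 0$ on $\Gamma$, $p = 0$ on $\partial\Omega$


then $u = y$ on $\Gamma$ (Lions 1988, Zuazua 2005).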


Controllability via Return Method of Coron 


As we saw in Theorem 4, a nonlinear system may be
controllable even if its linearization is not. In this
case, controllability can be proved by the return
method of Coron, which consists in finding a
trajectory $\bar y$ such that the following hold:


1. $\bar y(0) = \bar y(T) = 0$;
2. the linearized system around $\bar y$ is controllable.


Then, by the implicit-function theorem, local controllability
is granted, that is, there exists $\varepsilon > 0$ such that
for every data $y_0, y_1$ of norm less than $\varepsilon$, there exists
a control steering the system from $y_0$ to $y_1$ in time $T$.
This method does not give many advantages in the
finite-dimensional case, but yields excellent
results for PDE systems such as Euler, Navier-Stokes,
Saint-Venant, and others (Coron 2002).


Control of Schrödinger Equation


Consider the issue of designing an efficient transfer of 
population between different atomic or molecular 
levels using laser pulses. The mathematical description
consists in controlling the Schrödinger equation.
Many results are available in the finite-dimensional 
case. Finite-dimensional closed quantum systems are 
in fact left-invariant control systems on $SU(n)$, or on
the corresponding Hilbert sphere $S^{2n-1} \subset \mathbb{C}^n$, where
$n$ is the number of atomic or molecular levels, and
powerful techniques of geometric control are avail- 
able both for what concerns controllability and 
optimal control (Agrachev and Sachkov 2004, 
Boscain and Piccoli 2004, Jurdjevic 1997). 

Recent papers consider the minimum-time pro- 
blem with unbounded controls as well as minimiza- 
tion of the energy of transition. Boscain et al. (2002) 
have applied the techniques of sub-Riemannian geo- 
metry on Lie groups and of optimal synthesis on two- 
dimensional manifolds to the population transfer 
problem in a three-level quantum system driven by 
two external fields of arbitrary shape and frequency. 

Although many results are available for finite-dimensional
systems, only a few controllability properties
have been proved for the Schrödinger equation
as a PDE, and in particular no satisfactory global
controllability results are available at the moment.
Introduction


Convexity is an important notion in nonlinear 
optimization theory as well as in infinite- 
dimensional functional analysis. As will be seen 
below, very simple and powerful tools will be 
derived from elementary duality arguments (which 
are by-products of the Moreau-Fenchel transform 
and Hahn-Banach theorem). We will emphasize
applications to a large range of variational problems.
Some arguments of measure theory will be
skipped.


Basic Convex Analysis 


In the following, we denote by $X$ a normed vector
space, and by $X^*$ the topological dual of $X$. If
a topology different from the normed topology is
used on $X$, we will denote it by $\tau$. For every $x \in X$
and $A \subset X$, $\mathcal{V}_x$ denotes the open neighborhoods of $x$,
and int $A$, cl $A$, respectively, the interior and the
closure of $A$. We deal with extended real-valued
functions $f : X \to \mathbb{R} \cup \{+\infty\}$. We denote by dom $f = f^{-1}(\mathbb{R})$
and by epi $f = \{(x, \alpha) \in X \times \mathbb{R} : f(x) \le \alpha\}$
the domain and the epigraph of $f$, respectively. We
say that $f$ is proper if dom $f \neq \emptyset$. Recall that $f$ is
convex if for every $(x, y) \in X^2$ and $t \in [0, 1]$, there
holds


$f(tx + (1 - t)y) \le t f(x) + (1 - t) f(y)$ (by convention $\infty + a = +\infty$)


The notion of convexity for a subset $A \subset X$
is recovered by saying that $\chi_A$ is convex, where its
indicator function $\chi_A$ is defined by setting


$\chi_A(x) = \begin{cases} 0 & \text{if } x \in A \\ +\infty & \text{otherwise} \end{cases}$

Continuity and Lower-Semicontinuity 


A first consequence of the convexity is the continuity 
on the topological interior of the domain. We refer for 
instance to Borwein and Lewis (2000) for a proof of 


Theorem 1 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be convex and
proper. Assume that $\sup_U f < +\infty$, where $U$ is a
suitable open subset of $X$. Then $f$ is continuous and
locally Lipschitzian on all int(dom $f$).


As an immediate corollary, a convex function on 
a normed space is continuous provided it is 
majorized by a locally bounded function. In the 
finite-dimensional case, it is easily deduced that a
finite-valued convex function $f : \mathbb{R}^d \to \mathbb{R}$ is locally
Lipschitz. Furthermore, by Aleksandrov's theorem,
$f$ is almost everywhere twice differentiable and the
non-negative Hessian matrix $\nabla^2 f$ coincides with the
absolutely continuous part of the distributional
Hessian matrix $D^2 f$ (it is a Radon measure taking
values in the non-negative symmetric matrices).

However, in infinite-dimensional spaces, for 
ensuring compactness properties (as, e.g., in condi- 
tion (ii) of Theorem 4 below), we need to use weak 
topologies and the situation is not so simple. 
A major idea consists in substituting the continuity 
property with lower-semicontinuity. 


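Definition 2 A function $f : X \to \mathbb{R} \cup \{+\infty\}$ is $\tau$-l.s.c.
at $x_0 \in X$ if for all $\alpha \in \mathbb{R}$ with $\alpha < f(x_0)$, there exists $U \in \mathcal{V}_{x_0}$
such that $f > \alpha$ on $U$. In particular, $f$ will be l.s.c. on
all $X$ provided $f^{-1}((r, +\infty))$ is open for every $r \in \mathbb{R}$.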


Remark 3 


(i) The following sequential notion can be also
used: $f$ is $\tau$-sequentially l.s.c. at $x_0$ if


$\forall (x_n) \subset X$, $x_n \to x_0 \implies \liminf_n f(x_n) \ge f(x_0)$


It turns out that this notion (weaker in general)
is equivalent to the previous one provided $x_0$
admits a countable basis of neighborhoods.

(ii) A well-known consequence of Hahn-Banach 
theorem is that, for convex functions, the lower- 
semicontinuity property with respect to the 
normed topology of X is equivalent to the weak 
(or weak sequential) lower-semicontinuity. 


Theorem 4 (Existence). Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be
proper, such that


(i) $f$ is $\tau$-l.s.c.,
(ii) $\forall r \in \mathbb{R}$, $f^{-1}((-\infty, r])$ is $\tau$-relatively compact.


Then there is $\bar x \in X$ such that $f(\bar x) = \inf f$ and
argmin $f := \{x \in X \mid f(x) = \inf f\}$ is $\tau$-compact.


In practice, the choice of the topology $\tau$ is ruled
by the condition (ii) above. For example, if $X$ is a
reflexive infinite-dimensional Banach space and if $f$
is coercive (i.e., $\lim_{\|x\| \to \infty} f(x) = +\infty$), we may take
for $\tau$ the weak topology (but never the normed
topology). This restriction implies in practice that
the first condition in Theorem 4 may fail. In this
case, it is often useful to substitute $f$ with its lower-semicontinuous
(l.s.c.) envelope.


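Definition 5 Given a topology $\tau$, the relaxed function
$\bar f$ ($= \bar f^\tau$) is defined as

$\bar f(x) = \sup\{g(x) \mid g : X \to \mathbb{R} \cup \{+\infty\},\ g \text{ is } \tau\text{-l.s.c.},\ g \le f\}$


It is easy to check that $f$ is $\tau$-l.s.c. at $x_0$ if and only
if $\bar f(x_0) = f(x_0)$. Furthermore,


$\bar f(x) = \sup_{U \in \mathcal{V}_x} \inf_U f$, epi $\bar f = \mathrm{cl}_{(X \times \mathbb{R})}(\text{epi } f)$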


We can now state the relaxed version of Theorem 4.


Theorem 6 (Relaxation). Let $f : X \to \mathbb{R} \cup \{+\infty\}$,
then: $\inf f = \inf \bar f$. Assume further that, for all
real $r$, $f^{-1}((-\infty, r])$ is $\tau$-relatively compact; then $\bar f$
attains its minimum and argmin $f$ = argmin $\bar f \cap \{x \in X \mid f(x) = \bar f(x)\}$.


Moreau-Fenchel Conjugate 


The duality between $X$ and $X^*$ will be denoted by the
symbol $\langle \cdot | \cdot \rangle$. If $X$ is a Euclidean space, we identify $X^*$
with $X$ via the scalar product denoted $(\cdot | \cdot)$.
Definition 7 Let $f : X \to \mathbb{R} \cup \{+\infty\}$. The Moreau-Fenchel
conjugate $f^* : X^* \to \mathbb{R} \cup \{+\infty\}$ of $f$ is defined
by setting, for every $x^* \in X^*$:


$f^*(x^*) = \sup\{\langle x | x^* \rangle - f(x) \mid x \in X\}$


In a symmetric way, if $f^*$ is proper on $X^*$, we define
the biconjugate $f^{**} : X \to \mathbb{R} \cup \{+\infty\}$ by setting


$f^{**}(x) = \sup\{\langle x | x^* \rangle - f^*(x^*) \mid x^* \in X^*\}$


As a consequence, the so-called Fenchel inequality
holds:

$\langle x | x^* \rangle \le f(x) + f^*(x^*)$, $(x, x^*) \in X \times X^*$


Notice that $f$ does not need to be convex. However,
if $f$ is convex, then $f^*$ agrees with the Legendre-Fenchel
transform.


Definition 8 Let $f : X \to \mathbb{R} \cup \{+\infty\}$. The subdifferential
of $f$ at $x$ is the possibly void subset
$\partial f(x) \subset X^*$ defined by


$\partial f(x) := \{x^* \in X^* : f(x) + f^*(x^*) = \langle x, x^* \rangle\}$


It is easy to check that $\partial f(x)$ is convex and weak-star
closed. Moreover, if $f$ is convex and has a
differential (or Gateaux derivative) $f'(x)$ at $x$, then
$\partial f(x) = \{f'(x)\}$. After summarizing some elementary
properties of the Fenchel transform, we give
examples in $\mathbb{R}^d$ or in infinite-dimensional spaces.


Lemma 9 


(i) $f^*$ is convex, l.s.c. with respect to the weak star
topology of $X^*$.
(ii) $f^*(0) = -\inf f$ and $f \ge g \implies f^* \le g^*$.
(iii) $(\inf_i f_i)^* = \sup_i f_i^*$, for every family $\{f_i\}$.
(iv) $f^{**}(x) = \sup\{g(x) : g$ affine continuous on $X$ and
$g \le f\}$ (by convention, the supremum is identically
$-\infty$ if no such $g$ exists).


Proof (i) This assertion is a direct consequence of the
fact that $f^*$ can be written as the supremum
of functions $g_x$, where $g_x := \langle x | \cdot \rangle - f(x)$. Clearly,
these functions are affine and weakly star-continuous
on $X^*$. The assertions (ii), (iii) are trivial. To obtain (iv),
it is enough to observe that an affine function $g$ of
the form $g(x) = \langle x, x^* \rangle - \beta$ satisfies $g \le f$ iff
$f^*(x^*) \le \beta$. □


Example 1 Let $f : X \to \mathbb{R}$ be defined by

$f(x) = \frac{1}{p} \|x\|^p$, $1 < p < +\infty$


then,


$f^*(x^*) = \frac{1}{p'} \|x^*\|_*^{p'}$, with $\frac{1}{p} + \frac{1}{p'} = 1$

whereas, for $p = 1$, we find $f^* = \chi_{B^*}$, where
$B^* = \{\|x^*\| \le 1\}$.
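In one dimension the formula of Example 1 can be checked by brute force. The sketch below is purely illustrative: it replaces the supremum in the definition of $f^*$ by a maximum over a uniform grid on a bounded interval, so it only approximates $f^*$ when the maximizer lies inside that interval. The names `power_function`, `conjugate_on_grid` and `exact_conjugate`, and the grid parameters, are our own assumptions.

```python
def power_function(p):
    """Return f(x) = |x|**p / p on the real line."""
    return lambda x: abs(x) ** p / p


def conjugate_on_grid(f, y, radius=10.0, n=20001):
    """Approximate f*(y) = sup_x (x*y - f(x)) by a maximum over a uniform grid.

    Only meaningful when the supremum is attained inside [-radius, radius].
    """
    h = 2.0 * radius / (n - 1)
    return max((-radius + i * h) * y - f(-radius + i * h) for i in range(n))


def exact_conjugate(p, y):
    """Closed form |y|**q / q with 1/p + 1/q = 1, valid for 1 < p < infinity."""
    q = p / (p - 1.0)
    return abs(y) ** q / q


if __name__ == "__main__":
    for p, y in [(2.0, -3.0), (3.0, 2.0)]:
        print(p, y, conjugate_on_grid(power_function(p), y), exact_conjugate(p, y))
```

For $p = 1$ the same routine returns (approximately) 0 when $|y| \le 1$, while for $|y| > 1$ the grid maximum grows with the radius, which reflects the value $+\infty$ of the indicator $\chi_{B^*}$.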


Example 2 Let $A \in \mathbb{R}^{d^2}$ be a symmetric positive-definite
matrix and let $f(x) := (1/2)(Ax | x)$ $(x \in \mathbb{R}^d)$.
Then, for all $y \in \mathbb{R}^d$, we have $f^*(y) = (1/2)(A^{-1} y | y)$.
Notice that if $A$ has a negative eigenvalue, then
$f^* \equiv +\infty$.
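The computation behind Example 2 is short enough to write out. The following worked derivation is a sketch added for convenience; the symbols $e$, $\lambda$ and $s$ are introduced here only for the argument.

```latex
% For fixed y, the map x -> (x|y) - (1/2)(Ax|x) is strictly concave when A is
% symmetric positive definite, so its supremum is attained where the gradient vanishes:
\[
  y - A x = 0 \quad\Longleftrightarrow\quad x = A^{-1} y .
\]
% Substituting the maximizer into the definition of the conjugate gives
\[
  f^*(y) = (A^{-1} y \mid y) - \tfrac12 \,(A A^{-1} y \mid A^{-1} y)
         = \tfrac12 \,(A^{-1} y \mid y).
\]
% If A has an eigenvalue \lambda < 0 with eigenvector e, take x = s e:
\[
  (s e \mid y) - \tfrac12 \lambda s^2 |e|^2 \;\longrightarrow\; +\infty
  \quad (s \to +\infty),
  \qquad\text{hence } f^* \equiv +\infty .
\]
```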

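Particular examples on $\mathbb{R}^d$ are also very popular.
For instance:


Minimal surfaces


$f(x) = \sqrt{1 + |x|^2}$, $f^*(y) = \begin{cases} -\sqrt{1 - |y|^2} & \text{if } |y| \le 1 \\ +\infty & \text{otherwise} \end{cases}$

Entropy

$f(x) = \begin{cases} x \log x & \text{if } x \in \mathbb{R}_+ \\ +\infty & \text{otherwise} \end{cases}$, $f^*(y) = \exp(y - 1)$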


Example 3 Let $C \subset X$ be convex, and let $f = \chi_C$.
Then,


$f^*(x^*) = \sigma_C(x^*) = \sup_{x \in C} \langle x | x^* \rangle$ (support function of $C$)


Notice that if $M$ is a subspace of $X$, then
$(\chi_M)^* = \chi_{M^\perp}$. We specify now a particular case of
interest.

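Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$. Take
$X = C^0(\overline{\Omega}; \mathbb{R}^d)$ to be the Banach space of continuous
functions on the compact $\overline{\Omega}$ with values in $\mathbb{R}^d$.
As usual, we identify the dual $X^*$ with the space
$M_b(\overline{\Omega}; \mathbb{R}^d)$ of $\mathbb{R}^d$-valued Borel measures on $\overline{\Omega}$ with
finite total variation. Let $K$ be a closed convex subset of
$\mathbb{R}^d$ such that $0 \in K$. Then $\rho_K^0(\xi) := \sup\{(\xi | z) : z \in K\}$
is a non-negative, convex, l.s.c. and positively
1-homogeneous function on $\mathbb{R}^d$ (e.g., $\rho_K^0$ is the
Euclidean norm if $K$ is the unit ball of $\mathbb{R}^d$). Let us
define $C := \{\varphi \in X : \varphi(x) \in K,\ \forall x \in \Omega\}$. Then, we
have


$(\chi_C)^*(\lambda) = \int_\Omega \rho_K^0(\lambda) := \int_\Omega \rho_K^0\!\left(\frac{d\lambda}{d\theta}\right) \theta(dx)$ [1]


where $\theta$ is any non-negative Radon measure such
that $\lambda \ll \theta$ (the choice of $\theta$ is indifferent). In the case
where $K$ is the unit ball, we recover the total
variation of $\lambda$.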


Example 4 (Integral functionals). Given $1 \le p < +\infty$, $(\Omega, \mu, \mathcal{T})$
a measured space and $\varphi : \Omega \times \mathbb{R}^d \to [0, +\infty]$ a
$\mathcal{T} \otimes \mathcal{B}_{\mathbb{R}^d}$-measurable integrand.
Then the partial conjugate $\varphi^*(x, z^*) := \sup\{(z | z^*) - \varphi(x, z) : z \in \mathbb{R}^d\}$
is a convex measurable integrand.
Let us define


$I_\varphi : u \in (L^p_\mu)^d \mapsto \int_\Omega \varphi(x, u(x)) \, d\mu \in \mathbb{R} \cup \{+\infty\}$


and assume that $I_\varphi$ is proper. Then there holds
$(I_\varphi)^* = I_{\varphi^*}$, where


$I_{\varphi^*} : v \in (L^{p'}_\mu)^d \mapsto \int_\Omega \varphi^*(x, v(x)) \, d\mu$


Duality Arguments
Two Key Results


The first result related to the biconjugate $f^{**}$ is
a consequence of the Hahn-Banach theorem.
Recalling the assertion (iv) of Lemma 9, we notice
that the existence of an affine minorant for $f$ is
equivalent to the properness of $f^*$ (i.e.,
$\exists x_0^* \in X^* : f^*(x_0^*) < +\infty$).


Theorem 10 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be convex and
proper. Then


(i) $f$ is l.s.c. at $x_0$ if and only if $f^*$ is proper
and $f^{**}(x_0) = f(x_0)$. In particular, the lower-semicontinuity
of $f$ on all $X$ is equivalent to the
identity $f = f^{**}$.

(ii) If $f^*$ is proper, then $f^{**} = \bar f$.

Proof We notice that by Lemma 9, $f^{**} \le f$ and $f^{**}$
is l.s.c. (even for the weak topology). Therefore,
$f^{**} \le \bar f$ and, moreover, $f$ is l.s.c. at $x_0$ if $f^{**}(x_0) \ge f(x_0)$.
Conversely, if $f$ is l.s.c. at $x_0$, for every $\alpha_0 < f(x_0)$,
there exists a neighborhood $V$ of $x_0$ such
that $V \times (-\infty, \alpha_0) \cap \text{epi } f = \emptyset$. It follows that
epi $\bar f$ is a proper closed convex subset of $X \times \mathbb{R}$
which does not intersect the compact singleton
$\{(x_0, \alpha_0)\}$. By applying the Hahn-Banach strict
separation theorem, there exists $(x_0^*, \beta_0) \in X^* \times \mathbb{R}$
such that


$\langle x_0, x_0^* \rangle + \alpha_0 \beta_0 < \langle x, x_0^* \rangle + \alpha \beta_0$ for all $(x, \alpha) \in \text{epi } f$


Taking $\alpha \to \infty$ and $x \in \text{dom } f$, we find $\beta_0 \ge 0$. In
fact, $\beta_0 > 0$ as the strict inequality above would be
violated for $x = x_0$. Eventually, we obtain that $f$ is
minorized by the affine continuous function

$g(x) = -\langle x - x_0, x_0^* / \beta_0 \rangle + \alpha_0$.

Thus, we conclude
that $f^*$ is proper and that $f^{**}(x_0) \ge \alpha_0$.


The assertion (ii) is a direct consequence of the
equivalence in (i). □


Theorem 11 Let $X$ be a normed space and let
$f : X \to [0, +\infty]$ be a convex and proper function;
assume that $f$ is continuous at 0, then


(i) $f^*$ achieves its minimum on $X^*$
(ii) $f(0) = f^{**}(0) = -\inf f^*$


Proof


(i) Let $M$ be an upper bound of $f$ on the ball $\{\|x\| \le R\}$. Then


$f^*(x^*) \ge \sup\{\langle x, x^* \rangle - f(x) : \|x\| \le R\} \ge R \|x^*\|_{X^*} - M$


Hence, for every $r$, the set $\{x^* \in X^* : f^*(x^*) \le r\}$
is bounded, thus $\tau$-relatively compact, where $\tau$ is
the weak-star topology on $X^*$. By assertion (i) of
Lemma 9, $f^*$ is $\tau$-l.s.c. and Theorem 4 applies.
(ii) By Theorem 10, since $f$ is convex proper and
l.s.c. at $x_0 = 0$, we have $f(0) = f^{**}(0) = -\inf f^*$. □
Some Useful Consequences 
Proposition 12 (Conjugate of a sum). Let $f, g : X \to \mathbb{R} \cup \{+\infty\}$
be convex such that


$\exists x_0 \in X$ : $f$ is continuous at $x_0$ and $g(x_0) < +\infty$ [2]

Then


(i) $(f + g)^*(x^*) = \inf_{x_1^* + x_2^* = x^*} \{f^*(x_1^*) + g^*(x_2^*)\}$
(the equality holds in $\overline{\mathbb{R}}$).
(ii) If both sides of the equality in (i) are finite, then
the infimum in the right-hand side is achieved.


Proof Without any loss of generality, we may
assume that $x^* = 0$ (we reduce to this case by
substituting $g$ with $g - \langle \cdot, x^* \rangle$). We let


$h(p) = \inf\{f(x + p) + g(x) \mid x \in X\}$


Noticing that $(p, x) \mapsto f(x + p) + g(x)$ is convex, we
infer that $h(p)$ is convex as well. As $h$ is majorized
by the function $p \mapsto f(x_0 + p) + g(x_0)$, which by [2] is
continuous at 0, we deduce from Theorems 1 and 11
that $h(0) = h^{**}(0)$ and that $h^*$ achieves its infimum.
Now $h(0) = \inf(f + g) = -(f + g)^*(0)$ and


$h^*(p^*) = \sup\{\langle p, p^* \rangle - h(p) : p \in X\}$
$= \sup\{\langle p, p^* \rangle - f(x + p) - g(x) : x \in X,\ p \in X\}$
$= g^*(-p^*) + f^*(p^*)$

The assertions (i), (ii) follow since $-h^{**}(0) = \min h^* = \min\{g^*(-p^*) + f^*(p^*)\}$. □


Proposition 13 d Let X,Y be two 
Banach spaces and A: X++ Y a linear operator with 
dense domain D(A). Let V:Y—RU(-oo] be a 
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convex l.s.c. function and let F— X be the convex 
functional defined by 


Flai) = [un if u € D(A) 
-Foo otherwise 


Assume that there exists ug € D(A) such that WV is 
continuous at Aug. Then 


(i) The Fenchel conjugate of F is given by 


Vf EX", F(f)-inf(V'(o): o € Y, Ato =f} 


where, if both sides of the equality are finite, the 
infimum on the right-hand side is achieved. 

(ii) If, in addition, Y is reflexive and WV is l.s.c. 
coercive, we have 


F(u) = F“(u) = inf{W(p)|(w,p) € G(A)) (3 
where G(A) denotes the graph of A. 
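Proof

(i) Define $H, K: X \times Y \to \mathbb{R} \cup \{+\infty\}$ by

\[ \forall (u,p), \quad H(u,p) = \chi_{G(A)}(u,p), \qquad K(u,p) = \Psi(p) \]

Then we have the identity $F^*(f) = (H+K)^*(f,0)$,
where the conjugate of $H+K$ is taken with
respect to the duality $(X \times Y, X^* \times Y^*)$. From the
assumption, $K$ is continuous at $(u_0, Au_0) \in \mathrm{dom}\, H$.
By Proposition 12, we obtain

\[ (H+K)^*(f,0) = \inf_{(g,\sigma) \in X^* \times Y^*} \{ K^*(f-g, \sigma) + H^*(g, -\sigma) \} \]

After a simple computation, it is easy to check
that

\[ H^*(g, -\sigma) = \begin{cases} 0 & \text{if } A^*\sigma = g \\ +\infty & \text{otherwise} \end{cases}
\qquad
K^*(f-g, \sigma) = \begin{cases} \Psi^*(\sigma) & \text{if } f = g \\ +\infty & \text{otherwise} \end{cases} \]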


(ii) Let $J(u) := \inf\{ \Psi(p) : (u,p) \in \overline{G(A)} \}$. As observed
for $F^*$ in the proof of (i), we have the identity
$J^*(f) = (H+K)^*(f,0)$. Therefore, in view of
Theorem 10, $\overline{F} = F^{**} = J^{**}$ and it is enough to
prove that $J$ is convex l.s.c. proper. Let us
consider a sequence $(u_n)$ in $X$ converging to
some $u \in X$. Without any loss of generality, we
may assume that $\liminf J(u_n) = \lim J(u_n) < +\infty$.
Then there is a sequence $(p_n)$ such that, for every
$n$, $(u_n, p_n) \in \overline{G(A)}$ and $J(u_n) \ge \Psi(p_n) - 1/n$. As $\Psi$
is coercive, $\{p_n\}$ is bounded in the reflexive
space $Y$ and, possibly passing to a subsequence,
we may assume that $p_n$ converges weakly to
some $p$. Since $\overline{G(A)}$ is a (weakly) closed subspace
of $X \times Y$, we infer that $(u,p)$, as the limit of
$(u_n, p_n)$, still belongs to $\overline{G(A)}$. Thus we conclude,
thanks to the (weak) lower semicontinuity of $\Psi$:

\[ \liminf_n J(u_n) = \lim_n \Psi(p_n) \ge \Psi(p) \ge J(u) \]  □


An immediate consequence of Propositions 12 and 
13 is the following variant: 


Proposition 14 Under the same notation as in
Proposition 13, let $\Phi: X \to \mathbb{R} \cup \{+\infty\}$ be a convex
function and assume that there exists $u_0 \in D(A)$
such that $\Phi(u_0) < +\infty$ and $\Psi$ is continuous at $Au_0$.
Then we have

\[ \inf_{u} \{ \Phi(u) + \Psi(Au) \} = \sup_{\sigma \in Y^*} \{ -\Phi^*(-A^*\sigma) - \Psi^*(\sigma) \} \]

where the supremum on the right-hand side is
achieved. Furthermore, a pair $(\bar u, \bar\sigma)$ is optimal if
and only if it satisfies the relations: $\bar\sigma \in \partial\Psi(A\bar u)$ and
$-A^*\bar\sigma \in \partial\Phi(\bar u)$.


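Remark 15 From the assertion (ii) of Proposition
13, we may conclude that $F$ is l.s.c. whenever the
operator $A$ is closed. If now $A$ is merely closable
(with closure denoted by $\overline{A}$), we obtain

\[ \overline{F}(u) = \begin{cases} \Psi(\overline{A}u) & \text{if } u \in \mathrm{dom}\, \overline{A} \\ +\infty & \text{otherwise} \end{cases} \]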


This is the typical situation when $F$ is an integral
functional defined on smooth functions of the kind

\[ F(u) = \int_\Omega f(x, \nabla u)\, dx \]

where $\Omega$ is a bounded open subset of $\mathbb{R}^n$, $f: \Omega \times \mathbb{R}^n \to \mathbb{R}$
is a convex integrand with quadratic growth
(i.e., $c|z|^2 \le f(x,z) \le C(1 + |z|^2)$ for suitable $C > c > 0$).
Then $X = L^2(\Omega)$, $Y = L^2(\Omega; \mathbb{R}^n)$,

\[ \Psi(v) = \int_\Omega f(x, v(x))\, dx \]

and $A: u \in C^1(\Omega) \mapsto \nabla u \in L^2(\Omega; \mathbb{R}^n)$. It turns out
that $A$ is closable and that the domain of $\overline{A}$
characterizes the Sobolev space $W^{1,2}(\Omega)$, on which
$\overline{A}$ coincides with the distributional gradient
operator.


The situation is more involved if we consider

\[ F(u) = \int f(x, \nabla u)\, d\mu \]

where $\mu$ is a possibly concentrated Radon measure supported
on $\overline{\Omega}$. In general, the operator
$A: u \in C^1(\Omega) \subset L^2_\mu(\Omega) \mapsto \nabla u \in L^2_\mu(\Omega; \mathbb{R}^n)$ is not closable
and we need to come back to the general formula
[3]. The general structure of $\overline{G(A)}$ has been given in
Bouchitté et al. (1997) and Bouchitté and Fragalà
(2002, 2003), namely

\[ (u, \xi) \in \overline{G(A)} \iff u \in W^{1,2}_\mu,\ \exists \eta \in L^2_\mu(\Omega; \mathbb{R}^n):\
\xi = \nabla_\mu u + \eta,\ \eta(x) \in T_\mu(x)^\perp \]

where $T_\mu(x)$, $\nabla_\mu$ are suitable notions of tangent
space and tangential gradient with respect to $\mu$, and
$W^{1,2}_\mu$ denotes the domain of the extended tangential
gradient operator.


Remark 16 The assertion (ii) of Proposition 13
is not valid in the nonreflexive case. In
particular, for

\[ F(u) = \int_\Omega f(x, \nabla u)\, dx \]

where $f(x, \cdot)$ has a linear growth at infinity,
we need to take $Y$ as the space of $\mathbb{R}^n$-valued
vector measures on $\Omega$ and the relaxed functional
$F^{**}$ needs to be identified on the space $BV(\Omega)$
of integrable functions with bounded variation.
The computation of $F^{**}$ is a delicate problem for
which we refer to Bouchitté and Dal Maso (1993)
and Bouchitté and Valadier (1998).


Remark 17 By duality techniques, it is also possible
to handle variational integrals of the kind

\[ F(u) = \int_\Omega f(x, u(x), \nabla u(x))\, dx \]

even if the dependence of $f(x,u,z)$ with respect to $u$
is nonconvex. The idea consists in embedding the
space $BV(\Omega)$ in the larger space $BV(\Omega \times \mathbb{R})$ through
the map $u \mapsto 1_u$, where $1_u$ is the characteristic
function defined on $\Omega \times \mathbb{R}$ by setting

\[ 1_u(x,t) = \begin{cases} 1 & \text{if } u(x) > t \\ 0 & \text{otherwise} \end{cases} \]

Then it is possible to show, under suitable
conditions on the integrand $f$, that there exists
a convex l.s.c., 1-homogeneous functional
$G: BV(\Omega \times \mathbb{R}) \to \mathbb{R} \cup \{+\infty\}$ such that $F(u) = G(1_u)$.
This functional $G$ is constructed as in Example
3, taking $C$ to be a suitable convex subset of
$C^0(\Omega \times \mathbb{R})$. This new idea has been the key
tool of the calibration method developed recently
(Alberti et al. 2003).


Convex Variational Problems in Duality 
Finite-Dimensional Case 


We sketch the duality scheme in two cases. 


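Linear programming Let $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$ and $A$ an
$m \times n$ matrix. We denote by $A^T$ the transpose
matrix. We consider the linear program

\[ (P) \qquad \inf\{ (c|x) : x \ge 0,\ Ax \le b \} \]

and its perturbed version ($p \in \mathbb{R}^m$)

\[ h(p) := \inf\{ (c|x) : x \ge 0,\ Ax + p \le b \} \]

An easy computation gives

\[ \forall y \in \mathbb{R}^m, \quad h^*(y) = \begin{cases} (b|y) & \text{if } A^T y + c \ge 0,\ y \ge 0 \\ +\infty & \text{otherwise} \end{cases} \]  [4]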


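Lemma 18 Assume that inf (P) is finite. Then:

(i) $h$ is convex proper and l.s.c. at 0.
(ii) (P) has at least one solution.

Proof We introduce the $(m+1) \times (n+m)$ matrix
$B$ defined by

\[ B = \begin{pmatrix} c^T & 0 \\ A & I_m \end{pmatrix} \]

($I_m$ is the $m$-dimensional identity matrix). Denote by
$b_1, b_2, \ldots, b_{n+m}$ the columns of $B$ and by $K$
the convex cone $K := \{ \sum_{i=1}^{n+m} \lambda_i b_i : \lambda_i \ge 0 \}$. By
Farkas lemma, this cone $K$ is closed.

(i) Let $\alpha := \liminf\{ h(p) : p \to 0 \}$. We have to prove
that $\alpha \ge h(0) = \inf (P)$. Let $\{p_\varepsilon\}$ be a sequence in
$\mathbb{R}^m$ such that $p_\varepsilon \to 0$ and $h(p_\varepsilon) \to \alpha$. By the
definition of $h$, we may choose $x_\varepsilon \ge 0$ such that
$Ax_\varepsilon + p_\varepsilon \le b$ and $(c|x_\varepsilon) \to \alpha$. Then we see that the
column vector $\tilde x_\varepsilon$ associated with $(x_\varepsilon, b - Ax_\varepsilon - p_\varepsilon) \in \mathbb{R}^{n+m}$
satisfies: $B\tilde x_\varepsilon \in K$ and

\[ B\tilde x_\varepsilon = \begin{pmatrix} (c|x_\varepsilon) \\ b - p_\varepsilon \end{pmatrix} \to \begin{pmatrix} \alpha \\ b \end{pmatrix} \]

Therefore, as $K$ is closed, the limit belongs to $K$
and there exists $\tilde x = (x, x')$ such that $x \ge 0$, $x' \ge 0$,
$(c|x) = \alpha$ and $Ax + x' = b$. It follows that $x$ is
admissible for (P) and then $(c|x) = \alpha \ge h(0)$.

(ii) We repeat the proof of (i) choosing $p_\varepsilon = 0$ so
that $\alpha = \inf (P)$. □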


Thanks to the assertion (i) in Lemma 18, we deduce
from Theorem 10 that $\inf(P) = h(0) = h^{**}(0) = \sup(-h^*)$.
Recalling [4], we therefore consider the dual
problem:

\[ (P^*) \qquad \sup\{ -(b|y) : y \ge 0,\ A^T y + c \ge 0 \} \]


Theorem 19 The following assertions are equivalent:

(i) (P) has a solution.
(ii) (P*) has a solution.
(iii) There exists $(x_0, y_0) \in \mathbb{R}^n_+ \times \mathbb{R}^m_+$ such that
$Ax_0 \le b$, $A^T y_0 + c \ge 0$.

In this case, we have min(P) = max(P*) and
an admissible pair $(\bar x, \bar y)$ is optimal if and
only if $c \cdot \bar x = -b \cdot \bar y$ or, equivalently, satisfies
the complementarity relations: $(A\bar x - b) \cdot \bar y = (A^T \bar y + c) \cdot \bar x = 0$.
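The optimality test of Theorem 19 can be checked on a small instance. The sketch below uses hand-picked data (the numbers `c`, `A`, `b` and the candidate pair `x_bar`, `y_bar` are assumptions chosen for illustration, not taken from the text) and exact rational arithmetic. It verifies admissibility, the equality $c \cdot \bar x = -b \cdot \bar y$ and the two complementarity relations; it does not solve the linear program.

```python
from fractions import Fraction as Fr

# Hand-picked data (assumption for illustration): n = m = 2.
c = [Fr(-1), Fr(-1)]
A = [[Fr(1), Fr(2)],
     [Fr(3), Fr(1)]]
b = [Fr(4), Fr(6)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def primal_feasible(x):
    # Admissible for (P): x >= 0 and A x <= b.
    return all(xi >= 0 for xi in x) and all(
        r <= bi for r, bi in zip(mat_vec(A, x), b))


def dual_feasible(y):
    # Admissible for (P*): y >= 0 and A^T y + c >= 0.
    s = [r + ci for r, ci in zip(mat_vec(transpose(A), y), c)]
    return all(yi >= 0 for yi in y) and all(si >= 0 for si in s)


# Candidate pair found by hand by making every constraint active.
x_bar = [Fr(8, 5), Fr(6, 5)]
y_bar = [Fr(2, 5), Fr(1, 5)]

primal_value = dot(c, x_bar)    # (c|x)
dual_value = -dot(b, y_bar)     # -(b|y)

# Slack vectors entering the complementarity relations.
slack_primal = [r - bi for r, bi in zip(mat_vec(A, x_bar), b)]          # A x - b
slack_dual = [r + ci for r, ci in zip(mat_vec(transpose(A), y_bar), c)]  # A^T y + c

print(primal_value, dual_value)
```

Since both vectors are admissible and the two values coincide, the pair is optimal for (P) and (P*) by the equivalence stated in the theorem.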


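Convex programming Let $f, g_1, \ldots, g_m: X \to \mathbb{R}$ be
convex l.s.c. functions and consider the optimization problem

\[ (P) \qquad \inf\{ f(x) : g_j(x) \le 0,\ j \in \{1, 2, \ldots, m\} \} \]

Here $X = \mathbb{R}^n$ or any Banach space. As before, we
introduce the value function

\[ p \in \mathbb{R}^m, \quad h(p) := \inf\{ f(x) : g_j(x) + p_j \le 0,\ j \in \{1, 2, \ldots, m\} \} \]

and compute its Fenchel conjugate:

\[ \lambda \in \mathbb{R}^m, \quad h^*(\lambda) = \begin{cases} -\inf_x \{ L(x, \lambda) \} & \text{if } \lambda \ge 0 \\ +\infty & \text{otherwise} \end{cases} \]

where $L(x, \lambda) := f(x) + \sum_j \lambda_j g_j(x)$ is the so-called
Lagrangian. We notice that $h$ is convex and that
the equality $h(0) = h^{**}(0)$ is equivalent to the
zero-duality gap relation

\[ \inf_x \sup_{\lambda \ge 0} L(x, \lambda) = \sup_{\lambda \ge 0} \inf_x L(x, \lambda) \]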


This condition is fulfilled, in particular, if we make
the following qualification assumption (ensuring
that $h$ is continuous at 0 and Theorem 11 applies):

\[ \exists x_0 \in X: \ f \text{ continuous at } x_0, \quad g_j(x_0) < 0, \ \forall j \]  [5]

Theorem 20 Assume that [5] holds. Then $\bar x$ is
optimal for (P) if and only if there exist Lagrangian
multipliers $\lambda_1, \lambda_2, \ldots, \lambda_m$ in $\mathbb{R}_+$ such that

\[ \bar x \in \operatorname{argmin}_x \Big( f + \sum_j \lambda_j g_j \Big), \qquad \lambda_j g_j(\bar x) = 0, \ \forall j \]

Notice that the existence of such a solution $\bar x$
is ensured if, for example, $X = \mathbb{R}^n$ and if, for some
$k > 0$, the function $f + k \sum_j g_j$ is coercive.
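As a one-dimensional illustration of Theorem 20, take the hypothetical problem of minimizing $f(x) = x^2$ under the single constraint $g(x) = 1 - x \le 0$ (this example is an assumption chosen for its closed form, not part of the text). The qualification assumption [5] holds at $x_0 = 2$. The sketch evaluates the dual function $\lambda \mapsto \inf_x L(x, \lambda)$ in closed form and compares its maximum over a grid with the primal value.

```python
def f(x):
    return x * x


def g(x):
    # Constraint g(x) <= 0, i.e. x >= 1.
    return 1.0 - x


def lagrangian(x, lam):
    return f(x) + lam * g(x)


def dual_function(lam):
    # inf over x of L(x, lam); for this quadratic the minimiser is x = lam / 2.
    x_min = lam / 2.0
    return lagrangian(x_min, lam)


# Candidate primal solution and multiplier (worked out by hand).
x_bar, lam_bar = 1.0, 2.0

# Coarse grid search over lam >= 0 to locate the dual maximiser.
grid = [i / 100.0 for i in range(0, 501)]
best_lam = max(grid, key=dual_function)

primal_value = f(x_bar)
dual_value = dual_function(lam_bar)
print(best_lam, primal_value, dual_value)
```

Here the multiplier is positive and the constraint is active, so the complementarity condition $\lambda g(\bar x) = 0$ holds with no duality gap.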
Primal-Dual Formulations in Mechanics


We present here the example of elasticity which
motivated the pioneering work by J J Moreau on
convex duality techniques. Further examples can be
found in Ekeland and Temam (1976). An elastic body is
placed in a bounded domain $\Omega \subset \mathbb{R}^n$ whose boundary
$\Gamma$ consists of two disjoint parts $\Gamma = \Gamma_0 \cup \Gamma_1$. The
unknown $u: \Omega \to \mathbb{R}^n$ (deformation) satisfies a Dirichlet
condition $u = 0$ on $\Gamma_0$, where the body is clamped. The
system is subjected to a surface load $g \in L^2(\Gamma_1; \mathbb{R}^n)$ and
to a volumic load $f \in L^2(\Omega; \mathbb{R}^n)$. The static equilibrium
problem has the following variational formulation:

\[ (P) \qquad \inf_{u = 0 \text{ on } \Gamma_0} \Big\{ \int_\Omega j(x, e(u))\, dx - \int_\Omega f \cdot u\, dx - \int_{\Gamma_1} g \cdot u\, d\mathcal{H}^{n-1} \Big\} \]


where $e(u) := (1/2)(u_{i,j} + u_{j,i})$ denotes the symmetric
strain tensor and $j: (x,z) \in \Omega \times \mathbb{R}^{n^2}_{sym} \to \mathbb{R}_+$ is a
convex integrand representing the local elastic
behavior of the material. We assume a quadratic
growth as in Remark 15 (in the case of linear
elasticity, an isotropic homogeneous material is
characterized by the quadratic form

\[ j(x,z) = \frac{\lambda}{2} |\mathrm{tr}(z)|^2 + \mu |z|^2 \]

$\lambda, \mu$ being the Lamé constants).
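We apply Proposition 14 with $X = W^{1,2}(\Omega; \mathbb{R}^n)$,
$Y = L^2(\Omega; \mathbb{R}^{n^2}_{sym})$, $Au = e(u)$ and where we set

\[ \Phi(u) = \begin{cases} -\int_\Omega f \cdot u\, dx - \int_{\Gamma_1} g \cdot u\, d\mathcal{H}^{n-1} & \text{if } u = 0 \text{ on } \Gamma_0 \\ +\infty & \text{otherwise} \end{cases}
\qquad
\Psi(v) = \int_\Omega j(x, v)\, dx \]

After some computations, we may write the supremum
appearing in Proposition 14 as our dual
problem

\[ (P^*) \qquad \sup\Big\{ -\int_\Omega j^*(x, \sigma)\, dx : \sigma \in L^2(\Omega; \mathbb{R}^{n^2}_{sym}),\
-\mathrm{div}\, \sigma = f \text{ on } \Omega,\ \sigma \cdot n = g \text{ on } \Gamma_1 \Big\} \]

where $j^*$ is the Moreau-Fenchel conjugate with
respect to the second argument and $n(x)$ denotes
the exterior unit normal on $\Gamma$. The matrix-valued
map $\sigma$ is called the stress tensor and $j^*$ the stress
potential. Note that the boundary conditions for $\sigma \cdot n$
have to be understood in the sense of traces.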


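Theorem 21 The problems (P) and (P*) have
solutions and we have the equality: inf(P) = sup (P*).
Furthermore, a pair $(\bar u, \bar\sigma)$ is optimal if and only if it
satisfies the following system:

\[ \begin{aligned}
-\mathrm{div}\, \bar\sigma &= f \ \text{ on } \Omega && \text{(equilibrium)} \\
\bar\sigma(x) &\in \partial j(x, e(\bar u)) \ \text{ a.e. on } \Omega && \text{(constitutive law)} \\
\bar u &= 0 \ \text{ a.e. on } \Gamma_0 \\
\bar\sigma \cdot n &= g \ \text{ on } \Gamma_1
\end{aligned} \]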


Duality in Mass Transport Problems 
General Cost Functions 


Let $X, Y$ be two compact metric spaces and
$c: X \times Y \to [0, +\infty)$ a continuous cost function. We denote
by $P(X)$, $P(X \times Y)$ the sets of probability measures
on $X$ and $X \times Y$, respectively. Given two elements
$\mu \in P(X)$, $\nu \in P(Y)$, we denote by $\Gamma(\mu, \nu)$ the subset
of probability measures in $P(X \times Y)$ whose marginals
are, respectively, $\mu$ and $\nu$. Identified as a subset
of $(C^0(X \times Y))^*$ (the space of signed Radon measures
on $X \times Y$), it is convex and weakly-star
compact. The Monge-Kantorovich formulation of
the mass transport problem reads as follows:

\[ T_c(\mu, \nu) := \inf\Big\{ \int_{X \times Y} c(x,y)\, \gamma(dxdy) : \gamma \in \Gamma(\mu, \nu) \Big\} \]  [6]


This formulation, where the infimum is achieved (as
we minimize an l.s.c. functional on a compact set for
the weak star topology), is already a relaxation of
the initial Monge mass transport problem,

\[ \inf\Big\{ \int_X c(x, Tx)\, \mu(dx) : T^\sharp(\mu) = \nu \Big\} \]

where the infimum is searched among all transport
maps $T: X \to Y$ pushing forward $\mu$ on $\nu$ (i.e., such
that $\mu(T^{-1}(B)) = \nu(B)$ for all Borel subsets $B \subset Y$).
This is equivalent to restricting the infimum in [6] to
the subclass $\{\gamma_T\} \subset \Gamma(\mu, \nu)$, where

\[ \langle \gamma_T, \varphi(x,y) \rangle := \int_X \varphi(x, Tx)\, \mu(dx) \]


In order to find a dual problem for [6], we fix
$\nu \in P(Y)$ and consider the functional $F: M_b(X) \to [0, +\infty]$
defined by

\[ F(\mu) = \begin{cases} T_c(\mu, \nu) & \text{if } \mu \ge 0,\ \mu(X) = 1 \\ +\infty & \text{otherwise} \end{cases} \]

($M_b(X)$ denotes the Banach space of (bounded)
signed Radon measures on $X$).


Lemma 22 $F$ is convex, weakly-star l.s.c. and
proper. Its Moreau-Fenchel conjugate is given by

\[ \forall \varphi \in C^0(X), \quad F^*(\varphi) = -\int_Y \varphi^c(y)\, \nu(dy) \]

where

\[ \varphi^c(y) := \inf\{ c(x,y) - \varphi(x) : x \in X \} \]


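Proof The convexity property is obvious and the
properness follows from the fact that, for $\mu \in P(X)$,

\[ F(\mu) \le \int_{X \times Y} c(x,y)\, \mu \otimes \nu(dxdy) \]

Let $\mu_n$ be such that $\mu_n \to \mu$ (weakly star). We may
assume that $\liminf_n F(\mu_n) = \lim_n F(\mu_n) := \alpha$ is finite.
Then $\mu_n$ and the associated optimal $\gamma_n$ are probability
measures on $X$ and on $X \times Y$, respectively.
As $X$ and $Y$ are compact, possibly passing to a
subsequence, we may assume that $\gamma_n \to \gamma$ weakly star, and
clearly we have $\gamma \in \Gamma(\mu, \nu)$. Since $c(x,y)$ is l.s.c.
non-negative, we conclude that

\[ \liminf_n F(\mu_n) = \liminf_n \int_{X \times Y} c(x,y)\, \gamma_n(dxdy)
\ge \int_{X \times Y} c(x,y)\, \gamma(dxdy) \ge F(\mu) \]

Let us compute now $F^*(\varphi)$. We have

\[ \begin{aligned}
-F^*(\varphi) &= \inf\Big\{ \int_{X \times Y} c(x,y)\, \gamma(dxdy) - \int_X \varphi\, d\mu : \mu \in P(X),\ \gamma \in \Gamma(\mu, \nu) \Big\} \\
&= \inf\Big\{ \int_{X \times Y} (c(x,y) - \varphi(x))\, \gamma(dxdy) : \mu \in P(X),\ \gamma \in \Gamma(\mu, \nu) \Big\} \\
&\ge \int_Y \varphi^c(y)\, \nu(dy)
\end{aligned} \]

To prove that the last inequality is actually an
equality, we observe that, for every $y \in Y$ and $\varphi \in C^0(X)$,
the minimum of the l.s.c. function $c(\cdot, y) - \varphi$
is attained on the compact set $X$ and there exists a
Borel selection map $S(y)$ such that $\varphi^c(y) = c(S(y), y) - \varphi(S(y))$
for all $y \in Y$. We obtain the desired equality by
choosing $\gamma$ defined, for every test function $\psi$, by

\[ \int_{X \times Y} \psi(x,y)\, \gamma(dxdy) := \int_Y \psi(S(y), y)\, \nu(dy) \]  □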


We observe that, for every $\varphi \in C^0(X)$, the function
$\varphi^c$ introduced in Lemma 22 is continuous (use
the uniform continuity of $c$) and therefore the pair
$(\varphi, \varphi^c)$ belongs to the class

\[ \mathcal{F}_c := \{ (\varphi, \psi) \in C^0(X) \times C^0(Y) : \varphi(x) + \psi(y) \le c(x,y) \} \]

Let us introduce the dual problem of [6]:

\[ \sup\Big\{ \int_X \varphi\, d\mu + \int_Y \psi\, d\nu : (\varphi, \psi) \in \mathcal{F}_c \Big\} \]  [7]

We will say that $(\varphi, \psi) \in \mathcal{F}_c$ is a pair of c-concave
conjugate functions if $\psi = \varphi^c$ and $\varphi = \psi^c$ (where
symmetrically $\psi^c(x) := \inf\{ c(x,y) - \psi(y) : y \in Y \}$).
Checking the latter condition amounts to verifying
that $\varphi$ enjoys the so-called c-concavity property
$\varphi^{cc} = \varphi$ (in general, we have only $\varphi^{cc} \ge \varphi$, whereas
$\varphi^{ccc} = \varphi^c$). We refer for instance to Villani (2003) for
further details about this c-duality.
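The two general facts about the c-transform quoted above ($\varphi^{cc} \ge \varphi$ and $\varphi^{ccc} = \varphi^c$) are easy to observe on finite sets. The sketch below is only an illustration: the point sets `X`, `Y`, the quadratic cost and the values of `phi` are arbitrary integer data chosen so that all arithmetic is exact.

```python
def cost(x, y):
    # Illustrative cost; any real-valued function on X x Y would do here.
    return (x - y) ** 2


def c_transform_on_Y(phi, X, Y):
    # phi^c(y) = min over x of c(x, y) - phi(x)
    return {y: min(cost(x, y) - phi[x] for x in X) for y in Y}


def c_transform_on_X(psi, X, Y):
    # psi^c(x) = min over y of c(x, y) - psi(y)
    return {x: min(cost(x, y) - psi[y] for y in Y) for x in X}


X = [0, 1, 2, 3]
Y = [0, 2, 5]
phi = {0: 0, 1: 3, 2: -1, 3: 2}     # arbitrary function on X

phi_c = c_transform_on_Y(phi, X, Y)
phi_cc = c_transform_on_X(phi_c, X, Y)
phi_ccc = c_transform_on_Y(phi_cc, X, Y)

print(phi_c, phi_cc, phi_ccc)
```

In this instance `phi` is not c-concave (it is strictly below `phi_cc` at some points), yet one more transform returns `phi_c`, so the pair (`phi_cc`, `phi_c`) is a pair of c-concave conjugate functions.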

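Now, by exploiting Theorem 10 and Lemma 22,
we obtain a very simple proof of the Kantorovich
duality theorem:

Theorem 23 The following duality formula holds:

\[ T_c(\mu, \nu) = \sup\Big\{ \int_X \varphi\, d\mu + \int_Y \psi\, d\nu : (\varphi, \psi) \in \mathcal{F}_c \Big\} \]

Moreover, the supremum in the right-hand side
member is achieved by a pair $(\varphi, \psi)$ of conjugate
c-concave functions such that, for any optimal $\gamma$ in
[6], there holds $\varphi(x) + \psi(y) = c(x,y)$, $\gamma$-a.e.

Proof By Theorem 10 and Lemma 22, we have

\[ \begin{aligned}
T_c(\mu, \nu) = F^{**}(\mu) &= \sup\Big\{ \int_X \varphi\, d\mu + \int_Y \varphi^c\, d\nu : \varphi \in C^0(X) \Big\} \\
&\le \sup\Big\{ \int_X \varphi\, d\mu + \int_Y \psi\, d\nu : (\varphi, \psi) \in \mathcal{F}_c \Big\} \\
&\le T_c(\mu, \nu)
\end{aligned} \]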

where the last inequality follows from the definition
of $\mathcal{F}_c$. Therefore, inf [6] = sup [7]. Furthermore, on
the right-hand side of the first equality, we increase the
supremum by substituting $\varphi$ with $\varphi^{cc}$ (recall that
$\varphi^{ccc} = \varphi^c$). Thus,

\[ \sup [7] = \sup\Big\{ \int_X \varphi\, d\mu + \int_Y \varphi^c\, d\nu : \varphi \in C^0(X),\ \varphi \text{ c-concave} \Big\} \]

Take a maximizing sequence $(\varphi_n, \varphi_n^c)$ of c-concave
conjugate functions. It is easy to check that $\{\varphi_n\}$
is equicontinuous on $X$: this follows from the c-concavity
property and from the uniform continuity of
$c$ (observe that $\varphi_n(x_1) - \varphi_n(x_2) = \varphi_n^{cc}(x_1) - \varphi_n^{cc}(x_2) \le
\sup_y \{ c(x_1, y) - c(x_2, y) \}$). Then, by Ascoli's theorem,
possibly passing to subsequences, we may assume
that $\varphi_n - c_n$ converges uniformly to some continuous
function $\bar\varphi$, where $\{c_n\}$ is a suitable sequence of
reals. Then, one checks that $\bar\varphi$ is still c-concave
and that $(\varphi_n - c_n)^c = \varphi_n^c + c_n$ converges uniformly to
$\bar\varphi^c$. Thus, recalling that $\mu(X) = \nu(Y) = 1$, we
deduce that

\[ \begin{aligned}
\sup [7] &= \lim_n \Big( \int_X \varphi_n\, d\mu + \int_Y \varphi_n^c\, d\nu \Big) \\
&= \lim_n \Big( \int_X (\varphi_n - c_n)\, d\mu + \int_Y (\varphi_n^c + c_n)\, d\nu \Big) \\
&= \int_X \bar\varphi\, d\mu + \int_Y \bar\varphi^c\, d\nu
\end{aligned} \]

The last assertion is a consequence of the extremality
relation:

\[ 0 = \inf [6] - \sup [7] = \int_{X \times Y} \big( c(x,y) - \bar\varphi(x) - \bar\varphi^c(y) \big)\, \gamma(dxdy) \]

Remark 24

(i) In their discrete version (i.e., $\mu, \nu$ are atomic
measures), problems [6] and [7] can be seen as
particular linear programming problems (see the
section "Finite-dimensional case").

(ii) The case $X = Y \subset \mathbb{R}^n$ and $c(x,y) = (1/2)|x - y|^2$
is important. In this case, the notion of c-concavity
is linked to convexity and the Fenchel transform
since, for every $\varphi \in C^0(X)$, one has

\[ \frac{|\cdot|^2}{2} - \varphi^c = \Big( \frac{|\cdot|^2}{2} - \varphi \Big)^* \]

Then if $(\bar\varphi, \bar\varphi^c)$ is a solution of [7], we find that

\[ \varphi_0(x) := \frac{|x|^2}{2} - \bar\varphi(x) \]

is convex continuous and that the extremality
condition $\bar\varphi(x) + \bar\varphi^c(y) = c(x,y)$ is equivalent to the
Fenchel equality $\varphi_0(x) + \varphi_0^*(y) = (x|y)$. Therefore,
any optimal $\gamma$ is supported in the graph
of the subdifferential map $\partial\varphi_0$. In the case
where $\mu$ is absolutely continuous with respect to
the Lebesgue measure, it is then easy to deduce
that the optimal $\gamma$ is unique and that $\gamma = \gamma_{T_0}$,
where $T_0 = \nabla\varphi_0$ is the unique gradient (a.e.
defined) of a convex function such that
$\nabla\varphi_0^\sharp(\mu) = \nu$. This is a celebrated result by Y
Brenier (see, e.g., the monographs by Evans
(1997) and Villani (2003)).
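In one dimension, the gradient of a convex function is a nondecreasing map, so for the quadratic cost the optimal coupling of two atomic measures with equal masses should be the monotone (sorted) matching. The sketch below checks this by brute force on arbitrary illustrative data. It assumes equal masses $1/n$, in which case it is enough to minimize over permutations, since the extreme points of the set of transport plans are then permutation couplings.

```python
from itertools import permutations


def quadratic_cost(x, y):
    return 0.5 * (x - y) ** 2


def transport_cost(xs, ys, perm):
    # Equal masses 1/n; the atom xs[i] is sent to ys[perm[i]].
    n = len(xs)
    return sum(quadratic_cost(xs[i], ys[perm[i]]) for i in range(n)) / n


def brute_force_optimum(xs, ys):
    # Exhaustive search over all permutation couplings (small n only).
    n = len(xs)
    return min(permutations(range(n)),
               key=lambda p: transport_cost(xs, ys, p))


def monotone_matching(xs, ys):
    # Match the i-th smallest source atom with the i-th smallest target atom.
    order_x = sorted(range(len(xs)), key=lambda i: xs[i])
    order_y = sorted(range(len(ys)), key=lambda j: ys[j])
    perm = [0] * len(xs)
    for i, j in zip(order_x, order_y):
        perm[i] = j
    return tuple(perm)


xs = [0.0, 1.0, 3.0, 2.5]   # atoms of mu (illustrative)
ys = [2.0, 0.5, 4.0, 1.5]   # atoms of nu (illustrative)

best = brute_force_optimum(xs, ys)
mono = monotone_matching(xs, ys)
print(best, mono, transport_cost(xs, ys, mono))
```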


The Distance Case 


In the following, we assume that $X = Y$ and that
$c(x,y)$ is a semidistance. As an immediate
consequence of the triangular inequality, we have
the following equivalence:

\[ \varphi \text{ c-concave} \iff \varphi(x) - \varphi(y) \le c(x,y),\ \forall (x,y) \iff \varphi^c = -\varphi \]

Let us denote $\mathrm{Lip}_1(X) := \{ u \in C^0(X) : u(x) - u(y) \le c(x,y) \}$.
The first assertion of Theorem 23 becomes
the Kantorovich-Rubinstein duality formula:

\[ T_c(\mu, \nu) = \max\Big\{ \int_X u\, d(\mu - \nu) : u \in \mathrm{Lip}_1(X) \Big\} \]  [8]

As it appears, $T_c(\mu, \nu)$ depends only on the difference
$f = \mu - \nu$, which belongs to the space $M_0(X)$ of
signed measures on $X$ with zero average. Defining
$N(f) := T_c(f^+, f^-)$ provides a seminorm (Kantorovich
norm) on $M_0(X)$ (it turns out that $M_0(X)$ is
not complete and that in general its completion is a
strict subspace of the dual of $\mathrm{Lip}(X)$).
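A tiny instance of the Kantorovich-Rubinstein formula on the real line with $c(x,y) = |x - y|$ can be checked by hand. In the sketch below the atoms and the candidate potential `potential` are assumptions chosen for illustration; the primal value is computed by brute force over permutation couplings (sufficient here because all atoms carry the same mass), and the dual value is evaluated at one admissible 1-Lipschitz function.

```python
from itertools import permutations

source = [0.0, 3.0]   # atoms of mu, mass 1/2 each (illustrative)
target = [1.0, 2.0]   # atoms of nu, mass 1/2 each (illustrative)


def primal_value(xs, ys):
    # Minimal transport cost for c(x, y) = |x - y| over permutation couplings.
    n = len(xs)
    return min(sum(abs(xs[i] - ys[p[i]]) for i in range(n)) / n
               for p in permutations(range(n)))


def potential(x):
    # Hypothetical candidate maximiser: 1-Lipschitz on the real line.
    return abs(x - 1.5)


def dual_value(u, xs, ys):
    # Integral of u against mu - nu for equal-mass atomic measures.
    n = len(xs)
    return (sum(u(x) for x in xs) - sum(u(y) for y in ys)) / n


def is_one_lipschitz_on(u, points):
    return all(u(a) - u(b) <= abs(a - b) + 1e-12
               for a in points for b in points)


print(primal_value(source, target), dual_value(potential, source, target))
```

Because the dual value at this admissible potential equals the primal value, both are optimal; any other 1-Lipschitz function can only give a smaller or equal dual value.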

We will now specialize to the case where $X$ is a
compact manifold equipped with a geodesic distance.
This will allow us to link the original problem
to another primal-dual formulation closer to that
considered in the section "Primal-dual formulations
in mechanics" and leading to a connection with
partial differential equations. As a model example,
let us assume that $X = \overline{\Omega}$, where $\Omega$ is a bounded
connected open subset of $\mathbb{R}^n$ with a Lipschitz
boundary. Let $\Sigma \subset \overline{\Omega}$ be a compact subset (on
which the transport will have zero cost) and define

\[ c(x,y) := \inf\{ \mathcal{H}^1(S \setminus \Sigma) : S \text{ Lipschitz curve joining } x \text{ to } y,\ S \subset \overline{\Omega} \} \]  [9]


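where $\mathcal{H}^1$ denotes the one-dimensional Hausdorff
measure (length). It is easy to check that

\[ c(x,y) = \min\{ \delta_\Omega(x,y),\ \delta_\Omega(x, \Sigma) + \delta_\Omega(y, \Sigma) \} \]

where $\delta_\Omega(x,y)$ is the geodesic distance on $\Omega$ (induced
by the Euclidean norm). Furthermore, the following
characterization holds:

\[ u \in \mathrm{Lip}_1(X) \iff u \in W^{1,\infty}(\Omega),\ |\nabla u| \le 1 \text{ a.e. in } \Omega,\ u \text{ is constant on } \Sigma \]  [10]

Since $f := \mu - \nu$ is balanced, the value of the
constant on $\Sigma$ in [10] is irrelevant and can be set
to 0. Thus we may rewrite the right-hand side
member of [8] in an equivalent way as

\[ \max\Big\{ \int_{\overline{\Omega}} u\, df : u \in W^{1,\infty}(\Omega),\ |\nabla u| \le 1 \text{ a.e. on } \Omega,\ u = 0 \text{ on } \Sigma \Big\} \]  [11]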


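We will now derive a new dual problem for [11]
by using Proposition 14. To this aim, we consider
$X = C^1(\overline{\Omega})$ (as a closed subspace of $W^{1,\infty}(\Omega)$),
$Y = C^0(\overline{\Omega}; \mathbb{R}^n)$, $Y^* = M_b(\overline{\Omega}; \mathbb{R}^n)$ and the operator
$A: u \in X \mapsto \nabla u \in Y$.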


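Theorem 25 Let $\mu, \nu \in P(\overline{\Omega})$, $f = \mu - \nu$ and $c$
defined by [9]. Then,

\[ T_c(\mu, \nu) = \min\Big\{ \int_{\overline{\Omega}} |\lambda| : \lambda \in M_b(\overline{\Omega}; \mathbb{R}^n),\ -\mathrm{div}\, \lambda = f \text{ on } \overline{\Omega} \setminus \Sigma \Big\} \]  [12]

where the divergence condition is intended in the
sense that

\[ \int_{\overline{\Omega}} \lambda \cdot \nabla\varphi = \int_{\overline{\Omega}} \varphi\, df \]

for all $\varphi \in C^\infty$ compactly supported in $\mathbb{R}^n \setminus \Sigma$.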


Proof (sketch) We apply Proposition 14 with
$\Phi(u) = -\int_{\overline{\Omega}} u\, df$ if $u = 0$ on $\Sigma$ ($+\infty$ otherwise),
$A = \nabla$, and $\Psi(v) = 0$ if $|v| \le 1$ on $\overline{\Omega}$ ($+\infty$ otherwise).
We obtain that the minimum $\alpha$ in [12] is reached
and that $\alpha = \beta$, where

\[ -\beta := \inf\Big\{ -\int_{\overline{\Omega}} u\, df : u \in C^1(\overline{\Omega}),\ |\nabla u| \le 1 \text{ on } \overline{\Omega},\ u = 0 \text{ on } \Sigma \Big\} \]

To prove that $\beta = T_c(\mu, \nu) = \sup$ [11], we consider a
maximizer $\bar u$ in [11] and prove that it can be
approximated uniformly by a sequence $\{u_n\}$ of
functions in $C^1(\overline{\Omega})$ which satisfy the same constraints.
This technical part is done by truncation
and convolution arguments (we refer to Bouchitté
et al. (2003) for details). □


Remark 26 By localizing the integral identity
associated with [12], it is possible to deduce
the optimality conditions which characterize optimal
pairs $(\bar u, \bar\lambda)$ for [11], [12] (without requiring any
regularity). This is done by using a weak notion
of tangential gradient with respect to a measure
(see Bouchitté et al. (1997) and Bouchitté and
Fragalà (2002)). If $\bar\lambda = \sigma\, dx$ where $\sigma \in L^1(\Omega; \mathbb{R}^n)$
and if $\Sigma \subset \partial\Omega$, then we find that $\sigma = a \nabla\bar u$, where the
pair $(\bar u, a)$ solves the following system:

\[ \begin{aligned}
-\mathrm{div}(a \nabla\bar u) &= f \ \text{ on } \Omega && \text{(diffusion equation)} \\
|\nabla\bar u| &= 1 \ \text{ a.e. on } \{a > 0\} && \text{(eikonal equation)} \\
\bar u &= 0 \ \text{ a.e. on } \Sigma \\
a \frac{\partial \bar u}{\partial n} &= 0 \ \text{ on } \partial\Omega \setminus \Sigma
\end{aligned} \]

Remark 27 Given a solution $\bar\gamma$ for [6], we can
construct a solution $\bar\lambda$ for [12] by selecting for every
$(x,y) \in \mathrm{spt}(\bar\gamma)$ a geodesic curve $S_{xy}$ joining $x$ and $y$
(possibly passing through the free-cost zone $\Sigma$) and
by setting, for every test function $\varphi$:

\[ \langle \bar\lambda, \varphi \rangle := \int_{\overline{\Omega} \times \overline{\Omega}} \Big( \int_{S_{xy}} \varphi \cdot \tau_{S_{xy}}\, d\mathcal{H}^1 \Big)\, \bar\gamma(dxdy) \]

where $\tau_{S_{xy}}$ denotes the unit oriented tangent vector
(see Bouchitté and Buttazzo (2001)). It is also
possible to show (see Ambrosio (2003)) that any
solution $\bar\lambda$ can be represented as before through a
particular solution $\bar\gamma$. As a consequence, the support
of any solution $\bar\lambda$ of [12] is contained in the geodesic
envelope of the set $\mathrm{spt}(\mu) \cup \mathrm{spt}(\nu) \cup \Sigma$. However, we
stress the fact that, in general, there is no uniqueness
at all of the optimal triple $(\bar\gamma, \bar u, \bar\lambda)$ for [6], [11]
and [12].


Remark 28 An approximation procedure for particular
solutions of problems [11], [12] can be
obtained by solving a p-Laplace equation and then
by sending $p$ to infinity. Precisely, consider the
solution $u_p \in W^{1,p}(\Omega)$ of

\[ -\mathrm{div}(|\nabla u|^{p-2} \nabla u) = f \ \text{ on } \overline{\Omega} \setminus \Sigma, \qquad u = 0 \ \text{ on } \Sigma \]

which, for $p > n$, exists (due to the compact
embedding $W^{1,p}(\Omega) \subset C^0(\overline{\Omega})$) and is unique. In
Bouchitté et al. (2003) it is proved that the sequence
$(u_p, \sigma_p)$, where $\sigma_p = |\nabla u_p|^{p-2} \nabla u_p$, is relatively
compact in $C^0(\overline{\Omega}) \times M_b(\overline{\Omega}; \mathbb{R}^n)$ (weakly star with
respect to the second component) and that every cluster
point $(\bar u, \bar\lambda)$ solves [11], [12]. It is an open problem
to know whether or not such a cluster point is
unique. If the answer is "yes," the process described
above would select one optimal pair among all
possible solutions. As far as problem [11] is 
concerned, this problem is connected with the 
theory of viscosity solutions for the infinite Lapla- 
cian (see Evans (1997)) although this theory does 
not provide an answer as it erases the role of the 
source term f. On the other hand, a new entropy 
selection principle should be found for the solutions 
of dual problem [12]. In fact, the following partial
result holds: let $E: M_b(\overline{\Omega}; \mathbb{R}^n) \to \mathbb{R} \cup \{+\infty\}$ be the
functional defined by

\[ E(\lambda) := \begin{cases} \int_\Omega |\sigma| \log(|\sigma|)\, dx & \text{if } \lambda \ll dx \text{ and } \sigma = \dfrac{d\lambda}{dx} \\ +\infty & \text{otherwise} \end{cases} \]

Assume that [12] admits at least one solution $\lambda_0$
such that $E(\lambda_0) < +\infty$. Then it can be shown that
the sequence $\{\sigma_p\}$ does converge weakly-star to $\bar\lambda$,
the unique minimizer of the problem

\[ \inf\{ E(\lambda) : \lambda \text{ solution of } [12] \} \]


The general case, in particular when all optimal 
measures are singular, is open. 


Remark 29 Variational problems [11], [12] have
important counterparts in the theory of elasticity
and in optimal design problems (see Bouchitté and
Buttazzo (2001)). They read, respectively, as

\[ \max\Big\{ \int_{\overline{\Omega}} u \cdot df : u \in W^{1,\infty}(\Omega; \mathbb{R}^n),\ e(u)(x) \in K \text{ a.e. on } \Omega,\ u = 0 \text{ on } \Sigma \Big\} \]

\[ \min\Big\{ \int_{\overline{\Omega}} \rho_K^0(\lambda) : \lambda \in M_b(\overline{\Omega}; \mathbb{R}^{n^2}_{sym}),\ -\mathrm{div}\, \lambda = f \text{ on } \overline{\Omega} \setminus \Sigma \Big\} \]

where $K \subset \mathbb{R}^{n^2}_{sym}$ is a convex compact subset of
symmetric second-order tensors associated with the
elastic material, $\rho_K^0(\xi) := \sup\{ \xi \cdot z : z \in K \}$ is convex
positively 1-homogeneous and the functional on
measures $\int \rho_K^0(\lambda)$ is intended in the sense given in
[1]. A celebrated example is given by Michell's
problem (Michell 1904) where $n = 2$ and
$K := \{ z \in \mathbb{R}^{n^2}_{sym} : \rho(z) \le 1 \}$, $\rho(z)$ being the largest singular value
of $z$. The potential $\rho_K^0$ is given by the nondifferentiable
convex function $\rho_K^0(\xi) = \tau_1(\xi) + \tau_2(\xi)$, where the
$\tau_i(\xi)$'s are the singular values of $\xi$.


Unfortunately, it is not known if the vector
variational problem above can be linked to an
optimal transportation problem of the type [6],
even if the analog of equivalence [10] does exist
in Michell's case, namely (for $\Omega$ convex):

\[ \rho(e(u)) \le 1 \text{ on } \Omega \iff |(u(x) - u(y) \,|\, x - y)| \le |x - y|^2,\ \forall (x,y) \]

Introduction


Mathematical cosmology focuses on the geometrical 
and mathematical aspects of the study of the 
universe as a whole. Because the structure of 
spacetime (with metric tensor $g_{ij}(x^k)$) is governed
by gravity, with matter and energy causing space- 
time curvature according to the nonlinear gravita- 
tional field equations of the theory of general 
relativity, it has its roots in differential geometry. It 
is to be distinguished from the three other major 
aspects of modern cosmology, namely astrophysical 
cosmology, high-energy physics cosmology, and 
observational cosmology; see Peacock (1999) for 
these aspects. 
The Einstein field equations (EFEs) are 


\[ R_{ab} - \tfrac{1}{2} R g_{ab} + \Lambda g_{ab} = \kappa T_{ab} \]  [1]


where $R_{ab}$ is the Ricci tensor, $R$ the Ricci scalar, $T_{ab}$
the matter tensor, $\Lambda$ the cosmological constant, and
$\kappa$ the gravitational constant. Cosmological models
differ from generic solutions of these equations in 
that they have preferred world lines in spacetime 
associated with the motion of matter and distribu- 
tion of radiation (Ellis 1971). This is a classic case of 
a broken symmetry: the underlying equations [1] are 
locally Lorentz invariant but their solutions are not. 
These preferred world lines, characterized by a unit 
4-velocity vector $u^a$, are associated at late times with
"fundamental observers," and a key aspect of 
cosmological modeling is determining the observa- 
tional relations such observers would determine 
through astronomical observations. 

The dynamics of cosmological models is deter- 
mined by their matter content. This is usually 
represented in simplified form, often using the 
“perfect-fluid” approximation to represent the effect 
of matter or radiation; that is, 


\[ T_{ab} = (\rho + p) u_a u_b + p g_{ab} \]  [2]


where $\rho$ is the energy density and $p$ the pressure, and
the matter 4-velocity $u^a$ is the preferred cosmological
4-velocity. This description can include a
scalar field $\phi$ with dynamics governed by the
Klein-Gordon equation, provided $u_a$ is normal to
spacelike surfaces $\{\phi = \mathrm{const.}\}$. Suitable equations of
state describe the nature of the matter envisaged
(e.g., $p = 0$ for baryons, whereas $p = \rho/3$ for
radiation); in the case of a scalar field with potential
$V(\phi)$ and spacelike surfaces $\{\phi = \mathrm{const.}\}$, on choosing
$u^a$ orthogonal to these surfaces, the stress tensor has
a perfect-fluid form with $\rho = (1/2)\dot\phi^2 + V(\phi)$,
$p = (1/2)\dot\phi^2 - V(\phi)$. A cosmological constant $\Lambda$ can
be represented as a perfect fluid with $\rho + p = 0$,
$\Lambda = \kappa\rho$. More general matter may involve a momentum
flux density $q_a$ and anisotropic pressures $\pi_{ab}$
(Ehlers 1961). Whatever the nature of the matter, it 
will usually be required to satisfy energy conditions 
(Hawking and Ellis 1973). All realistic matter has a 
positive inertial mass density: 


\[ \rho + p > 0 \]  [3]


(note that realistic cosmological models are non- 
empty), whereas all ordinary matter has a positive 
gravitational mass density: 


\[ \rho + 3p > 0 \]  [4]


but this is not necessarily true for a scalar field or 
effective cosmological constant. 

Mathematical cosmology (Ellis and van Elst 1999) 
studies (1) generic properties of solutions with a 
preferred 4-velocity field and matter content as 
indicated above, (2) the standard FLRW models, 
(3) approximate FLRW solutions, and (4) other 
exact and approximate cosmological solutions. The 
ultimate underlying issue is (5) the origin of the 
universe. We look at these in turn. We aim to use 
covariant methods as far as possible, to avoid being 
misled by coordinate effects, and to obtain exact 
solutions and exact results as far as possible, because 
approximate methods can be misleading in the case 
of these nonlinear field equations. 


Exact Properties 


We can split the equations into spacelike and
timelike parts relative to the 4-velocity $u^a$, obtaining
the (1 + 3) covariant dynamical equations and
identities in terms of the fluid shear $\sigma_{ab}$, vorticity
$\omega_{ab}$, expansion $\Theta = u^a{}_{;a}$, and acceleration
$a^a = u^a{}_{;b} u^b$ (Ehlers 1961, Ellis 1971, Ellis and van Elst
1999). The energy density of a perfect fluid obeys
the conservation equation

\[ \dot\rho = -(\rho + p)\Theta \]  [5]

with extra terms occurring in the case of more
complex matter. From the momentum equations,
pressure-free solutions are geodesic ($a^a = 0$). The
crucial Raychaudhuri-Ehlers equation for the
time derivative of the expansion (Ehlers 1961)
can be written as

\[ 3\frac{\ddot S}{S} = -2(\sigma^2 - \omega^2) + a^a{}_{;a} - \frac{\kappa}{2}(\rho + 3p) + \Lambda \]  [6]

where the representative length scale $S$ is defined by
$\Theta = 3\dot S/S$. This is the basis of the "fundamental
singularity theorem": if in an expanding universe
$\omega = 0 = a^a$ and the combined matter present satisfies
[4], with $\Lambda \le 0$, then there was a singularity where
$S \to 0$ a finite time $t_0 < 1/H_0$ ago, $H_0 = (\dot S/S)_0$ being
the present value of the Hubble constant. The energy
density will diverge there, so this is a spacetime 
singularity: an origin of physics, matter, and space- 
time itself. However, the deduction does not follow if 
there is rotation or acceleration, which could 
conceivably avoid the singularity, so this result is by 
itself inconclusive for realistic cosmologies. 

The vorticity obeys conservation laws analogous 
to those in Newtonian theory (Ehlers 1961). 
Vorticity-free solutions ($\omega = 0$) occur whenever the
fluid flow lines are hypersurface-orthogonal in
spacetime, that is, there exists a cosmic time
function for the comoving observers, which will
measure proper time along the flow lines if
additionally the fluid flow is geodesic. The rate of
change of shear is related to the conformal curvature
(Weyl) tensor, which represents the free gravitational
field, and which splits into an electric part $E_{ab}$
and a magnetic part $H_{ab}$, in close analogy with
electromagnetic theory. Shear-free solutions ($\sigma = 0$)
are very special because they strongly constrain the
Weyl tensor; indeed if the flow is shear free and
geodesic, then it either does not expand ($\Theta = 0$), or
does not rotate ($\omega = 0$) (Ellis 1967). The set of
cosmological observations associated with generic 
cosmological models has been characterized in 
power series form by Kristian and Sachs (1966), 
and that result has been extended to general models 
by Ellis et al. (1985). 

The local regularity of the theory is expressed in 
existence and uniqueness theorems for the EFEs, 
provided the matter behavior is well defined through 
prescription of suitable equations of state (Hawking 
and Ellis 1973). However, in general the theory 
breaks down in the large, and this feature is 
specified by the Hawking-Penrose singularity theo- 
rems, predicting the existence of a geodesic incom- 
pleteness of spacetime under conditions applicable 
to realistic cosmological models satisfying the energy 
conditions given by eqns [3] and [4] (Hawking and 
Ellis 1973, Tipler et al. 1980). However, the
conclusion does not follow if the energy conditions 
are not satisfied. Furthermore, the deduction follows 


only if the gravitational field equations remain valid 
to arbitrarily early times; but we would in fact 
expect that, at high enough energy densities, 
quantum gravity would take over from classical 
gravity, so whether or not there was indeed a 
singularity would depend on the nature of the as 
yet unknown theory of quantum gravity. The cash 
value of the singularity theorems then is the 
implication that, when the energy conditions are 
satisfied, one would indeed be involved in such a 
quantum gravity realm in the very early universe. 


The Standard Friedmann-Lemaitre 
Models 


The standard models of cosmology are the Friedmann-Lemaitre
(FL) models with Robertson-Walker
(RW) geometry: that is, they are exactly spatially
homogeneous and locally isotropic, invariant under a
group $G_6$ of isometries (Robertson 1933, Ehlers 1961).
They have a unique cosmic time function $t$, with
space sections $\{t = \mathrm{const.}\}$ of constant spatial curvature
orthogonal to the uniquely preferred 4-velocity
$u^a$. The fluid acceleration, vorticity, and shear all
vanish, and all physical quantities depend only on the
time coordinate $t$. They can be represented by a
metric with scale factor $S(t)$:

\[ ds^2 = g_{ab}\, dx^a dx^b = -dt^2 + S^2(t) \{ dr^2 + f^2(r)(d\theta^2 + \sin^2\theta\, d\phi^2) \} \]  [7]

in comoving coordinates $(x^a) = (t, r, \theta, \phi)$, where
$f(r) = \{\sin r, r, \sinh r\}$ if $\{k = +1, 0, -1\}$, and the matter is a
perfect fluid with 4-velocity vector $u^a = dx^a/ds = \delta^a_0$.
The curvature of the space sections $\{t = \mathrm{const.}\}$ is
$K = k/S^2$; these 3-spaces are necessarily closed (compact)
if they are positively curved ($k = +1$), but may be
open or closed in the flat ($k = 0$) and negatively curved
($k = -1$) cases, depending on their topology
(Lachieze-Rey and Luminet 1995).

Matter obeys the conservation equation [5], whose 
outcome depends on the equation of state; for 
baryons $\rho = M/S^3$, whereas for radiation $\rho = M/S^4$, 
where $M$ is a constant. The dynamics of the models is 
governed by the Raychaudhuri equation 

$$3\,\frac{\ddot S}{S} = -\frac{\kappa}{2}(\rho + 3p) + \Lambda$$
[8] 

which has the Friedmann equation 

$$3\,\frac{\dot S^2}{S^2} = \kappa\rho + \Lambda - \frac{3k}{S^2}$$
[9] 

as a first integral whenever $\dot S \neq 0$. Depending on the 
matter components present, one can qualitatively 
characterize the dynamical behavior of these models 
(Robertson 1933) and find exact and approximate 
solutions to these equations as well as phase planes 
representing the relation of the different models to 
each other; for example, Ehlers and Rindler (1989) 
give the phase planes for models with noninteracting 
matter and radiation and an arbitrary cosmological 
constant. Universes with maxima or minima in S(t) 
can only occur if $k = +1$; when $\Lambda = 0$, the universe 
recollapses in the future iff $k = +1$. Static solutions 
are possible only if $k = +1$ and (assuming [4]) 
$\Lambda > 0$. The simplest expanding solutions are the 
Einstein-de Sitter universes with $k = 0 = \Lambda$. 
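
The Einstein-de Sitter case can be checked numerically. The sketch below is an illustration only, under stated assumptions: it takes the Friedmann equation in the form $3\dot S^2/S^2 = \kappa\rho + \Lambda - 3k/S^2$ with pressure-free matter $\rho = M/S^3$, chooses units in which $\kappa M = 3$, and integrates the expanding branch with a hand-written Runge-Kutta scheme. The function names are hypothetical and not taken from any cosmology library.

```python
import math

def friedmann_rhs(S, kappa_M=3.0, k=0, Lam=0.0):
    """dS/dt on the expanding branch for dust, assuming
    3*(dS/dt)^2/S^2 = kappa*M/S^3 + Lam - 3*k/S^2."""
    return math.sqrt(kappa_M / (3.0 * S) + Lam * S * S / 3.0 - k)

def integrate_scale_factor(S0, t0, t1, steps=2000, **model):
    """Classical fourth-order Runge-Kutta integration of dS/dt = friedmann_rhs(S)."""
    h = (t1 - t0) / steps
    S = S0
    for _ in range(steps):
        k1 = friedmann_rhs(S, **model)
        k2 = friedmann_rhs(S + 0.5 * h * k1, **model)
        k3 = friedmann_rhs(S + 0.5 * h * k2, **model)
        k4 = friedmann_rhs(S + h * k3, **model)
        S += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return S

def einstein_de_sitter(t):
    """Exact solution S = (3t/2)^(2/3) for kappa*M = 3 and k = 0 = Lambda."""
    return (1.5 * t) ** (2.0 / 3.0)

t0, t1 = 1.0, 10.0
S_numeric = integrate_scale_factor(einstein_de_sitter(t0), t0, t1)
S_exact = einstein_de_sitter(t1)
print(S_numeric, S_exact)
```

Passing other values of `k` or `Lam` explores neighbouring models, provided the quantity under the square root stays positive along the integration.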

Equation [8] is a special case of [6], with 
corresponding implications: if the combined matter 
present satisfies [4], with $\Lambda \le 0$, then there must have 
been an initial singularity, or at least the universe 
must have emerged from a quantum gravity domain. 
The temperature would have been arbitrarily high in 
the past, so there was a hot big bang era in the early 
universe where matter and radiation were in equili- 
brium with each other at very high temperatures that 
rapidly fell as the universe expanded. Many physical 
processes took place then, in particular nucleosynthesis 
of light elements at $\sim 10^{9}$ K. Decoupling 
of matter and radiation took place at a 
temperature of ~4000K, followed by formation of 
stars and galaxies (see Peacock (1999) for a discus- 
sion of these physical processes). The black-body 
radiation emitted by the surface of last scattering at 
4000 K is observed by us today as cosmic black-body 
radiation (CBR) at a temperature of 2.75 K. 
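
The two temperatures quoted above fix the redshift of the surface of last scattering. The short sketch below assumes the standard facts that black-body radiation cools as $T \propto 1/S$ and that $1 + z = S_0/S_{\rm emit}$; neither relation is spelled out in the text, and the function name is hypothetical.

```python
T_DECOUPLING_K = 4000.0  # emission temperature quoted in the text
T_TODAY_K = 2.75         # observed CBR temperature quoted in the text

def redshift_from_temperatures(T_emit, T_obs):
    """Redshift z from 1 + z = T_emit / T_obs, assuming T scales as 1/S."""
    return T_emit / T_obs - 1.0

z_decoupling = redshift_from_temperatures(T_DECOUPLING_K, T_TODAY_K)
print(round(z_decoupling))
```

With these inputs the scale factor has grown by a factor of roughly 1450 since decoupling.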

One can determine observational relations for 
these models such as the magnitude-redshift relation 
for "standard candles" at recent times from the EFEs 
(Sandage 1961). The aim of observations is to 
determine the Hubble constant $H_0$, dimensionless 
deceleration parameter $q_0 = -(1/H_0^2)(\ddot S/S)_0$, and 
normalized density parameters $\Omega_{0i} = \kappa\rho_{0i}/3H_0^2$ for 
each component of matter present. The spatial 
curvature and the cosmological constant then follow 
from [6] and [9]; also the present scale factor $S_0$ is 
determined if $k \neq 0$. The universe is of positive 
spatial curvature ($k = +1$) iff $\Omega_0 \equiv \Omega_m + \Omega_\Lambda > 1$, 
where $\Omega_m = \sum_i \Omega_{0i}$, $\Omega_\Lambda = \Lambda/3H_0^2$. Current observations 
indicate $\Omega_m \simeq 0.3$, $\Omega_\Lambda \simeq 0.7$, $\Omega_0 \simeq 1.02 \pm 0.02$. 
Because the nucleosynthesis results limit the 
baryon density to a very low value ($\Omega_{0b} \simeq 0.02$), 
which is about the same as the density of luminous 
matter, this indicates the dominant presence of both 
nonbaryonic dark matter and a repulsive force 
corresponding to either a cosmological constant or 
varying scalar field (dark energy). 
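
As a rough illustration of how these parameters combine, the sketch below computes $\Omega_0$, the sign of the spatial curvature, and the deceleration parameter. It assumes pressure-free matter, so that dividing the Raychaudhuri equation $3\ddot S/S = -(\kappa/2)\rho + \Lambda$ by $3H_0^2$ gives $q_0 = \Omega_m/2 - \Omega_\Lambda$. The function name and the tolerance argument are hypothetical.

```python
def density_summary(omega_m, omega_lambda, tol=1e-9):
    """Return (Omega_0, k, q_0) for dust plus a cosmological constant.

    k is +1, 0 or -1 according to whether Omega_0 exceeds, equals (within
    tol), or falls below 1; q_0 = Omega_m / 2 - Omega_Lambda assumes p = 0.
    """
    omega_0 = omega_m + omega_lambda
    if omega_0 - 1.0 > tol:
        k = +1
    elif 1.0 - omega_0 > tol:
        k = -1
    else:
        k = 0
    q0 = 0.5 * omega_m - omega_lambda
    return omega_0, k, q0

omega_0, k, q0 = density_summary(0.3, 0.7)
print(omega_0, k, q0)
```

A negative $q_0$, as obtained for the values quoted above, corresponds to accelerating expansion, which is why a repulsive component is needed.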

Crucial causal limitations occur because of the 
existence of particle horizons (Rindler 1956), the 
nature of which is most clear when represented in 
conformal diagrams (Hawking and Ellis 1973, Tipler 
et al. 1980). These result from the fact that light 
can only proceed a finite distance in the finite time 
since the origin of the universe, and imply that for 
a standard radiation-dominated hot-big-bang early 
universe, regions of larger than ~1° angular size on 
the surface of last scattering, which emits the CBR, 
are causally disconnected: hence, no causal process 
since the start of the universe can account for the 
extreme isotropy of the CBR ($\Delta T/T \approx 10^{-5}$ over 
the whole sky, once a dipole anisotropy $\Delta T/T \approx 10^{-3}$ 
due to our local velocity relative to the 
cosmological rest frame is allowed for). This is the 
"horizon problem," one of the driving forces 
behind the theory of "inflation" (Guth 1981): the 
idea that, in the very early universe, a slow-rolling 
scalar field led to a brief exponential expansion 
through at least 50 e-folds (during which time the 
spacetime was approximately de Sitter), thus 
smoothing the universe and solving the horizon 
problem (Guth 1981, Peacock 1999). This is 
possible because a scalar field can violate the energy 
condition [3] and so allows acceleration: $\ddot S > 0$. 
Consequently, there are now many studies of the 
dynamics of FLRW solutions driven by scalar fields 
and the subsequent decay of these scalar fields into 
radiation. One interesting point is that one can 
obtain exact solutions of this kind for arbitrarily 
chosen evolutions $S(t)$, provided they satisfy a 
restriction on the magnitude of $^, by running the 
field equations backwards to determine the needed 
potential $V(\phi)$ (Ellis and Madsen 1991). The 
inflationary paradigm is dominant in present-day 
theoretical cosmology, but suffers from the problem 
that it is not in fact a well-defined theory, for there 
is no single accepted proposal for the physical 
nature of the effective scalar field underlying the 
supposed exponential expansion; rather there are 
numerous competing proposals. As the inflaton has 
not yet been identified, this theory is not yet 
soundly linked to well-established physics. 


Approximate FL Solutions 


The real universe is, of course, not exactly FL, and 
studies of structure formation depend on studies of 
solutions that are approximately FL models — they 
are realistic (“lumpy”) universe models. These 
enable detailed studies of observable properties 
such as CBR anisotropies and gravitational lensing 
induced by matter inhomogeneities, and of the 
development of those inhomogeneities from quan- 
tum fluctuations in the very early universe that then 
get expanded to very large scales by inflation. 

The key problem here is that apart from the standard 
coordinate freedom allowed in general relativity, there 
is a serious gauge issue: the background FL model is not 
uniquely determined by the realistic universe model; 
however, the magnitudes of many perturbed quantities 
depend on how it is fitted into the lumpy model. For 
example, the density perturbation $\delta\rho$ is determined 
pointwise by the equation 

$$\delta\rho(x^i) = \rho(x^i) - \bar\rho(x^i)$$

where $\bar\rho(x^i)$ is the background density. But by 
altering the correspondence between the background 
and realistic models (specifically, by the choice of 
surfaces $\bar\rho(x^i) = \text{const.}$ in the realistic model) one can 
assign that quantity any value, including zero (if one 
chooses $\bar\rho(x^i) = \rho(x^i)$). This is the "gauge problem." 
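
A toy calculation makes the gauge dependence concrete. The sketch below is purely illustrative: the background density $\bar\rho \propto t^{-2}$, the one-dimensional lumpy density, and both correspondence maps are invented for the example and do not come from the article.

```python
import math

EPS = 1e-2  # amplitude of the toy inhomogeneity (assumed)

def background_density(t):
    """Toy background density, falling as 1/t^2."""
    return 1.0 / t ** 2

def realistic_density(t, x):
    """Toy lumpy density: the background modulated by a small cosine ripple."""
    return (1.0 + EPS * math.cos(x)) / t ** 2

def delta_rho(t, x, time_map):
    """Density perturbation when the point (t, x) of the lumpy model is
    identified with the background at time time_map(t, x)."""
    return realistic_density(t, x) - background_density(time_map(t, x))

def identity_map(t, x):
    """Gauge 1: identify equal coordinate times."""
    return t

def constant_density_map(t, x):
    """Gauge 2: identify surfaces of equal density, so delta_rho vanishes."""
    return t / math.sqrt(1.0 + EPS * math.cos(x))

print(delta_rho(2.0, 0.0, identity_map))          # nonzero perturbation
print(delta_rho(2.0, 0.0, constant_density_map))  # zero up to rounding
```

The same lumpy density yields a perturbation of order `EPS` in one correspondence and none in the other, so the number by itself carries no physical meaning.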

One can handle it by using standard variables and 
keeping close track of the gauge freedom at all 
times. However, one then ends up with higher-order 
equations than necessary because some of the 
perturbation modes present are pure gauge modes 
with no physical significance. Alternatively, one can 
fix the gauge by some unique specification of how 
the background model is fitted into the realistic 
model, but there is no agreement on a unique way to 
do this, and different choices give different answers. 
The preferable resolution is to use gauge-invariant 
variables, either coordinate based (Bardeen 1980) or 
covariant, based on the (1+3) covariant decomposi- 
tion of spacetime quantities mentioned above (Ellis 
and Bruni 1989), in either case resulting in pertur- 
bation equations without gauge freedom and of 
order corresponding to the physical degrees of 
freedom. The key point in the latter approach is to 
choose covariant variables that vanish in the back- 
ground spacetime; they are then automatically gauge 
invariant. Realistic structure formation studies carry 
out this process for a mixture of matter components 
with different average velocities, and extend to a 
kinetic theory description of the background radia- 
tion (see Ellis and van Elst (1999) and references 
therein). The outcome is a prediction of the CBR 
anisotropy power spectrum, determined by the 
inhomogeneities in the gravitational field and the 
motions of the matter components at decoupling 
(Sachs and Wolfe 1967). This spectrum can then be 
compared with observations and used in determin- 
ing the values of the cosmological parameters 
mentioned above (see Peacock 1999). 

One crucial issue is why it is reasonable to use a 
perturbed FL model for the observable region of the 
universe. The key argument is that this is plausible 
because of the high isotropy of all observations 
around us when averaged on a sufficiently large 
spatial scale, and particularly the very low anisotropy 
of the CBR. The Ehlers-Geren-Sachs (EGS) theorem 
(Ehlers et al. 1968) provides a sound basis for this 
argument: it shows that if freely propagating CBR 
(obeying the Liouville equation) is exactly isotropic in 
an expanding universe domain $U$, then the universe is 
exactly FL in that domain (i.e., it has exactly the RW 
spatially homogeneous and isotropic geometry there), 
the point being that any inhomogeneities in the 
matter distribution between us and the surface of last 
scattering will produce anisotropies in the CBR 
temperature we measure. But that result does not 
apply to the real universe, because the CBR is not 
exactly isotropic. The "almost EGS" theorem 
(Stoeger et al. 1995) shows that this result is stable: 
almost isotropic CBR in the domain $U$ implies that 
the universe is almost-FL in that domain. The 
application to the real universe comes by making a 
weak Copernican assumption: "we assume we are 
not special, so all observers in $U$ (taken to be the 
visible part of the universe) will also see almost 
isotropic CBR, just as we do." The result then 
follows. A further argument for homogeneity of the 
universe comes from postulating “uniform thermal 
histories” (Bonnor and Ellis 1986), but that argument 
is yet to be completed and applied in a practical way. 


Anisotropic and Inhomogeneous Models 


The FL universes are geometrically extremely special. 
We wish further to understand the full range of 
possible universe models, their dynamical behaviors, 
and which of them might, at some epoch, realistically 
represent the real universe. This enables us to see how 
the approximate FL models fit into this wider set of 
possibilities, and under what circumstances they are 
attractors in this set of cosmologies. 

Exact solutions are characterized by their space- 
time symmetries. Symmetries are characterized by 
the dimension $s$ of the surfaces of homogeneity and 
the dimension $q$ of the isotropy group at a general 
point, together giving the dimension $r = s + q$ of the 
group of isometries $G_r$ (at special points, such as a 
center of symmetry, $s$ can decrease and $q$ increase 
but always so that $r$ stays unchanged). In the case of 
a cosmological model, because the 4-velocity $u^a$ is 
invariant under isotropies, the only possible dimensions 
for the isotropy group are $q = 3, 1, 0$; whereas 
the dimension $s$ of the surfaces of homogeneity can 
take any value from 4 to 0. This gives the basis for a 
classification of cosmological spacetimes (Ellis 1967, 
Ellis and van Elst 1999). 
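
The bookkeeping $r = s + q$ can be tabulated directly. The sketch below is only an organizational aid: it assumes the labelling $s$ for the dimension of the surfaces of homogeneity and $q$ for the dimension of the isotropy group, and the example families listed are those named in this section. The identifiers are hypothetical.

```python
ALLOWED_ISOTROPY_DIMS = (3, 1, 0)  # possible q for a cosmological model

def isometry_group_dimension(s, q):
    """Dimension r = s + q of the isometry group G_r."""
    if not 0 <= s <= 4:
        raise ValueError("s must lie between 0 and 4")
    if q not in ALLOWED_ISOTROPY_DIMS:
        raise ValueError("q must be 3, 1 or 0")
    return s + q

# Representative families discussed in the text, keyed by (s, q).
EXAMPLES = {
    (4, 3): "Einstein static universe",
    (4, 1): "Goedel universe",
    (3, 3): "Friedmann-Lemaitre models",
    (3, 1): "LRS Bianchi and Kantowski-Sachs models",
    (3, 0): "Bianchi models",
    (2, 1): "LRS inhomogeneous models, e.g. Tolman-Bondi",
}

for (s, q), name in sorted(EXAMPLES.items(), reverse=True):
    print(f"s={s} q={q} r={isometry_group_dimension(s, q)}: {name}")
```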

When $q = 3$, we have isotropic solutions (there 
are no preferred spatial directions) and it is then 
a theorem that they must be spatially homogeneous 
FL universes (Ehlers 1961). When $q = 1$, we 
have locally rotationally symmetric (LRS) solutions, 
with precisely one preferred spacelike direction 
at a generic point (Ellis 1967). When $q = 0$, the 
solutions are anisotropic in that there can be no 
continuous group of rotations leaving the solution 
invariant; however, there can be discrete isotropies 
in some special cases. 

When $s = 4$, we have spacetime homogeneous solutions, 
with all physical quantities constant; they cannot 
expand (by [5] and [3]). Nevertheless, two cases are of 
interest. For $q = 1$ ($r = 5$) we find the Gödel universe, 
rotating everywhere with constant vorticity, which 
illustrates important causal anomalies (Gödel 1949, 
Hawking and Ellis 1973). For $q = 3$ ($r = 7$), we find 
the Einstein "static universe" (Einstein 1917), the 
unique nonexpanding FL model with $k = 1$ and $\Lambda > 0$. 
It is of interest because it could possibly represent the 
asymptotic initial state of nonsingular inflationary 
universe models (Ellis et al. 2003). The higher- 
symmetry models (de Sitter and anti-de Sitter 
universes with higher-dimensional isotropy groups) 
are not included here because they do not obey the 
energy condition [3] — they are empty universes, 
which can be interesting asymptotic states but are 
not by themselves good cosmological models. 

When $s = 3$, we have spatially homogeneous 
evolving universe models. For $q = 0$ ($r = 3$), there 
is a large family of Bianchi universes, spatially 
homogeneous but anisotropic, classified into 
nine types according to the structure constants of 
the Lie algebra of the three-dimensional symmetry 
group $G_3$. These can be "orthogonal": the fluid flow 
is orthogonal to the surfaces of homogeneity, or 
“tilted”; the latter case can have fluid rotation or 
acceleration, but the former cannot. They exhibit a 
large variety of behaviors, including power-law, 
oscillatory, and nonscalar singularities (Tipler et al. 
1980). A vexed question is whether truly chaotic 
behavior occurs in Bianchi IX models. The behavior 
of large families of these models has been character- 
ized in dynamical systems terms (Wainwright and 
Ellis 1996), showing the intriguing way that higher- 
symmetry solutions provide a “skeleton” that guides 
the behavior of lower-symmetry solutions in the 
space of spacetimes. Many Bianchi models can be 
shown to isotropize at late times, particularly if 
viscosity is present; thus, they are asymptotic to the 
FL universes in the far future. In some cases, Bianchi 
models exhibit intermediate isotropization: they are 
much like FL models for a large part of their life, but 
are very different from it both at very early and very 
late stages of their evolution. These could be good 
models of the real universe. An important theorem 
by Wald (1983) shows that a cosmological constant 
will tend to isotropize Bianchi solutions at late 
times. This is an indication that inflation can 
succeed in making anisotropic early states resemble 
FL models at later times. Observational properties 
like element abundances and CBR anisotropy 
patterns can be worked out in these models (some 
of them develop a characteristic isolated "hot spot" 
in the CBR sky). For $q = 1$ ($r = 4$), we have spatially 
homogeneous LRS models, either Kantowski-Sachs 
or Bianchi universes, and again observations can be 
worked out in detail and phase planes developed 
showing their dynamical behavior, often isotropizing 
at late times. There are orthogonal and tilted 
cases, the latter possibly involving nonscalar singularities. 
For $q = 3$ ($r = 6$), we have the isotropic FL 
models, discussed above. Both the LRS and isotropic 
cases could be good models of the real universe. 

When $s = 2$, we have inhomogeneous evolving 
models. This is a very large family, but the LRS 
($q = 1$, $r = 3$) cases have been examined in detail; in 
the case of pressure-free matter, these are the 
Tolman-Bondi inhomogeneous models (Bondi 
1947) that can be integrated exactly, and have 
been used for many interesting astrophysical and 
cosmological studies. Krasinski (1997) gives a very 
complete catalog of these and lower-symmetry 
inhomogeneous models and their uses in cosmology. 
A considerable challenge is the dynamical systems 
analysis for generic inhomogeneous models, needed 
to properly understand the early evolution of generic 
universe models (Uggla et al. 2003), and hence to 
determine what is generic behavior. 


The Origin of the Universe 


The issue underlying all this is what led to the initial 
conditions for the universe, for example, providing 
the starting conditions for inflation. There are many 
approaches to studying the quantum gravity phase 
of cosmology, including the Wheeler-de Witt equa- 
tion, the path-integral approach, string cosmology, 
pre-big bang theory, brane cosmology, the ekpyrotic 
universe, the cyclic universe, and loop quantum 
gravity approaches. These lie beyond the purview of 
the present article, except to say that they are all 
based on unproven extrapolations of known physics. 
The physically possible paths will become clearer as 
the nature of quantum gravity is elucidated. 

It is pertinent to note that there exist nonsingular 
realistic cosmological solutions, possible in the light 
of the violations of the energy condition enabled by 
the supposed scalar fields that underlie inflationary 
universe theory. These nonsingular solutions can even 
avoid the quantum gravity era (Ellis et al. 2003). 
However, they have very fine-tuned initial conditions, 
which is nowadays considered as a disadvantage; but 
there is no proof that whatever processes led to the 
existence of the universe preferred generic rather than 
fine-tuned conditions; this is a philosophical rather 
than physical assumption. It may well be that, as 
regards the start of the universe, the options are that 
either an initial singularity occurred, or the initial 
conditions were very finely tuned and allowed an 
infinitely existing universe. Whether this conjecture 
is in fact valid, and if so which is the 
better option, are intriguing open questions. 


See also: Einstein Equations: Exact Solutions; 
Einstein-Cartan Theory; General Relativity: Experimental 
Tests; General Relativity: Overview; Gravitational 
Lensing; Lie Groups: General Theory; Newtonian Limit of 
General Relativity; Quantum Cosmology; Shock Wave 
Refinement of the Friedman—Robertson—Walker Metric; 
Spacetime Topology, Causal Structure and Singularities; 
String Theory: Phenomenology. 


Introduction 


The general symplectic reduction theory (see 
Symmetry and Symplectic Reduction) becomes 
much richer and has many applications if the 
symplectic manifold is the cotangent bundle 
$(T^*Q, \Omega_Q = -d\Theta_Q)$ of a manifold $Q$. The canonical 
1-form $\Theta_Q$ on $T^*Q$ is given by 
$\Theta_Q(\alpha_q)(V_{\alpha_q}) = \alpha_q(T_{\alpha_q}\pi_Q(V_{\alpha_q}))$, 
for any $q \in Q$, $\alpha_q \in T^*_qQ$, and 
tangent vector $V_{\alpha_q} \in T_{\alpha_q}(T^*Q)$, where $\pi_Q : T^*Q \to Q$ 
is the cotangent bundle projection and 
$T_{\alpha_q}\pi_Q : T_{\alpha_q}(T^*Q) \to T_qQ$ is its tangent map (or derivative) 
at $\alpha_q$. In natural cotangent bundle coordinates $(q^i, p_i)$, 
we have $\Theta_Q = p_i\,dq^i$ and $\Omega_Q = dq^i \wedge dp_i$. 

Let $\Phi : G \times Q \to Q$ be a left smooth action of the Lie 
group $G$ on the manifold $Q$. Denote by 
$g \cdot q = \Phi(g, q)$ the action of $g \in G$ on the point $q \in Q$ 
and by $\Phi_g : Q \to Q$ the diffeomorphism of $Q$ induced 
by $g$. The lifted left action $G \times T^*Q \to T^*Q$, given by 
$g \cdot \alpha_q = T^*_{g \cdot q}\Phi_{g^{-1}}(\alpha_q)$ for $g \in G$ and $\alpha_q \in T^*_qQ$, 
preserves $\Theta_Q$, and admits the equivariant momentum 
map $J : T^*Q \to \mathfrak{g}^*$ whose expression is 
$\langle J(\alpha_q), \xi\rangle = \alpha_q(\xi_Q(q))$, where $\xi \in \mathfrak{g}$, the Lie algebra of $G$, 
$\langle\,,\rangle : \mathfrak{g}^* \times \mathfrak{g} \to \mathbb{R}$ is the duality pairing between the dual $\mathfrak{g}^*$ and $\mathfrak{g}$, 
and $\xi_Q(q) = d\Phi(\exp t\xi, q)/dt|_{t=0}$ is the value of the 
infinitesimal generator vector field $\xi_Q$ of the $G$-action 
at $q \in Q$ (see Hamiltonian Group Actions and 
Symmetries and Conservation Laws). Throughout 
this article, it is assumed that the $G$-action on $Q$, 
and hence on $T^*Q$, is free and proper. Recall also 
that $((T^*Q)_\mu, (\Omega_Q)_\mu)$ denotes the reduced manifold 
at $\mu \in \mathfrak{g}^*$ (see Symmetry and Symplectic Reduction), 
where $(T^*Q)_\mu := J^{-1}(\mu)/G_\mu$ is the orbit space of the 
$G_\mu$-action on the momentum level manifold $J^{-1}(\mu)$ 
and $G_\mu := \{g \in G \mid \mathrm{Ad}^*_g\mu = \mu\}$ is the isotropy subgroup 
of the coadjoint representation of $G$ on $\mathfrak{g}^*$. 
The left-coadjoint representation of $g \in G$ on $\mu \in \mathfrak{g}^*$ 
is denoted by $\mathrm{Ad}^*_{g^{-1}}\mu$. 

Cotangent bundle reduction at zero is already quite 
interesting and has many applications. Let $\rho : Q \to Q/G$ 
be the $G$-principal bundle projection defined by the 
proper free action of $G$ on $Q$, usually referred to as the 
shape space bundle. Zero is a regular value of $J$ and the 
map $\varphi_0 : ((T^*Q)_0, (\Omega_Q)_0) \to (T^*(Q/G), \Omega_{Q/G})$ given 
by $\varphi_0([\alpha_q])(T_q\rho(v_q)) := \alpha_q(v_q)$, where $\alpha_q \in J^{-1}(0)$, 
$[\alpha_q] \in (T^*Q)_0$, and $v_q \in T_qQ$, is a well-defined 
symplectic diffeomorphism. 

This theorem generalizes in two nontrivial ways 
when one reduces at a nonzero value of J: an 
embedding and a fibration theorem. 


Embedding Version of Cotangent 
Bundle Reduction 


Let $\mu \in \mathfrak{g}^*$, $Q_\mu := Q/G_\mu$, $\rho_\mu : Q \to Q_\mu$ the projection 
onto the $G_\mu$-orbit space, $\mathfrak{g}_\mu := \{\xi \in \mathfrak{g} \mid \mathrm{ad}^*_\xi\mu = 0\}$ the 
Lie algebra of the coadjoint isotropy subgroup $G_\mu$, 
where $\mathrm{ad}_\xi\eta := [\xi, \eta]$ for any $\xi, \eta \in \mathfrak{g}$, $\mathrm{ad}^*_\xi : \mathfrak{g}^* \to \mathfrak{g}^*$ the 
dual map, $\mu' := \mu|_{\mathfrak{g}_\mu} \in \mathfrak{g}^*_\mu$ the restriction of $\mu$ to $\mathfrak{g}_\mu$, 
and $((T^*Q)_\mu, (\Omega_Q)_\mu)$ the reduced space at $\mu$. The 
induced $G_\mu$-action on $T^*Q$ admits the equivariant 
momentum map $J^\mu : T^*Q \to \mathfrak{g}^*_\mu$ given by 
$J^\mu(\alpha_q) = J(\alpha_q)|_{\mathfrak{g}_\mu}$. Assume there is a $G_\mu$-invariant 1-form $\alpha_\mu$ 
on $Q$ with values in $(J^\mu)^{-1}(\mu')$. Then there is a unique 
closed 2-form $\beta_\mu$ on $Q_\mu$ such that $\rho_\mu^*\beta_\mu = d\alpha_\mu$. Define 
the magnetic term $B_\mu := \pi_{Q_\mu}^*\beta_\mu$, where 
$\pi_{Q_\mu} : T^*Q_\mu \to Q_\mu$ is the cotangent bundle projection, 
which is a closed 2-form on $T^*Q_\mu$. Then the map 
$\varphi_\mu : ((T^*Q)_\mu, (\Omega_Q)_\mu) \to (T^*Q_\mu, \Omega_{Q_\mu} - B_\mu)$ given by 
$\varphi_\mu([\alpha_q])(T_q\rho_\mu(v_q)) := (\alpha_q - \alpha_\mu(q))(v_q)$, for $\alpha_q \in J^{-1}(\mu)$, 
$[\alpha_q] \in (T^*Q)_\mu$, and $v_q \in T_qQ$, is a symplectic embedding 
onto a submanifold of $T^*Q_\mu$ covering the base 
$Q_\mu$. The embedding $\varphi_\mu$ is a diffeomorphism onto 
$T^*Q_\mu$ if and only if $\mathfrak{g} = \mathfrak{g}_\mu$. If the 1-form $\alpha_\mu$ takes 
values in the smaller set $J^{-1}(\mu)$, then the image of $\varphi_\mu$ is 
the vector sub-bundle $[T\rho_\mu(VQ)]^\circ$ of $T^*Q_\mu$, where 
$VQ \subset TQ$ is the vertical vector sub-bundle consisting 
of vectors tangent to the $G$-orbits, that is, its fiber at 
$q \in Q$ equals $V_qQ = \{\xi_Q(q) \mid \xi \in \mathfrak{g}\}$, and $^\circ$ denotes the 
annihilator relative to the natural duality pairing 
between $TQ_\mu$ and $T^*Q_\mu$. Note that if $\mathfrak{g}$ is abelian or 
$\mu = 0$, the embedding $\varphi_\mu$ is always onto and thus the 
reduced space is again, topologically, a cotangent 
bundle. 

It should be noted that there is a choice in this 
theorem, namely the 1-form $\alpha_\mu$. Whereas the 
reduced symplectic space $((T^*Q)_\mu, (\Omega_Q)_\mu)$ is intrinsic, 
the symplectic structure on the space $T^*Q_\mu$ 
depends on $\alpha_\mu$. The theorem above states that no 
matter how $\alpha_\mu$ is chosen, there is a symplectic 
diffeomorphism, which also depends on $\alpha_\mu$, of the 
reduced space onto a submanifold of $T^*Q_\mu$. 


Connections 


The 1-form $\alpha_\mu$ is usually obtained from a left 
connection on the principal bundle $\rho_\mu : Q \to Q/G_\mu$ or 
$\rho : Q \to Q/G$. A left connection 1-form $A \in \Omega^1(Q; \mathfrak{g})$ 
on the left principal $G$-bundle $\rho : Q \to Q/G$ is a Lie 
algebra-valued 1-form $A : TQ \to \mathfrak{g}$, where $\mathfrak{g}$ denotes 
the Lie algebra of $G$, satisfying the conditions $A(\xi_Q) = \xi$ 
for all $\xi \in \mathfrak{g}$ and $A(T_q\Phi_g(v)) = \mathrm{Ad}_g(A(v))$ for all $g \in G$ 
and $v \in T_qQ$, where $\mathrm{Ad}_g$ denotes the adjoint action of 
$G$ on $\mathfrak{g}$. The horizontal vector sub-bundle $HQ$ of the 
connection $A$ is defined as the kernel of $A$, that is, its 
fiber at $q \in Q$ is the subspace $H_q := \ker A(q)$. The map 
$v_q \mapsto \mathrm{ver}_q(v_q) := [A(q)(v_q)]_Q(q)$ is called the vertical 
projection, while the map $v_q \mapsto \mathrm{hor}_q(v_q) := v_q - \mathrm{ver}_q(v_q)$ 
is called the horizontal projection. Since for 
any vector $v_q \in T_qQ$ we have $v_q = \mathrm{ver}_q(v_q) + \mathrm{hor}_q(v_q)$, 
it follows that $TQ = HQ \oplus VQ$ and the maps 
$\mathrm{hor}_q : T_qQ \to H_qQ$ and $\mathrm{ver}_q : T_qQ \to V_qQ$ are projections 
onto the horizontal and vertical subspaces at every 
$q \in Q$. 

Connections can be equivalently defined by the 
choice of a sub-bundle $HQ \subset TQ$ complementary to 
the vertical sub-bundle $VQ$ satisfying the following 
$G$-invariance property: $H_{g \cdot q}Q = T_q\Phi_g(H_qQ)$ for 
every $g \in G$ and $q \in Q$. The sub-bundle $HQ$ is called, 
as before, the horizontal sub-bundle and a connection 
1-form $A$ is defined by setting $A(q)(\xi_Q(q) + u_q) = \xi$, 
for any $\xi \in \mathfrak{g}$ and $u_q \in H_qQ$. 

The curvature of the connection $A$ is the Lie 
algebra-valued 2-form on $Q$ defined by 
$B(u_q, v_q) = dA(\mathrm{hor}_q(u_q), \mathrm{hor}_q(v_q))$. When one replaces vectors in 
the exterior derivative with their horizontal projections, 
then the result is called the exterior covariant 
derivative and the preceding formula for $B$ is often 
written as $B = DA$. Curvature measures the lack of 
integrability of the horizontal distribution, namely 
$B(u, v) = -A([\mathrm{hor}(u), \mathrm{hor}(v)])$ for any two vector 
fields $u$ and $v$ on $Q$. The Cartan structure equations 
state that $B(u, v) = dA(u, v) - [A(u), A(v)]$, where 
the bracket on the right hand side is the Lie 
bracket in $\mathfrak{g}$. 

Since the connection $A$ is a Lie algebra-valued 
1-form, for each $\mu \in \mathfrak{g}^*$ the formula 
$\alpha_\mu(q) := A(q)^*(\mu)$, where $A(q)^* : \mathfrak{g}^* \to T^*_qQ$ is the dual of the 
linear map $A(q) : T_qQ \to \mathfrak{g}$, defines a usual 1-form on 
$Q$. This 1-form $\alpha_\mu$ takes values in $J^{-1}(\mu)$ and is 
equivariant in the following sense: $\Phi_g^*\alpha_\mu = \alpha_{\mathrm{Ad}^*_g\mu}$ for 
any $g \in G$. 


Magnetic Terms and Curvature 


There are two methods to construct the 1-form $\alpha_\mu$ 
from a connection. The first is to start with a 
connection 1-form $A^\mu \in \Omega^1(Q; \mathfrak{g}_\mu)$ on the principal 
$G_\mu$-bundle $\rho_\mu : Q \to Q/G_\mu$. Then the 1-form 
$\alpha_\mu := \langle \mu|_{\mathfrak{g}_\mu}, A^\mu\rangle \in \Omega^1(Q)$ is $G_\mu$-invariant and has values in 
$(J^\mu)^{-1}(\mu|_{\mathfrak{g}_\mu})$. The magnetic term $B_\mu$ is the pullback to 
$T^*(Q/G_\mu)$ of the $\mu|_{\mathfrak{g}_\mu}$-component $d\alpha_\mu$ of the 
curvature of $A^\mu$ thought of as a 2-form on the 
base $Q/G_\mu$. 

The second method is to start with a connection 
$A \in \Omega^1(Q; \mathfrak{g})$ on the principal bundle $\rho : Q \to Q/G$, 
to define $\alpha_\mu := \langle \mu, A\rangle \in \Omega^1(Q)$, and to observe that 
this 1-form is $G_\mu$-invariant and has values in $J^{-1}(\mu)$. 
The magnetic term $B_\mu$ is in this case the pullback to 
$T^*(Q/G_\mu)$ of the $\mu$-component $d\alpha_\mu$ of the curvature 
of $A$ thought of as a 2-form on the base $Q/G_\mu$. 


The Mechanical Connection 


If $(Q, \langle\langle\,,\rangle\rangle)$ is a Riemannian manifold and $G$ acts by 
isometries, there is a natural connection on the 
bundle $\rho : Q \to Q/G$, namely, define the horizontal 
space at a point to be the metric orthogonal to the 
vertical space. This connection is called the mechanical 
connection and its horizontal bundle consists of 
all vectors $v_q \in TQ$ such that $J(\langle\langle v_q, \cdot\rangle\rangle) = 0$. 

To determine the Lie algebra-valued 1-form $A$ of 
this connection, the notion of locked inertia tensor 
needs to be introduced. This is the linear map 
$I(q) : \mathfrak{g} \to \mathfrak{g}^*$ depending smoothly on $q \in Q$ defined by 
the identity $\langle I(q)\xi, \eta\rangle = \langle\langle \xi_Q(q), \eta_Q(q)\rangle\rangle$ for any 
$\xi, \eta \in \mathfrak{g}$. Since the $G$-action is free, each $I(q)$ is 
invertible. The connection 1-form whose horizontal 
space was defined above is given by 
$A(q)(v_q) = I(q)^{-1}(J(\langle\langle v_q, \cdot\rangle\rangle))$. 
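
A minimal worked instance, under assumptions not made in the article: take a particle of mass $m$ in the punctured plane with $G = SO(2)$ acting by rotations, identify $\mathfrak{g}$ with $\mathbb{R}$, and use the metric $m$ times the Euclidean one. Then the locked inertia tensor is $I(q) = m(x^2 + y^2)$ and the momentum map is the angular momentum. The function names below are hypothetical.

```python
def generator(q, xi):
    """Infinitesimal generator xi_Q(q) of the rotation action on the plane."""
    x, y = q
    return (-xi * y, xi * x)

def locked_inertia(q, m=1.0):
    """I(q) = m * |q|^2, from <I(q) xi, eta> = m * xi_Q(q) . eta_Q(q)."""
    x, y = q
    return m * (x * x + y * y)

def momentum_map(q, v, m=1.0):
    """J evaluated on the covector m * v: the angular momentum m (x v_y - y v_x)."""
    x, y = q
    vx, vy = v
    return m * (x * vy - y * vx)

def mechanical_connection(q, v, m=1.0):
    """A(q)(v) = I(q)^{-1} J(m v); defined away from the origin, where the action is free."""
    return momentum_map(q, v, m) / locked_inertia(q, m)

def horizontal_part(q, v, m=1.0):
    """hor(v) = v - [A(q)(v)]_Q(q), the component orthogonal to the orbit."""
    gx, gy = generator(q, mechanical_connection(q, v, m))
    return (v[0] - gx, v[1] - gy)

q, v = (3.0, 4.0), (1.0, 2.0)
print(mechanical_connection(q, generator(q, 0.7)))  # recovers xi = 0.7
print(horizontal_part(q, v))                        # radial part of v
```

In this example the connection returns the angular velocity of a tangent vector, and the horizontal vectors are the radial ones.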

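Denote by $K : T^*Q \to \mathbb{R}$ the kinetic energy of the 
metric $\langle\langle\,,\rangle\rangle$ on the cotangent bundle, that is, 
$K(\langle\langle v_q, \cdot\rangle\rangle) := (1/2)\|v_q\|^2$. The 1-form $\alpha_\mu = A(\cdot)^*\mu$ is 
characterized for the mechanical connection $A$ by the 
condition $K(\alpha_\mu(q)) = \inf\{K(\beta_q) \mid \beta_q \in J^{-1}(\mu) \cap T^*_qQ\}$. 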


The Amended Potential 


A simple mechanical system is a Hamiltonian system 
on a cotangent bundle $T^*Q$ whose Hamiltonian 
function is the sum of the kinetic energy of a 
Riemannian metric on $Q$ and a potential function 
$V : Q \to \mathbb{R}$. If there is a Lie group $G$ acting on $Q$ by 
isometries and leaving the potential invariant, then 
we have a simple mechanical system with symmetry. 
The amended or effective potential $V_\mu : Q \to \mathbb{R}$ at 
$\mu \in \mathfrak{g}^*$ is defined by $V_\mu := H \circ \alpha_\mu$, where $\alpha_\mu$ is the 
1-form associated to the mechanical connection. Its 
expression in terms of the locked moment of inertia 
tensor is given by $V_\mu(q) := V(q) + (1/2)\langle \mu, I(q)^{-1}\mu\rangle$. 
The amended potential naturally induces a smooth 
function $\hat V_\mu \in C^\infty(Q/G_\mu)$. 
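
For orientation, the amended potential of a planar central-force problem can be written down explicitly. The sketch below assumes, beyond what the article states, a particle of mass $m$ in the punctured plane with rotational symmetry, so that $I(q) = m r^2$ and $V_\mu(r) = V(r) + \mu^2/(2 m r^2)$, together with a Kepler potential $V(r) = -k/r$. The names are hypothetical.

```python
def amended_potential(r, mu, m=1.0, k=1.0):
    """V_mu(r) = V(r) + mu^2 / (2 I(r)) with V(r) = -k/r and I(r) = m r^2."""
    return -k / r + mu * mu / (2.0 * m * r * r)

def circular_orbit_radius(mu, m=1.0, k=1.0):
    """Critical point of V_mu, obtained from dV_mu/dr = 0: r = mu^2 / (m k)."""
    return mu * mu / (m * k)

mu = 2.0
r_star = circular_orbit_radius(mu)
print(r_star, amended_potential(r_star, mu))
```

Critical points of the amended potential correspond to relative equilibria; here the minimum is the circular orbit with angular momentum $\mu$.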

The fundamental result about simple mechanical 
systems with symmetry is the following. The push-forward 
by the embedding $\varphi_\mu : ((T^*Q)_\mu, (\Omega_Q)_\mu) \to 
(T^*Q_\mu, \Omega_{Q_\mu} - B_\mu)$ of the reduced Hamiltonian 
$H_\mu \in C^\infty((T^*Q)_\mu)$ of a simple mechanical system 
$H = K + V \circ \pi_Q \in C^\infty(T^*Q)$ is the restriction to the 
vector sub-bundle $\varphi_\mu((T^*Q)_\mu) \subset T^*(Q/G_\mu)$, which 
is also a symplectic submanifold of 
$(T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$, of the simple mechanical system on 
$T^*(Q/G_\mu)$ whose kinetic energy is given by the 
quotient Riemannian metric on $Q/G_\mu$ and whose 
potential is $\hat V_\mu$. However, Hamilton's equations on 
$T^*(Q/G_\mu)$ for this simple mechanical system are 
computed relative to the magnetic symplectic form 
$\Omega_{Q/G_\mu} - B_\mu$. 

There is a wealth of applications starting from 
this classical theorem to mechanical systems, span- 
ning such diverse areas as topological characteriza- 
tion of the level sets of the energy-momentum map 
to methods of proving nonlinear stability of relative 
equilibria (block-diagonalization of the stability 
form in the application of the energy-momentum 
method). 


Fibration Version of Cotangent Bundle Reduction 


There is a second theorem that realizes the reduced 
space of a cotangent bundle as a locally trivial 
bundle over shape space $Q/G$. This version is 
particularly well suited to the study of quantization 
problems and to control theory. The result is the 
following. Assume that $G$ acts freely and properly 
on $Q$. Then the reduced symplectic manifold $(T^*Q)_\mu$ 
is a fiber bundle over $T^*(Q/G)$ with fiber the 
coadjoint orbit $\mathcal{O}_\mu$. How this is related to the 
Poisson structure of the quotient $(T^*Q)/G$ will be 
discussed later. 


The Kaluza-Klein Construction 


The extra term in the symplectic form of the reduced 
space is called a magnetic term because it has this 
interpretation in electromagnetism. To understand 
why $B_\mu$ is called a magnetic term, consider the 
problem of a particle of mass $m$ and charge $e$ 
moving in $\mathbb{R}^3$ under the influence of a given 
magnetic field $\mathbf{B} = B_x\mathbf{i} + B_y\mathbf{j} + B_z\mathbf{k}$, $\operatorname{div}\mathbf{B} = 0$. The 
Lorentz force law (written in the International 
System) gives the equations of motion 

$$m\,\frac{d\mathbf{v}}{dt} = e\,\mathbf{v} \times \mathbf{B}$$
[1] 

where $e$ is the charge and $\mathbf{v} = (\dot x, \dot y, \dot z) = \dot{\mathbf{q}}$ is the 
velocity of the particle. What is the Hamiltonian 
description of these equations? 

There are two possible answers to this question. 
To formulate them, associate to the divergence free 
vector field $\mathbf{B}$ the closed 2-form 
$B = B_x\,dy \wedge dz - B_y\,dx \wedge dz + B_z\,dx \wedge dy$. Also, write $\mathbf{B} = \operatorname{curl}\mathbf{A}$ for 
some other vector field $\mathbf{A} = (A_x, A_y, A_z)$ on $\mathbb{R}^3$, 
called the magnetic potential. 

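Answer 1 Take on $T^*\mathbb{R}^3$ the symplectic form 
$\Omega_B = dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z - eB$, where 
$(p_x, p_y, p_z) = \mathbf{p} := m\mathbf{v}$ is the momentum of the 
particle, and $h = m\|\mathbf{v}\|^2/2 = m(\dot x^2 + \dot y^2 + \dot z^2)/2$ is the 
Hamiltonian, the kinetic energy of the particle. A 
direct verification shows that $dh = \Omega_B(X_h, \cdot)$, where 

$$X_h = \dot x\,\frac{\partial}{\partial x} + \dot y\,\frac{\partial}{\partial y} + \dot z\,\frac{\partial}{\partial z} + e(B_z\dot y - B_y\dot z)\,\frac{\partial}{\partial p_x} + e(B_x\dot z - B_z\dot x)\,\frac{\partial}{\partial p_y} + e(B_y\dot x - B_x\dot y)\,\frac{\partial}{\partial p_z}$$
[2] 

which gives the equations of motion [1]. 

Answer 2 Take on $T^*\mathbb{R}^3$ the canonical symplectic 
form $\Omega = dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z$ and the 
Hamiltonian $h_A = \|\mathbf{p} - e\mathbf{A}\|^2/2m$. A direct verification 
shows that $dh_A = \Omega(X_{h_A}, \cdot)$, where $X_{h_A}$ has the 
same expression [2]. 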
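
The equations of motion [1] are easy to integrate numerically. The sketch below is an illustration under assumed data: a uniform field along the $z$ axis, units with $e = m = 1$, and a hand-written fourth-order Runge-Kutta step. For such a field the orbit is a circle traversed with period $2\pi m/(eB)$ and the kinetic energy is conserved. The function names are hypothetical.

```python
import math

def lorentz_rhs(state, e, m, B):
    """Right-hand side of m dv/dt = e v x B, with state = (x, y, z, vx, vy, vz)."""
    _, _, _, vx, vy, vz = state
    Bx, By, Bz = B
    ax = e * (vy * Bz - vz * By) / m
    ay = e * (vz * Bx - vx * Bz) / m
    az = e * (vx * By - vy * Bx) / m
    return (vx, vy, vz, ax, ay, az)

def rk4_step(state, h, e, m, B):
    """One classical Runge-Kutta step of size h."""
    def shifted(base, k, factor):
        return tuple(b + factor * ki for b, ki in zip(base, k))
    k1 = lorentz_rhs(state, e, m, B)
    k2 = lorentz_rhs(shifted(state, k1, 0.5 * h), e, m, B)
    k3 = lorentz_rhs(shifted(state, k2, 0.5 * h), e, m, B)
    k4 = lorentz_rhs(shifted(state, k3, h), e, m, B)
    return tuple(s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(state, e, m, B, t_end, steps=2000):
    """Integrate from t = 0 to t_end and return the final state."""
    h = t_end / steps
    for _ in range(steps):
        state = rk4_step(state, h, e, m, B)
    return state

start = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0)
final = simulate(start, e=1.0, m=1.0, B=(0.0, 0.0, 1.0), t_end=2.0 * math.pi)
speed = math.sqrt(sum(c * c for c in final[3:]))
print(final[:3], speed)
```

After one cyclotron period the particle returns to its starting point with unchanged speed, as expected since the magnetic force does no work.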

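Next we show how the magnetic term in the symplectic form $\Omega_B$ is obtained by reduction from the Kaluza-Klein system. Let $Q = \mathbb{R}^3 \times S^1$ with the circle $G = S^1$ acting on $Q$, only on the second factor. Identify the Lie algebra $\mathfrak{g}$ of $S^1$ with $\mathbb{R}$. Since the infinitesimal generator of this action defined by $\xi \in \mathfrak{g} = \mathbb{R}$ has the expression $\xi_Q(\mathbf{q}, \theta) = (\mathbf{q}, \theta; 0, \xi)$, if $TS^1$ is trivialized as $S^1 \times \mathbb{R}$, a momentum map $\mathbf{J}: T^*Q = \mathbb{R}^3 \times S^1 \times \mathbb{R}^3 \times \mathbb{R} \to \mathfrak{g}^* = \mathbb{R}$ is given by $\mathbf{J}(\mathbf{q}, \theta; \mathbf{p}, p)\xi = (\mathbf{p}, p)\cdot(0, \xi) = p\xi$, that is, $\mathbf{J}(\mathbf{q}, \theta; \mathbf{p}, p) = p$. In this case, the coadjoint action is trivial, so for any $\mu \in \mathfrak{g}^* = \mathbb{R}$, we have $G_\mu = S^1$, $\mathfrak{g}_\mu = \mathbb{R}$, and $\mu' = \mu$. The 1-form $\alpha_\mu = \mu(A_x dx + A_y dy + A_z dz + d\theta) \in \Omega^1(Q)$, where $d\theta$ denotes the length 1-form on $S^1$, is clearly $G_\mu = S^1$-invariant, has values in $\mathbf{J}^{-1}(\mu) = \{(\mathbf{q}, \theta; \mathbf{p}, \mu) \mid \mathbf{q}, \mathbf{p} \in \mathbb{R}^3, \theta \in S^1\}$, and its exterior differential equals $d\alpha_\mu = \mu B$. Thus, the closed 2-form $\beta_\mu$ on the base $Q_\mu = Q/G_\mu = Q/S^1 = \mathbb{R}^3$ equals $\mu B$ and hence the magnetic term, that is, the closed 2-form $B_\mu = \pi_{Q_\mu}^*\beta_\mu$ on $T^*Q_\mu = T^*\mathbb{R}^3$, is also $\mu B$ since $\pi_{Q,G_\mu}: Q = \mathbb{R}^3 \times S^1 \to Q/G_\mu = \mathbb{R}^3$ is the projection. Therefore, the reduced space $(T^*Q)_\mu$ is symplectically diffeomorphic to $(T^*\mathbb{R}^3, dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z - \mu B)$, which coincides with the phase space in Answer 1 if we put $\mu = e$. This also gives the physical interpretation of the momentum map $\mathbf{J}: T^*Q = \mathbb{R}^3 \times S^1 \times \mathbb{R}^3 \times \mathbb{R} \to \mathfrak{g}^* = \mathbb{R}$, $\mathbf{J}(\mathbf{q}, \theta; \mathbf{p}, p) = p$ and hence of the variable conjugate to the circle variable $\theta$: $p$ represents the charge. Moreover, the magnetic term in the symplectic form is, up to a charge factor, the magnetic field.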
The kinetic energy Hamiltonian

\[
h(\mathbf{q}, \theta; \mathbf{p}, p) = \frac{1}{2m}\|\mathbf{p}\|^2 + \frac{1}{2}p^2
\]

of the Kaluza-Klein metric, that is, the Riemannian metric obtained by keeping the standard metrics on each factor and declaring $\mathbb{R}^3$ and $S^1$ orthogonal, induces the reduced Hamiltonian

\[
h_\mu(\mathbf{q}, \mathbf{p}) = \frac{1}{2m}\|\mathbf{p}\|^2 + \frac{1}{2}\mu^2
\]

which, up to the constant $\mu^2/2$, equals the kinetic energy Hamiltonian in Answer 1. Note that this reduced system is not the geodesic flow of the Euclidean metric because of the presence of the magnetic term in the symplectic form. However, the equations of motion of a charged particle in a magnetic field are obtained by reducing the geodesic flow of the Kaluza-Klein metric.

A similar construction is carried out in Yang- 
Mills theory where A is a connection on a principal 
bundle and B is its curvature. Magnetic terms also 
appear in classical mechanics. For example, in 
rotating systems the Coriolis force (up to a dimen- 
sional factor) plays the role of the magnetic term. 


Reconstruction of Dynamics 
for Cotangent Bundles 


A general method for reconstructing the dynamics from the reduced dynamics is given in Symmetry and Symplectic Reduction. For cotangent bundles, using the mechanical connection, this method simplifies considerably.

Start with the following general situation. Let $G$ act freely on the configuration manifold $Q$; let $h: T^*Q \to \mathbb{R}$ be a $G$-invariant Hamiltonian, $\mu \in \mathfrak{g}^*$, $\alpha_q \in \mathbf{J}^{-1}(\mu)$, and $c_\mu(t)$ the integral curve of the reduced system with initial condition $[\alpha_q] \in (T^*Q)_\mu$ given by the reduced Hamiltonian function $h_\mu: (T^*Q)_\mu \to \mathbb{R}$. In terms of a connection $A \in \Omega^1(\mathbf{J}^{-1}(\mu); \mathfrak{g}_\mu)$ on the left $G_\mu$-principal bundle $\mathbf{J}^{-1}(\mu) \to (T^*Q)_\mu$, the reconstruction procedure proceeds in four steps:

• Step 1: Horizontally lift the curve $c_\mu(t) \in (T^*Q)_\mu$ to a curve $d(t) \in \mathbf{J}^{-1}(\mu)$ with $d(0) = \alpha_q$.

• Step 2: Set $\xi(t) = A(d(t))(X_h(d(t))) \in \mathfrak{g}_\mu$.

• Step 3: With $\xi(t) \in \mathfrak{g}_\mu$ determined in step 2, solve the nonautonomous differential equation $\dot{g}(t) = T_eL_{g(t)}\xi(t)$ with initial condition $g(0) = e$, where $L_g$ denotes left translation on $G$; this is the step that involves "quadratures" and is the main obstacle to finding explicit formulas.

• Step 4: The curve $c(t) = g(t)\cdot d(t)$, with $d(t)$ found in step 1 and $g(t)$ found in step 3, is the integral curve of $X_h$ with initial condition $c(0) = \alpha_q$.


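This method depends on the choice of the connection $A \in \Omega^1(\mathbf{J}^{-1}(\mu); \mathfrak{g}_\mu)$. Here are several particular cases when this procedure simplifies.

(a) One-dimensional coadjoint isotropy group. If $G_\mu = S^1$ or $G_\mu = \mathbb{R}$, identify $\mathfrak{g}_\mu$ with $\mathbb{R}$ via the map $a \in \mathbb{R} \mapsto a\zeta \in \mathfrak{g}_\mu$, where $\zeta \in \mathfrak{g}_\mu$, $\zeta \neq 0$, is a generator of $\mathfrak{g}_\mu$. Then a connection 1-form on the $S^1$ (or $\mathbb{R}$) principal bundle $\mathbf{J}^{-1}(\mu) \to (T^*Q)_\mu$ is the 1-form $A$ on $\mathbf{J}^{-1}(\mu)$ given by $A = (1/\langle\mu, \zeta\rangle)\theta_\mu$, where $\theta_\mu$ is the pullback of the canonical 1-form $\theta \in \Omega^1(T^*Q)$ to the submanifold $\mathbf{J}^{-1}(\mu)$. The curvature of this connection is the 2-form on $(T^*Q)_\mu$ given by $\operatorname{curv}(A) = -(1/\langle\mu, \zeta\rangle)\omega_\mu$, where $\omega_\mu$ is the reduced symplectic form on $(T^*Q)_\mu$. In this case, the curve $\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by $\xi(t) = \Lambda[h](d(t))$, where $\Lambda \in \mathfrak{X}(T^*Q)$ is the Liouville vector field characterized by the property of being the unique vector field on $T^*Q$ that satisfies the relation $d\theta(\Lambda, \cdot) = \theta$. In canonical coordinates $(q^i, p_i)$ on $T^*Q$, $\Lambda = p_i\,\partial/\partial p_i$.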

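(b) Induced connection. Any connection $\mathcal{A} \in \Omega^1(Q; \mathfrak{g}_\mu)$ on the left principal bundle $Q \to Q/G_\mu$ induces a connection $A \in \Omega^1(\mathbf{J}^{-1}(\mu); \mathfrak{g}_\mu)$ by $A(\alpha_q)(V_{\alpha_q}) := \mathcal{A}(q)(T_{\alpha_q}\pi_Q(V_{\alpha_q}))$, where $q \in Q$, $\alpha_q \in T_q^*Q$, $V_{\alpha_q} \in T_{\alpha_q}(T^*Q)$, and $\pi_Q: T^*Q \to Q$ is the cotangent bundle projection. In this case, the curve $\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by $\xi(t) = \mathcal{A}(q(t))(\mathbb{F}h(d(t)))$, where $q(t) := \pi_Q(d(t))$ is the base integral curve and the vector bundle morphism $\mathbb{F}h: T^*Q \to TQ$ is the fiber derivative of $h$ given by

\[
\mathbb{F}h(\alpha_q)(\beta_q) = \left.\frac{d}{dt}\right|_{t=0} h(\alpha_q + t\beta_q)
\]

for any $\alpha_q, \beta_q \in T_q^*Q$. Two particular instances of this situation are noteworthy.

(b1) Assume that the Hamiltonian $h$ is that of a simple mechanical system with symmetry. Choosing $\mathcal{A}$ to be the mechanical connection $\mathcal{A}_{\rm mech}$, the curve $\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by $\xi(t) = \mathcal{A}_{\rm mech}(q(t))(\langle\!\langle d(t), \cdot\rangle\!\rangle)$.

(b2) If $Q = G$ is a Lie group, $\dim G_\mu = 1$, and $\zeta$ is a generator of $\mathfrak{g}_\mu$, then the connection $\mathcal{A} \in \Omega^1(G)$ can be chosen to equal $\mathcal{A}(g) := (1/\langle\mu, \zeta\rangle)\,T_g^*R_{g^{-1}}(\mu)$, where $R_g$ is right translation on $G$.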


(c) Reconstruction of dynamics for simple mechanical systems with symmetry. The case of simple mechanical systems with symmetry deserves special attention since several steps in the reconstruction method can be simplified. For simple mechanical systems, the knowledge of the base integral curve $q(t)$ suffices to determine the entire integral curve on $T^*Q$. Indeed, if $h = K + V\circ\pi_Q$ is the Hamiltonian, the Legendre transformation $\mathbb{F}h: T^*Q \to TQ$ determines the Lagrangian system on $TQ$ given by $\ell(u_q) = (1/2)\|u_q\|^2 - V(q)$, for $u_q \in T_qQ$. Lagrange's equations are second-order and thus the evolution of the velocities is given by the time derivative $\dot{q}(t)$ of the base integral curve. Since $\mathbb{F}h = (\mathbb{F}\ell)^{-1}$, the solution of the Hamiltonian system is given by $\mathbb{F}\ell(\dot{q}(t))$. Using the explicit expression of the mechanical connection and the notation given in the general procedure, the method of reconstruction simplifies to the following steps. To find the integral curve $c(t)$ of the simple mechanical system with $G$-symmetry $h = K + V\circ\pi_Q$ on $T^*Q$ with initial condition $c(0) = \alpha_q \in T_q^*Q$, knowing the integral curve $c_\mu(t)$ of the reduced Hamiltonian system on $(T^*Q)_\mu$ given by the reduced Hamiltonian function $h_\mu: (T^*Q)_\mu \to \mathbb{R}$ with initial condition $c_\mu(0) = [\alpha_q]$, one proceeds in the following manner. Recall the symplectic embedding $\varphi_\mu: ((T^*Q)_\mu, (\Omega_Q)_\mu) \to (T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$. The curve $\varphi_\mu(c_\mu(t)) \in T^*(Q/G_\mu)$ is an integral curve of the Hamiltonian system on $(T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$ given by the function that is the sum of the kinetic energy of the quotient Riemannian metric and the quotient amended potential $\widehat{V}_\mu$. Let $q_\mu(t) := \pi_{Q/G_\mu}(c_\mu(t))$ be the base integral curve of this system, where $\pi_{Q/G_\mu}: T^*(Q/G_\mu) \to Q/G_\mu$ is the cotangent bundle projection.


• Step 1: Relative to the mechanical connection $\mathcal{A}_{\rm mech} \in \Omega^1(Q; \mathfrak{g}_\mu)$, horizontally lift $q_\mu(t) \in Q/G_\mu$ to a curve $q_h(t) \in Q$ passing through $q_h(0) = q$.

• Step 2: Determine $\xi(t) \in \mathfrak{g}_\mu$ from the algebraic system $\langle\!\langle \xi(t)_Q(q_h(t)), \eta_Q(q_h(t))\rangle\!\rangle = \langle\mu, \eta\rangle$ for all $\eta \in \mathfrak{g}_\mu$, where $\langle\!\langle\cdot,\cdot\rangle\!\rangle$ is the $G$-invariant kinetic energy Riemannian metric on $Q$. This implies that $\dot{q}_h(0)$ and $\xi(0)_Q(q)$ are the horizontal and vertical components of the vector $\alpha_q^\sharp \in T_qQ$ which is associated by the metric $\langle\!\langle\cdot,\cdot\rangle\!\rangle$ to the initial condition $\alpha_q$.

• Step 3: Solve $\dot{g}(t) = T_eL_{g(t)}\xi(t)$ in $G_\mu$ with initial condition $g(0) = e$.

• Step 4: The curve $q(t) := g(t)\cdot q_h(t)$, with $q_h(t)$ and $g(t)$ determined in steps 1 and 3, respectively, is the base integral curve of the simple mechanical system with symmetry defined by the function $h$ satisfying $q(0) = q$. The curve $(\mathbb{F}h)^{-1}(\dot{q}(t)) \in T^*Q$ is the integral curve of this system with initial condition $c(0) = \alpha_q$. In addition, $\dot{q}(t) = g(t)\cdot(\dot{q}_h(t) + \xi(t)_Q(q_h(t)))$ is the horizontal plus vertical decomposition relative to the connection induced on $\mathbf{J}^{-1}(\mu) \to (T^*Q)_\mu$ by the mechanical connection $\mathcal{A}_{\rm mech} \in \Omega^1(Q; \mathfrak{g}_\mu)$.


There are several important situations when 
step 3, the main obstruction to an explicit solution 
of the reconstruction problem, can be carried out. 
We shall review some of them below. 


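(c1) The case $G_\mu = S^1$. If $G_\mu$ is abelian, the equation in step 3 has the solution $g(t) = \exp\int_0^t \xi(s)\,ds$. If, in addition, $G_\mu = S^1$, then $\xi(s)$ can be explicitly determined by step 2. Indeed, if $\zeta \in \mathfrak{g}_\mu$ is a generator of $\mathfrak{g}_\mu$, writing $\xi(s) = a(s)\zeta$ for some smooth real-valued function $a$ defined on some open interval around the origin, the algebraic equation in step 2 implies that $\langle\!\langle a(s)\zeta_Q(q_h(s)), \zeta_Q(q_h(s))\rangle\!\rangle = \langle\mu, \zeta\rangle$, which gives $a(s) = \langle\mu, \zeta\rangle/\|\zeta_Q(q_h(s))\|^2$. Therefore, the base integral curve of the solution of the simple mechanical system with symmetry on $T^*Q$ passing through $q$ is

\[
q(t) = \exp\left(\langle\mu, \zeta\rangle\int_0^t \frac{ds}{\|\zeta_Q(q_h(s))\|^2}\,\zeta\right)\cdot q_h(t)
\]

and

\[
\dot{q}(t) = \exp\left(\langle\mu, \zeta\rangle\int_0^t \frac{ds}{\|\zeta_Q(q_h(s))\|^2}\,\zeta\right)\cdot\left(\dot{q}_h(t) + \frac{\langle\mu, \zeta\rangle}{\|\zeta_Q(q_h(t))\|^2}\,\zeta_Q(q_h(t))\right)
\]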
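The quadrature in case (c1) can be illustrated on a deliberately simple example that is not taken from the article: a particle of mass `mass` moving in the punctured plane, with $S^1$ acting by rotations. Under that assumption the generator $\zeta_Q$ is the rotation field $\partial/\partial\theta$, so $\|\zeta_Q\|^2 = \text{mass}\cdot r^2$, the pairing $\langle\mu, \zeta\rangle$ is the angular momentum `mu`, and the horizontal lift keeps the polar angle fixed. The names `rotation_angle` and `reconstruct` and the trapezoidal rule are choices made for this sketch.

```python
import math

def rotation_angle(mu, mass, r_of_s, t, n=2000):
    """Trapezoidal estimate of mu * integral_0^t ds / (mass * r(s)**2)."""
    h = t / n
    total = 0.0
    for i in range(n + 1):
        weight = 0.5 if i in (0, n) else 1.0      # trapezoid end-point weights
        total += weight / (mass * r_of_s(i * h) ** 2)
    return mu * h * total

def reconstruct(mu, mass, r_of_s, theta0, t):
    """Base curve q(t) = exp(angle * zeta) . q_h(t), written in polar coordinates.

    The horizontal lift q_h(t) is assumed to be (r(t), theta0); the group
    element found in step 3 rotates it by the accumulated angle.
    """
    return r_of_s(t), theta0 + rotation_angle(mu, mass, r_of_s, t)
```

In this planar setting the formula reduces to the familiar statement that the polar angle advances at the rate $\mu/(\text{mass}\cdot r^2)$, so for constant radius the angle grows linearly in time.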


(c2) The case of compact Lie groups. An obvious situation when the differential equation in step 3 can be solved is if $\xi(t) = \xi$ for all $t$, where $\xi$ is a given element of $\mathfrak{g}_\mu$. Then the solution is $g(t) = \exp(t\xi)$. However, step 2 puts certain restrictions under this hypothesis, because it requires that $\langle\!\langle \xi_Q(q_h(t)), \eta_Q(q_h(t))\rangle\!\rangle = \langle\mu, \eta\rangle$ for any $\eta \in \mathfrak{g}_\mu$. This is satisfied if there is a bilinear nondegenerate form $(\cdot,\cdot)$ on $\mathfrak{g}$ satisfying $(\zeta, \eta) = \langle\!\langle \zeta_Q(q), \eta_Q(q)\rangle\!\rangle$ for all $q \in Q$ and $\zeta, \eta \in \mathfrak{g}$. This implies that $(\cdot,\cdot)$ is positive definite and invariant under the adjoint action of $G$ on $\mathfrak{g}$, so semisimple Lie algebras of noncompact type are excluded. If $G$ is compact, which ensures the existence of a positive adjoint invariant inner product on $\mathfrak{g}$, and $Q = G$, this condition implies that the kinetic energy metric is invariant under the adjoint action. There are examples in which such conditions are natural, such as in Kaluza-Klein theories. Thus, if $G$ is a compact Lie group and $(\cdot,\cdot)$ is a positive-definite metric invariant under the adjoint action of $G$ on $\mathfrak{g}$ satisfying $(\zeta, \eta) = \langle\!\langle \zeta_Q(q), \eta_Q(q)\rangle\!\rangle$ for all $q \in Q$ and $\zeta, \eta \in \mathfrak{g}$, then the element $\xi(t)$ in step 2 can be chosen to be constant and is determined by the identity $(\xi, \cdot) = \mu|_{\mathfrak{g}_\mu}$ on $\mathfrak{g}_\mu$. The solution of the equation in step 3 is then $g(t) = \exp(t\xi)$.

(c3) The case when $\dot{\xi}(t)$ is proportional to $\xi(t)$. Try to find a real-valued function $f(t)$ such that $g(t) = \exp(f(t)\xi(t))$ is a solution of the equation $\dot{g}(t) = T_eL_{g(t)}\xi(t)$ with $f(0) = 0$. This gives, for small $t$, the equation $\dot{f}(t)\xi(t) + f(t)\dot{\xi}(t) = \xi(t)$, that is, it is necessary that $\xi(t)$ and $\dot{\xi}(t)$ be proportional. So, if $\dot{\xi}(t) = \alpha(t)\xi(t)$ for some known smooth function $\alpha(t)$, then this gives

\[
f(t) = \int_0^t \exp\left(\int_t^s \alpha(r)\,dr\right) ds.
\]

(c4) The case of $G_\mu$ solvable. Write $g(t) = \exp(f_1(t)\xi_1)\exp(f_2(t)\xi_2)\cdots\exp(f_n(t)\xi_n)$, for some basis $\{\xi_1, \ldots, \xi_n\}$ of $\mathfrak{g}_\mu$ and some smooth real-valued functions $f_i$, $i = 1, 2, \ldots, n$, defined around zero. It is known that if $G_\mu$ is solvable, the equation in step 3 can be solved by quadratures for the $f_i$.
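For case (c3), the scalar equation $\dot{f}(t) + \alpha(t)f(t) = 1$ with $f(0) = 0$ is solved by $f(t) = \int_0^t \exp\left(\int_t^s \alpha(r)\,dr\right)ds$. The sketch below checks this numerically; the Simpson quadrature, the finite-difference derivative, and the helper names `integrate`, `f`, and `residual` are assumptions made for illustration only.

```python
import math

def integrate(fn, a, b, n=400):
    """Composite Simpson rule on [a, b] with n (even) panels; b < a is allowed."""
    h = (b - a) / n
    s = fn(a) + fn(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * fn(a + i * h)
    return s * h / 3

def f(t, alpha):
    """f(t) = int_0^t exp( int_t^s alpha(r) dr ) ds."""
    return integrate(lambda s: math.exp(integrate(alpha, t, s)), 0.0, t)

def residual(t, alpha, eps=1e-4):
    """Residual of f'(t) + alpha(t) f(t) = 1, with f' from a central difference."""
    df = (f(t + eps, alpha) - f(t - eps, alpha)) / (2 * eps)
    return df + alpha(t) * f(t, alpha) - 1.0
```

For a constant $\alpha$ the formula reduces to $f(t) = (1 - e^{-\alpha t})/\alpha$, which gives an independent closed-form check.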


Reconstruction Phases for Simple Mechanical 
Systems with S! Symmetry 


Consider a simple mechanical system with symmetry $G$ on the Riemannian manifold $(Q, \langle\!\langle\cdot,\cdot\rangle\!\rangle)$ with $G$-invariant potential $V \in C^\infty(Q)$. If $\mu \in \mathfrak{g}^*$, let $V_\mu$ be the amended potential and $\widehat{V}_\mu \in C^\infty(Q/G_\mu)$ the induced function on the base. Let $c: [0, T] \to T^*Q$ be an integral curve of the system with Hamiltonian $h = K + V\circ\pi_Q$ and suppose that its projection $c_\mu: [0, T] \to (T^*Q)_\mu$ to the reduced space is a closed integral curve of the reduced system with Hamiltonian $h_\mu$. The reconstruction phase associated to the loop $c_\mu(t)$ is the group element $g \in G_\mu$ satisfying the identity $c(T) = g\cdot c(0)$. We shall present two explicit formulas of the reconstruction phase for the case when $G_\mu = S^1$. Let $\zeta \in \mathfrak{g}_\mu = \mathbb{R}$ be a generator of the coadjoint isotropy algebra and write $c(T) = \exp(\varphi\zeta)\cdot c(0)$; in this case, $\varphi$ is identified with the reconstruction phase and, as we shall see in concrete mechanical examples, it truly represents an angle.

If $G_\mu = S^1$, the $G_\mu$-principal bundle $\pi_\mu: \mathbf{J}^{-1}(\mu) \to (T^*Q)_\mu := \mathbf{J}^{-1}(\mu)/G_\mu$ admits two natural connections: $A = (1/\langle\mu, \zeta\rangle)\theta_\mu \in \Omega^1(\mathbf{J}^{-1}(\mu))$, where $\theta_\mu$ is the pullback of the canonical 1-form on the cotangent bundle to the momentum level submanifold $\mathbf{J}^{-1}(\mu)$, and $\pi_Q^*\mathcal{A}_{\rm mech} \in \Omega^1(\mathbf{J}^{-1}(\mu))$. There is no reason to choose one connection over the other and thus there are two natural formulas for the reconstruction phase in this case. Let $c_\mu(t)$ be a periodic orbit of period $T$ of the reduced system and denote also by $h_\mu$ the value of the Hamiltonian function on it. Assume that $D$ is a two-dimensional surface in $(T^*Q)_\mu$ whose boundary is the loop $c_\mu(t)$. Since the manifolds $(T^*Q)_\mu$ and $T^*(Q/S^1)$ are diffeomorphic (but not symplectomorphic), it makes sense to consider the base integral curve $q_\mu(t)$ obtained by projecting $c_\mu(t)$ to the base $Q/S^1$, which is a closed curve of period $T$. Denote by

\[
\langle\widehat{V}_\mu\rangle := \frac{1}{T}\int_0^T \widehat{V}_\mu(q_\mu(t))\,dt
\]

the average of $\widehat{V}_\mu$ over the loop $q_\mu(t)$. Let $q_h(t) \in Q$ be the $\mathcal{A}_{\rm mech}$-horizontal lift of $q_\mu(t)$ to $Q$ and let $\chi$ be the $\mathcal{A}_{\rm mech}$-holonomy of the loop $q_\mu(t)$ measured from $q(0)$, the base point of $c(0)$; its expression is given by $\exp\chi = \exp\left(-\iint_D B\right)$, where $B$ is the curvature of the mechanical connection. Denote by $\omega_\mu$ the reduced symplectic form on $(T^*Q)_\mu$. With these notations the phase $\varphi$ is given by

\[
\varphi = \frac{1}{\langle\mu, \zeta\rangle}\iint_D \omega_\mu + \frac{2(h_\mu - \langle\widehat{V}_\mu\rangle)T}{\langle\mu, \zeta\rangle}
= \chi + \langle\mu, \zeta\rangle\int_0^T \frac{ds}{\|\zeta_Q(q_h(s))\|^2}
\qquad [3]
\]

The first terms in both formulas are the so-called geometric phases because they carry only geometric information given by the connection, whereas the second terms are called the dynamic phases since they encapsulate information directly linked to the Hamiltonian. The expression of the total phase as a sum of a geometric and a dynamic phase is not intrinsic and is connection dependent. It can even happen that one of these summands vanishes. We shall consider now two concrete examples: the free rigid body and the heavy top.


Reconstruction Phases for the Free Rigid Body 


The motion of the free rigid body is a geodesic with respect to a left-invariant Riemannian metric on $SO(3)$ given by the moment of inertia of the body. The phase space of the free rigid body motion is $T^*SO(3)$ and a momentum map $\mathbf{J}: T^*SO(3) \to \mathbb{R}^3$ of the lift of left translation to the cotangent bundle is given by right translation to the identity element. We have identified here $\mathfrak{so}(3)$ with $\mathbb{R}^3$ by the Lie algebra isomorphism $\mathbf{x} \in (\mathbb{R}^3, \times) \mapsto \hat{\mathbf{x}} \in (\mathfrak{so}(3), [\cdot,\cdot])$, where $\hat{\mathbf{x}}(\mathbf{y}) = \mathbf{x}\times\mathbf{y}$, and $\mathfrak{so}(3)^*$ with $\mathbb{R}^3$ by the inner product on $\mathbb{R}^3$. The reduced manifold $\mathbf{J}^{-1}(\mu)/G_\mu$ is identified with the sphere $S^2_{\|\mu\|}$ in $\mathbb{R}^3$ of radius $\|\mu\|$ with the symplectic form $\omega_\mu = -dS/\|\mu\|$, where $dS$ is the standard area form on $S^2_{\|\mu\|}$ and $G_\mu \cong S^1$ is the group of rotations around the axis $\mu$. These concentric spheres are the coadjoint orbits of the Lie-Poisson space $\mathfrak{so}(3)^*$ and represent the level sets of the Casimir functions that are all smooth functions of $\|\Pi\|^2$, where $\Pi \in \mathbb{R}^3$ denotes the body angular momentum.

The Hamiltonian of the rigid body on the Lie-Poisson space $T^*SO(3)/SO(3) \cong \mathbb{R}^3$ is given by

\[
h(\Pi) := \frac{1}{2}\left(\frac{\Pi_1^2}{I_1} + \frac{\Pi_2^2}{I_2} + \frac{\Pi_3^2}{I_3}\right)
\]

where $I_1, I_2, I_3 > 0$ are the principal moments of inertia of the body. Let $\mathbb{I} := \operatorname{diag}(I_1, I_2, I_3)$ denote the moment of inertia tensor diagonalized in a principal-axis body frame. The Lie-Poisson bracket on $\mathbb{R}^3$ is given by $\{f, g\}(\Pi) = -\Pi\cdot(\nabla f(\Pi)\times\nabla g(\Pi))$ and the equations of motion are $\dot{\Pi} = \Pi\times\Omega$, where $\Omega \in \mathbb{R}^3$ is the body angular velocity given in terms of $\Pi$ by $\Omega_i := \Pi_i/I_i$, for $i = 1, 2, 3$, that is, $\Omega = \mathbb{I}^{-1}\Pi$. The trajectories of these equations are found by intersecting a family of homothetic energy ellipsoids with the angular momentum concentric spheres. If $I_1 > I_2 > I_3$, one immediately sees that all orbits are periodic with the exception of four centers (the two possible rotations about the long and the short moment of inertia axis of the body), two saddles (the two rotations about the middle moment of inertia axis of the body), and four heteroclinic orbits connecting the two saddles.
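The statement that trajectories lie on intersections of energy ellipsoids with momentum spheres can be checked numerically. The following sketch integrates $\dot{\Pi} = \Pi\times\Omega$ with $\Omega_i = \Pi_i/I_i$ and monitors the Casimir $\|\Pi\|^2$ and the energy $h(\Pi)$; the Runge-Kutta scheme, the function names, and the sample moments of inertia are assumptions chosen for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rhs(Pi, inertia):
    """Euler equations: Pi_dot = Pi x Omega, with Omega_i = Pi_i / I_i."""
    omega = tuple(p / i for p, i in zip(Pi, inertia))
    return cross(Pi, omega)

def rk4_step(Pi, inertia, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(Pi, inertia)
    k2 = rhs(tuple(p + 0.5 * dt * k for p, k in zip(Pi, k1)), inertia)
    k3 = rhs(tuple(p + 0.5 * dt * k for p, k in zip(Pi, k2)), inertia)
    k4 = rhs(tuple(p + dt * k for p, k in zip(Pi, k3)), inertia)
    return tuple(p + dt * (a + 2 * b + 2 * c + d) / 6
                 for p, a, b, c, d in zip(Pi, k1, k2, k3, k4))

def casimir(Pi):
    """||Pi||^2, constant on each coadjoint orbit (momentum sphere)."""
    return sum(p * p for p in Pi)

def energy(Pi, inertia):
    """h(Pi) = (1/2) sum_i Pi_i^2 / I_i."""
    return 0.5 * sum(p * p / i for p, i in zip(Pi, inertia))

def evolve(Pi, inertia, dt, steps):
    """Advance the body angular momentum by `steps` time steps."""
    for _ in range(steps):
        Pi = rk4_step(Pi, inertia, dt)
    return Pi
```

Both quantities stay constant along the computed orbit up to integration error, so the numerical trajectory stays on one sphere and one ellipsoid at once.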

Suppose that $\Pi(t)$ is a periodic orbit on the sphere $S^2_{\|\mu\|}$ with period $T$. After time $T$, by how much has the rigid body rotated in space? The answer to this question follows directly from [3]. Taking $\zeta = \mu/\|\mu\|$ and the potential $V \equiv 0$ we get

\[
\varphi = -\Lambda + \frac{2h_\mu T}{\|\mu\|}
= \iint_D \frac{2\|\mathbb{I}\Pi(s)\|^2 - (\Pi(s)\cdot\mathbb{I}\Pi(s))(\operatorname{tr}\mathbb{I})}{(\Pi(s)\cdot\mathbb{I}\Pi(s))^2}\,dS
+ \|\mu\|^3\int_0^T \frac{ds}{\Pi(s)\cdot\mathbb{I}\Pi(s)}
\]

where $D$ is one of the two spherical caps on $S^2_{\|\mu\|}$ whose boundary is the periodic orbit $\Pi(t)$, $h_\mu$ is the value of the total energy on the solution $\Pi(t)$, and $\Lambda$ is the oriented solid angle, that is,

\[
\Lambda := -\frac{1}{\|\mu\|}\iint_D \omega_\mu = \frac{1}{\|\mu\|^2}\iint_D dS.
\]


Reconstruction Phases for the Heavy Top


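The heavy top is a simple mechanical system with symmetry $S^1$ on $T^*SO(3)$ whose Hamiltonian function is given by $h(\alpha_h) := (1/2)\|\alpha_h\|^2 + Mg\ell\,\mathbf{k}\cdot h\chi$, where $h \in SO(3)$, $\alpha_h \in T_h^*SO(3)$, $\mathbf{k}$ is the unit vector of the spatial $Oz$ axis (pointing in the direction opposite to that of the gravity force), $M \in \mathbb{R}$ is the total mass of the body, $g \in \mathbb{R}$ is the value of the gravitational acceleration, the fixed point about which the body moves is the origin, and $\chi$ is the unit vector of the straight line segment of length $\ell$ connecting the origin to the center of mass of the body. This Hamiltonian is left invariant under rotations about the spatial $Oz$ axis. A momentum map induced by this $S^1$-action is given by $\mathbf{J}: T^*SO(3) \to \mathbb{R}$, $\mathbf{J}(\alpha_h) = T_e^*L_h(\alpha_h)\cdot\mathbf{k}$; recall that $T_e^*L_h(\alpha_h) =: \Pi \in \mathbb{R}^3$ is the body angular momentum. The reduced space $\mathbf{J}^{-1}(\mu)/S^1$ is generically the cotangent bundle of the unit sphere endowed with the symplectic structure given by the sum of the canonical form plus a magnetic term; equivalently, this is the coadjoint orbit in the dual of the Euclidean Lie algebra $\mathfrak{se}(3)^* = \mathbb{R}^3\times\mathbb{R}^3$ given by $\mathcal{O}_\mu = \{(\Pi, \Gamma) \mid \Pi\cdot\Gamma = \mu, \|\Gamma\|^2 = 1\}$. The projection map $\mathbf{J}^{-1}(\mu) \to \mathcal{O}_\mu$ implementing the symplectic diffeomorphism between the reduced space and the coadjoint orbit in $\mathfrak{se}(3)^*$ is given by $\alpha_h \mapsto (\Pi, \Gamma) := (T_e^*L_h(\alpha_h), h^{-1}\mathbf{k})$. The orbit symplectic form $\omega_\mu$ on $\mathcal{O}_\mu$ has the expression $\omega_\mu(\Pi, \Gamma)\left((\Pi\times\mathbf{x} + \Gamma\times\mathbf{y}, \Gamma\times\mathbf{x}), (\Pi\times\mathbf{x}' + \Gamma\times\mathbf{y}', \Gamma\times\mathbf{x}')\right) = -\Pi\cdot(\mathbf{x}\times\mathbf{x}') - \Gamma\cdot(\mathbf{x}\times\mathbf{y}' - \mathbf{x}'\times\mathbf{y})$ for any $\mathbf{x}, \mathbf{x}', \mathbf{y}, \mathbf{y}' \in \mathbb{R}^3$. The heavy-top equations $\dot{\Pi} = \Pi\times\Omega + Mg\ell\,\Gamma\times\chi$, $\dot{\Gamma} = \Gamma\times\Omega$ are Lie-Poisson equations on $\mathfrak{se}(3)^*$ for the Hamiltonian $h(\Pi, \Gamma) = (1/2)\Pi\cdot\Omega + Mg\ell\,\Gamma\cdot\chi$ and the Lie-Poisson bracket $\{f, g\}(\Pi, \Gamma) = -\Pi\cdot(\nabla_\Pi f\times\nabla_\Pi g) - \Gamma\cdot(\nabla_\Pi f\times\nabla_\Gamma g - \nabla_\Pi g\times\nabla_\Gamma f)$, where $\nabla_\Pi$ and $\nabla_\Gamma$ denote the partial gradients.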

Let $(\Pi(t), \Gamma(t))$ be a periodic orbit of period $T$ of the heavy-top equations. After time $T$, by how much has the heavy top rotated in space? The answer is provided by [3]:

\[
\varphi = \frac{1}{\mu}\iint_{D_\mu}\omega_\mu + \frac{1}{\mu}\left(2h_\mu T - 2Mg\ell\int_0^T \Gamma(s)\cdot\chi\,ds\right)
= \iint_D \frac{2\|\mathbb{I}\Gamma(s)\|^2 - (\Gamma(s)\cdot\mathbb{I}\Gamma(s))(\operatorname{tr}\mathbb{I})}{(\Gamma(s)\cdot\mathbb{I}\Gamma(s))^2}\,dS
+ \mu\int_0^T \frac{ds}{\Gamma(s)\cdot\mathbb{I}\Gamma(s)}
\]

where $D$ is the spherical cap on the unit sphere whose boundary is the closed curve $\Gamma(t)$ and $D_\mu$ is a two-dimensional submanifold of the orbit $\mathcal{O}_\mu$ bounded by the closed integral curve $(\Pi(t), \Gamma(t))$. The first terms in each summand represent the geometric phase and the second terms the dynamic phase.


Gauged Poisson Structures 


If the Lie group $G$ acts freely and properly on a smooth manifold $Q$, then $(T^*Q)/G$ is a quotient Poisson manifold (see Poisson Reduction), where the quotient is taken relative to the (left) lifted cotangent action. The leaves of this Poisson manifold are the orbit reduced spaces $\mathbf{J}^{-1}(\mathcal{O}_\mu)/G$, where $\mathcal{O}_\mu \subset \mathfrak{g}^*$ is the coadjoint $G$-orbit through $\mu \in \mathfrak{g}^*$ (see Symmetry and Symplectic Reduction). Is there an explicit formula for this reduced Poisson bracket on a manifold diffeomorphic to $(T^*Q)/G$? It turns out that this question has two possible answers, once a connection on the principal bundle $\pi: Q \to Q/G$ is introduced. The discussion below will also link to the fibration version of cotangent bundle reduction.

In order to present these answers, we review two bundle constructions. Let $G$ act freely and properly on the manifold $P$ and consider the (left) principal $G$-bundle $\rho: P \to P/G =: M$. Let $\tau: N \to M$ be a surjective submersion. Then the pullback bundle $\tilde{\rho}: (n, p) \in \widetilde{P} := \{(n, p) \in N\times P \mid \rho(p) = \tau(n)\} \mapsto n \in N$ over $N$ is also a principal (left) $G$-bundle relative to the action $g\cdot(n, p) := (n, g\cdot p)$.

If there is a (left) $G$-action on a manifold $V$, then the diagonal $G$-action $g\cdot(p, v) = (g\cdot p, g\cdot v)$ on $P\times V$ is also free and proper and one can form the associated bundle $P\times_G V := (P\times V)/G$, which is a locally trivial fiber bundle $\rho_E: [p, v] \in E := P\times_G V \mapsto \rho(p) \in M$ over $M$ with fibers diffeomorphic to $V$. Analogously, one can form the associated fiber bundle $\rho_{\widetilde{E}}: \widetilde{E} := \widetilde{P}\times_G V \to N$. Summarizing, the associated bundle $\widetilde{E} = \widetilde{P}\times_G V \to N$ is obtained from the principal bundle $\rho: P \to M$, the surjective submersion $\tau: N \to M$, and the $G$-manifold $V$ by pullback and association, in this order.

These operations can be reversed. First, form the associated bundle $\rho_E: E = P\times_G V \to M$ and then pull it back by the surjective submersion $\tau: N \to M$ to $N$ to get the pullback bundle $\tau^*E \to N$. The map $\Phi: \widetilde{P}\times_G V \to \tau^*E$ defined by $\Phi([(n, p), v]) := (n, [p, v])$ is an isomorphism of locally trivial fiber bundles.

These general considerations will be used now to realize the quotient Poisson manifold $(T^*Q)/G$ in two different ways. Let $Q$ be a manifold and $G$ a Lie group (with Lie algebra $\mathfrak{g}$) acting freely and properly on it. Let $A \in \Omega^1(Q; \mathfrak{g})$ be a connection 1-form on the left $G$-principal bundle $\pi: Q \to Q/G$. Pull back the $G$-bundle $\pi: Q \to Q/G$ by the cotangent bundle projection $\pi_{Q/G}: T^*(Q/G) \to Q/G$ to $T^*(Q/G)$ to obtain the $G$-principal bundle $\tilde{\pi}_{Q/G}: (\alpha_{[q]}, q) \in \widetilde{Q} := \{(\alpha_{[q]}, q) \mid [q] = \pi(q), q \in Q\} \mapsto \alpha_{[q]} \in T^*(Q/G)$. This bundle is isomorphic to the annihilator $(VQ)^\circ \subset T^*Q$ of the vertical bundle $VQ := \ker T\pi \subset TQ$. Next, form the coadjoint bundle $\rho_S: S := \widetilde{Q}\times_G\mathfrak{g}^* \to T^*(Q/G)$ of $\widetilde{Q}$, $\rho_S([(\alpha_{[q]}, q), \mu]) = \alpha_{[q]}$, that is, the associated vector bundle to the $G$-principal bundle $\widetilde{Q} \to T^*(Q/G)$ given by the coadjoint representation of $G$ on $\mathfrak{g}^*$. The connection-dependent map $\varphi_A: S \to (T^*Q)/G$ defined by $\varphi_A([(\alpha_{[q]}, q), \mu]) := [T_q^*\pi(\alpha_{[q]}) + A(q)^*\mu]$, where $q \in Q$, $\alpha_{[q]} \in T_{[q]}^*(Q/G)$, and $\mu \in \mathfrak{g}^*$, is a vector bundle isomorphism over $Q/G$. The Sternberg space is the Poisson manifold $(S, \{\cdot,\cdot\}_S)$, where $\{\cdot,\cdot\}_S$ is the pullback to $S$ by $\varphi_A$ of the quotient Poisson bracket on $(T^*Q)/G$.

Next, we proceed in the opposite order. Construct first the coadjoint bundle $\rho_{\tilde{\mathfrak{g}}^*}: [q, \mu] \in \tilde{\mathfrak{g}}^* := Q\times_G\mathfrak{g}^* \mapsto [q] \in Q/G$ associated to the principal bundle $\pi: Q \to Q/G$ and then pull it back by the cotangent bundle projection $\pi_{Q/G}: T^*(Q/G) \to Q/G$ to $T^*(Q/G)$ to obtain the vector bundle $\rho_W: W := \{(\alpha_{[q]}, [q, \mu]) \mid \pi_{Q/G}(\alpha_{[q]}) = \rho_{\tilde{\mathfrak{g}}^*}([q, \mu]) = [q]\}$, $\rho_W(\alpha_{[q]}, [q, \mu]) = \alpha_{[q]}$ over $T^*(Q/G)$. Note that $W = T^*(Q/G)\oplus\tilde{\mathfrak{g}}^*$ and hence $W$ is also a vector bundle over $Q/G$. Let $HQ$ be the horizontal sub-bundle defined by the connection $A$; thus, $TQ = HQ\oplus VQ$, where $H_qQ := \ker A(q)$. For each $q \in Q$, the linear map $T_q\pi|_{H_qQ}: H_qQ \to T_{[q]}(Q/G)$ is an isomorphism. Let $\operatorname{hor}_q := (T_q\pi|_{H_qQ})^{-1}: T_{[q]}(Q/G) \to H_qQ \subset T_qQ$ be the horizontal lift operator induced by the connection $A$. Thus, $\operatorname{hor}_q^*: T_q^*Q \to T_{[q]}^*(Q/G)$ is a linear surjective map whose kernel is the annihilator $(H_qQ)^\circ$ of the horizontal space. The connection-dependent map $\psi_A: (T^*Q)/G \to W$ defined by $\psi_A([\alpha_q]) := (\operatorname{hor}_q^*(\alpha_q), [q, \mathbf{J}(\alpha_q)])$, where $q \in Q$, $\alpha_q \in T_q^*Q$, and $\mathbf{J}: T^*Q \to \mathfrak{g}^*$ is the momentum map of the lifted action, $\langle\mathbf{J}(\alpha_q), \xi\rangle = \alpha_q(\xi_Q(q))$ for $\xi \in \mathfrak{g}$, is a vector bundle isomorphism over $Q/G$ and $\psi_A\circ\varphi_A = \Phi$. The Weinstein space is the Poisson manifold $(W, \{\cdot,\cdot\}_W)$, where $\{\cdot,\cdot\}_W$ is the push-forward by $\psi_A$ of the Poisson bracket of $(T^*Q)/G$. In particular, $\Phi: S \to W$ is a connection independent Poisson diffeomorphism. The Poisson brackets on $S$ and on $W$ are called gauged Poisson brackets. They are expressed explicitly in terms of various covariant derivatives induced on $S$ and on $W$ by the connection $A \in \Omega^1(Q; \mathfrak{g})$.

Recall that the connection A on the principal 
bundle 7: Q — O/G naturally induces connections 
on pullback bundles and affine connections on 
associated vector bundles. Thus, both S and W 
carry covariant derivatives induced by A. They are 
given, according to general definitions, in the cases 
under consideration, by: 


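• If $f \in C^\infty(S)$, $s = [(\alpha_{[q]}, q), \mu] \in S$, and $v_{\alpha_{[q]}} \in T_{\alpha_{[q]}}T^*(Q/G)$, then $d^S_{\widetilde{A}}f(s) \in T^*_{\alpha_{[q]}}T^*(Q/G)$ is defined by $d^S_{\widetilde{A}}f(s)(v_{\alpha_{[q]}}) := df(s)\left(T_{((\alpha_{[q]}, q), \mu)}\pi_{\widetilde{Q}\times\mathfrak{g}^*}\left(\left(v_{\alpha_{[q]}}, \operatorname{hor}_q(T_{\alpha_{[q]}}\pi_{Q/G}(v_{\alpha_{[q]}}))\right), 0\right)\right)$, where $\pi_{\widetilde{Q}\times\mathfrak{g}^*}: \widetilde{Q}\times\mathfrak{g}^* \to \widetilde{Q}\times_G\mathfrak{g}^* = S$ is the orbit map. The symbol $d^S_{\widetilde{A}}$ signifies that this is a covariant derivative on the associated bundle $S$ induced by the connection $\widetilde{A}$ on the principal $G$-pullback bundle $\widetilde{Q} \to T^*(Q/G)$. This connection $\widetilde{A}$ is the pullback connection defined by $A$.

• If $f \in C^\infty(W)$, $w = (\alpha_{[q]}, [q, \mu]) \in W$, and $v_{\alpha_{[q]}} \in T_{\alpha_{[q]}}T^*(Q/G)$, then $\nabla^W_A f(w) \in T^*_{\alpha_{[q]}}T^*(Q/G)$ is defined by $\nabla^W_A f(w)(v_{\alpha_{[q]}}) := df(w)\left(v_{\alpha_{[q]}}, T_{(q, \mu)}\pi_{Q\times\mathfrak{g}^*}\left(\operatorname{hor}_q(T_{\alpha_{[q]}}\pi_{Q/G}(v_{\alpha_{[q]}})), 0\right)\right)$, where $\pi_{Q\times\mathfrak{g}^*}: Q\times\mathfrak{g}^* \to Q\times_G\mathfrak{g}^* = \tilde{\mathfrak{g}}^*$ is the orbit map. The symbol $\nabla^W_A$ signifies that this is a covariant derivative on the pullback bundle $W$ induced by the covariant derivative $\nabla_A$ on the coadjoint bundle $\tilde{\mathfrak{g}}^*$. This covariant derivative $\nabla_A$ is induced on $\tilde{\mathfrak{g}}^*$ by the connection $A$.

• For $f \in C^\infty(W)$, we have $d^S_{\widetilde{A}}(f\circ\Phi) = (\nabla^W_A f)\circ\Phi$.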


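To write the two gauged Poisson brackets on $S$ and on $W$ explicitly, we denote by $\tilde{\mathfrak{g}} = Q\times_G\mathfrak{g}$ the adjoint bundle of $\pi: Q \to Q/G$, by $\Omega_{Q/G}$ the canonical symplectic structure on $T^*(Q/G)$, by $B \in \Omega^2(Q; \mathfrak{g})$ the curvature of $A$, and by $\mathcal{B}$ the $\tilde{\mathfrak{g}}$-valued 2-form $\mathcal{B} \in \Omega^2(Q/G; \tilde{\mathfrak{g}})$ on the base $Q/G$ defined by $\mathcal{B}([q])(u_{[q]}, v_{[q]}) = [q, B(q)(u_q, v_q)]$, for any $u_q, v_q \in T_qQ$ that satisfy $T_q\pi(u_q) = u_{[q]}$ and $T_q\pi(v_q) = v_{[q]}$. Note that both $S^*$ and $W^*$ are Lie algebra bundles, that is, their fibers are Lie algebras and the fiberwise Lie bracket operation depends smoothly on the base point. If $f \in C^\infty(S)$, denote by $\delta f/\delta s \in S^* = \widetilde{Q}\times_G\mathfrak{g}$ the usual fiber derivative of $f$. Similarly, if $f \in C^\infty(W)$ denote by $\delta f/\delta w \in W^*$ the usual fiber derivative of $f$. Finally, $\sharp: T^*(T^*(Q/G)) \to T(T^*(Q/G))$ is the vector bundle isomorphism induced by $\Omega_{Q/G}$. The Poisson bracket of $f, g \in C^\infty(S)$ is given by

\[
\{f, g\}_S(s) = \Omega_{Q/G}(\alpha_{[q]})\left(d^S_{\widetilde{A}}f(s)^\sharp, d^S_{\widetilde{A}}g(s)^\sharp\right)
- \left\langle s, \left[\frac{\delta f}{\delta s}, \frac{\delta g}{\delta s}\right]\right\rangle
+ \left\langle v, (\pi_{Q/G}^*\mathcal{B})(\alpha_{[q]})\left(d^S_{\widetilde{A}}f(s)^\sharp, d^S_{\widetilde{A}}g(s)^\sharp\right)\right\rangle
\]

where $v = [q, \mu] \in \tilde{\mathfrak{g}}^*$. The Poisson bracket of $f, g \in C^\infty(W)$ is given by

\[
\{f, g\}_W(w) = \Omega_{Q/G}(\alpha_{[q]})\left(\nabla^W_A f(w)^\sharp, \nabla^W_A g(w)^\sharp\right)
- \left\langle w, \left[\frac{\delta f}{\delta w}, \frac{\delta g}{\delta w}\right]\right\rangle
+ \left\langle v, (\pi_{Q/G}^*\mathcal{B})(\alpha_{[q]})\left(\nabla^W_A f(w)^\sharp, \nabla^W_A g(w)^\sharp\right)\right\rangle
\]

Note that their structure is of the form: "canonical" bracket plus a (left) "Lie-Poisson" bracket plus a curvature coupling term.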


The Symplectic Leaves of the Sternberg 
and Weinstein Spaces 


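The map $\varphi_A: \widetilde{Q}\times\mathfrak{g}^* \to T^*Q$ given by $\varphi_A((\alpha_{[q]}, q), \mu) := T_q^*\pi(\alpha_{[q]}) + A(q)^*\mu$, where $((\alpha_{[q]}, q), \mu) \in \widetilde{Q}\times\mathfrak{g}^*$, is a $G$-equivariant diffeomorphism; the $G$-action on $T^*Q$ is by cotangent lift and on $\widetilde{Q}\times\mathfrak{g}^*$ is $g\cdot((\alpha_{[q]}, q), \mu) = ((\alpha_{[q]}, g\cdot q), \operatorname{Ad}^*_{g^{-1}}\mu)$. The pullback $\mathbf{J}_A$ of the momentum map to $\widetilde{Q}\times\mathfrak{g}^*$ has the expression $\mathbf{J}_A((\alpha_{[q]}, q), \mu) = \mu$, so if $\mathcal{O} \subset \mathfrak{g}^*$ is a coadjoint orbit we have $\mathbf{J}_A^{-1}(\mathcal{O}) = \widetilde{Q}\times\mathcal{O}$, and hence the orbit reduced manifold $\mathbf{J}_A^{-1}(\mathcal{O})/G$, whose connected components are the symplectic leaves of $S$, equals $\widetilde{Q}\times_G\mathcal{O}$. Its symplectic form is the Sternberg minimal coupling form $\tilde{\omega}_{\mathcal{O}}^- + \rho_S^*\Omega_{Q/G}$.

In this formula, the 2-form $\tilde{\omega}_{\mathcal{O}}^-$ has not been defined yet. It is uniquely defined by the identity $\pi^*_{\widetilde{Q}\times\mathcal{O}}\tilde{\omega}_{\mathcal{O}}^- = d\widetilde{A} + \Pi_{\mathcal{O}}^*\omega_{\mathcal{O}}^-$, where $\pi_{\widetilde{Q}\times\mathcal{O}}: \widetilde{Q}\times\mathcal{O} \to \widetilde{Q}\times_G\mathcal{O}$ is the orbit map, $\omega_{\mathcal{O}}^-$ is the minus orbit symplectic form on $\mathcal{O}$ (see Symmetry and Symplectic Reduction), $\Pi_{\mathcal{O}}: \widetilde{Q}\times\mathcal{O} \to \mathcal{O}$ is the projection on the second factor, and $\widetilde{A} \in \Omega^1(\widetilde{Q}\times\mathcal{O})$ is the 1-form given by $\widetilde{A}((\alpha_{[q]}, q), \mu)((u_{\alpha_{[q]}}, v_q), \nu) = -\langle\mu, A(q)(v_q)\rangle$ for $((\alpha_{[q]}, q), \mu) \in \widetilde{Q}\times\mathcal{O}$, $(u_{\alpha_{[q]}}, v_q) \in T_{(\alpha_{[q]}, q)}\widetilde{Q}$, and $\nu \in \mathfrak{g}^*$.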

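The symplectic leaves of the Weinstein space $W$ are obtained by pushing forward by $\Phi$ the symplectic leaves of the Sternberg space. They are the connected components of the symplectic manifolds $\left(T^*(Q/G)\oplus(Q\times_G\mathcal{O}),\ \Pi^*_{T^*(Q/G)}\Omega_{Q/G} + \Pi^*_{Q\times_G\mathcal{O}}\,\omega^-_{Q\times_G\mathcal{O}}\right)$, where $\mathcal{O}$ is a coadjoint orbit in $\mathfrak{g}^*$, $\Omega_{Q/G}$ is the canonical symplectic form on $T^*(Q/G)$, $\omega^-_{Q\times_G\mathcal{O}}$ is a closed 2-form on $Q\times_G\mathcal{O}$ to be defined below, and $\Pi_{T^*(Q/G)}: T^*(Q/G)\oplus(Q\times_G\mathcal{O}) \to T^*(Q/G)$, $\Pi_{Q\times_G\mathcal{O}}: T^*(Q/G)\oplus(Q\times_G\mathcal{O}) \to Q\times_G\mathcal{O}$ are the projections. The closed 2-form $\omega^-_{Q\times_G\mathcal{O}} \in \Omega^2(Q\times_G\mathcal{O})$ is uniquely determined by the identity $\pi^*_{Q\times\mathcal{O}}\,\omega^-_{Q\times_G\mathcal{O}} = \omega^-_{Q\times\mathcal{O}}$, where $\pi_{Q\times\mathcal{O}}: Q\times\mathcal{O} \to Q\times_G\mathcal{O}$ is the orbit space projection, $\omega^-_{Q\times\mathcal{O}} \in \Omega^2(Q\times\mathcal{O})$ is closed and given by $\omega^-_{Q\times\mathcal{O}}(q, \mu)\left((u_q, -\operatorname{ad}^*_\xi\mu), (v_q, -\operatorname{ad}^*_\eta\mu)\right) := -d(A\times\operatorname{id}_{\mathcal{O}})(q, \mu)\left((u_q, -\operatorname{ad}^*_\xi\mu), (v_q, -\operatorname{ad}^*_\eta\mu)\right) + \omega^-_{\mathcal{O}}(\mu)(\operatorname{ad}^*_\xi\mu, \operatorname{ad}^*_\eta\mu)$, and $A\times\operatorname{id}_{\mathcal{O}} \in \Omega^1(Q\times\mathcal{O})$ is given by $(A\times\operatorname{id}_{\mathcal{O}})(q, \mu)(u_q, -\operatorname{ad}^*_\xi\mu) = \langle\mu, A(q)(u_q)\rangle$, for $q \in Q$, $\mu \in \mathfrak{g}^*$, $u_q, v_q \in T_qQ$, $\xi, \eta \in \mathfrak{g}$.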

Thus, on the Sternberg and Weinstein spaces, 
both the Poisson bracket as well as the symplectic 
form on the leaves have explicit connection 
dependent formulas (see Gauge Theory: Mathema- 
tical Applications for a general treatment of gauge 
theories). 


See also: Gauge Theory: Mathematical Applications;
Hamiltonian Group Actions; Poisson Reduction;
Symmetries and Conservation Laws; Symmetry and
Symplectic Reduction.

Introduction


Sufficiently dense concentrations of mass-energy in 
general relativity collapse irreversibly and form black 
holes. More precisely, the singularity theorems state 
that once a closed trapped surface has developed, some 
world lines will only extend to a finite length in the 
future — they end in a spacetime singularity. Further- 
more, the cosmic censorship hypothesis states that this 
singularity is hidden away inside a black hole. One 
can, therefore, classify initial data in general relativity 
which describe an isolated system with no black hole 
present into those which remain regular, and those 
which form a black hole during their evolution. 

Theorems on the stability of Minkowski spacetime, 
and similar results for some types of matter coupled to 
gravity, imply that sufficiently weak (in some technical 
sense) initial data will remain regular. On the other 
hand, no necessary or sufficient criterion for black hole 
formation is known. For very strong data the existence 
of a closed trapped surface implies black hole 
formation, but although the data themselves may be 
regular, the trapped surface must already be inside the 
black hole. Between the very weak and very strong 
regime, there is a middle regime of initial data for 
which one cannot decide if they will or will not form a 
black hole, other than evolving them in time. 

The threshold between collapse and dispersion was first explored systematically by Choptuik (1992). He concentrated on the simple model of a spherically symmetric massless scalar (matter) field $\phi(r, t)$. In this model, the scalar-field matter must either form a black hole, or disperse to infinity; it cannot form stable stars. Choptuik explored the space of initial data by means of one-parameter families of initial data which interpolate between strong data (say with large parameter $p$) that form a black hole and weak data (with small $p$) that disperse. The critical value $p_*$ of the parameter $p$ can be found for each family by evolving many data sets from that family. Near the black hole threshold, Choptuik found the following phenomena:


1. Mass scaling. By fine-tuning the initial data to the threshold along any one-parameter family, one can make arbitrarily small black holes. Near the threshold, the black hole mass $M$ scales as

\[
M \simeq C(p - p_*)^\gamma \quad \text{for } p \gtrsim p_* \qquad [1]
\]

in the limit $p \to p_*$ from above.

2. Universality. While $p_*$ and $C$ depend on the particular one-parameter family of data, the critical exponent $\gamma$ has a universal value, $\gamma \simeq 0.374$, for all one-parameter families of scalar-field data. Furthermore, for a finite time in a finite region of space, the solutions generated by all near-critical data approach one and the same solution $\phi_*$, called the critical solution:

\[
\phi(r, t) \simeq \phi_*\left(\frac{r}{L}, \frac{t - t_*}{L}\right) \qquad [2]
\]

The constants $t_*$ and $L$ depend again on the family of initial data, but $\phi_*(r, t)$ is universal. This universal phase ends when the evolution decides between black hole formation and dispersion. The universal critical solution is approached by any initial data that are sufficiently close to the black hole threshold, on either side, and from any one-parameter family.

3. Scale-echoing. The critical solution $\phi_*(r, t)$ is unchanged when one rescales space and time by a factor $e^\Delta$:

\[
\phi_*(r, t) = \phi_*(e^\Delta r, e^\Delta t) \qquad [3]
\]

where $\Delta \simeq 3.44$ for the scalar field.
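In practice the exponent in the mass-scaling law [1] is read off as the slope of $\ln M$ against $\ln(p - p_*)$. The sketch below performs that least-squares fit on synthetic data generated directly from [1] with $\gamma = 0.374$; the family constant `C = 2.5`, the threshold `p_star = 1.0`, and the function name `fit_power_law` are invented for illustration and do not come from any simulation.

```python
import math

def fit_power_law(ps, masses, p_star):
    """Least-squares line through (ln(p - p_star), ln M); returns (gamma, C)."""
    xs = [math.log(p - p_star) for p in ps]
    ys = [math.log(M) for M in masses]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    gamma = sxy / sxx                      # slope = critical exponent
    C = math.exp(ybar - gamma * xbar)      # intercept = family-dependent constant
    return gamma, C

# Synthetic, noise-free data obeying M = C (p - p_star)**gamma exactly.
p_star = 1.0
ps = [p_star + 10.0 ** (-k) for k in range(2, 9)]
masses = [2.5 * (p - p_star) ** 0.374 for p in ps]
gamma, C = fit_power_law(ps, masses, p_star)
```

With data from an actual evolution code, $p_*$ is itself only known approximately (for instance from bisection between collapsing and dispersing runs), so the fit is usually restricted to values of $p - p_*$ well above that uncertainty.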


The same phenomena were quickly discovered in 
many other types of matter coupled to gravity, and 
even in vacuum gravity (where gravitational waves can 
form black holes). The echoing period A and critical 
exponent y depend on the type of matter, but the 
existence of the phenomena appears to be generic. For 
some types of matter (e.g., perfect fluid matter), the 
critical solution is continuously scale invariant (or 
continuously self-similar, CSS) in the sense that 


$$\phi_*(r, t) = \phi_*(r/t)$$ [4]


rather than scale-periodic (or discretely self-similar, 
DSS) as in [3]. (We use the notation $\phi_*(x)$ for the 
function of one variable $r/t$.) We have described 
scale invariance and scale-echoing here in terms 
of coordinates, but these do admit geometric, 
coordinate-invariant definitions, which are not 
restricted to spherical symmetry. 

There is also another kind of critical behavior at the 
black hole threshold. Here, too, the evolution goes 
through a universal critical solution, but it is static, 
rather than scale invariant. As a consequence, the mass 
of black holes near the threshold takes a universal 
finite value (some fixed fraction of the mass of the 
critical solution), instead of showing power-law 
scaling. In an analogy with first- and second-order 
phase transitions in statistical mechanics, the critical 
phenomena with a finite mass at the black hole 
threshold are called type I, and the critical phenomena 
with power-law scaling of the mass are called type II. 

At this point, we characterize the degree of rigor 
of the various parts of the theory that is summarized 
in this article. Critical phenomena were discovered 
in the numerical time evolution of generic asympto- 
tically flat initial data. Numerical evolution of many 
elements of a specific one-parameter family, and 
fine-tuning to the black hole threshold along that 
family showed self-similarity and mass scaling near 
the threshold. Doing this for a number of randomly 
chosen one-parameter families suggests that these 
phenomena, and in particular the echoing scale $\Delta$ 
and mass-scaling exponent $\gamma$, are universal between 
initial data within one model (e.g., the spherical 
scalar field). Numerical experiments, however, can 
only explore a finite-dimensional subspace of the 
infinite-dimensional space of initial data (phase 
space) of the field theory, and so cannot prove 
universality. 

We go further by applying the theory of dynami- 
cal systems to general relativity. The arguments 
summarized in the next section would be difficult to 
make rigorous, as the dynamical system under 
consideration is infinite dimensional, but they 
suggest a focus on fixed points of the dynamical 
system and their linear perturbations. Even though 
the dynamical systems motivation is not mathema- 
tically rigorous, the linearized analysis itself is a 
well-defined problem that can be solved numerically 
to essentially arbitrary precision. This proves universality 
on a perturbative level, and provides 
numerical values of $\Delta$ and $\gamma$. A combination of the 
global dynamical systems analysis and perturbative 
analysis even predicts further critical exponents for 
black hole charge and angular momentum. Finally, 
critical phenomena have been discovered in a 
number of systems (different types of matter and 
symmetry restrictions), and this suggests that they 
may be generic for some large class of field theories 
(although details such as the numerical values of 
$\gamma$ and $\Delta$ do depend on the system), but there is no 
conclusive evidence for this at present. 


The Dynamical Systems Picture 


When we consider general relativity as an infinite- 
dimensional dynamical system, a solution curve is a 
spacetime. Points along the curve are Cauchy 
surfaces in the spacetime, which can be thought of 
as moments of time. An important difference 
between general relativity and other field theories 
is that the same spacetime can be sliced in many 
different ways, none of which is preferred. There- 
fore, to turn general relativity into a dynamical 
system, one has to fix a slicing (and in practice also 
coordinates on each slice). In the example of the 
spherically symmetric massless scalar field, using 
polar slicing and an area radial coordinate r, a point 
in phase space can be characterized by the two 
functions 


$$Z = \bigl(\phi(r),\; r\,\phi_{,t}(r)\bigr)$$ [5]


In spherical symmetry, there are no degrees of 
freedom in the gravitational field, and Cauchy data for 
the metric can be reconstructed from Z using the 
Einstein constraints. 

The phase space consists of two halves: initial 
data whose time evolution always remains regular, 
and data which contain a black hole or form one 
during time evolution. The numerical evidence 
collected from individual one-parameter families of 
data suggests that the black hole threshold that 
separates the two is a smooth hypersurface. The 
mass-scaling law [1] can, therefore, be restated 
without explicit reference to one-parameter families. 
Let P be any function on phase space such that data 
sets with P > 0 form black holes, and data with P < 0 
do not, and which is analytic in a neighborhood of 
the black hole threshold P = 0. The black hole mass 
as a function on phase space is then given by 


$$M = F(P)\, P^{\gamma}$$ [6]


for P > 0, where F(P) > 0 is an analytic function. 

Consider now the time evolution in this dynamical 
system, near the threshold ("critical surface") 
between black hole formation and dispersion. A 
phase-space trajectory that starts out in a critical 
surface by definition never leaves it. A critical 
surface is, therefore, a dynamical system in its own 
right, with one dimension fewer. If it has an 
attracting fixed point, such a point is called a 
critical point. It is an attractor of codimension 1, 
and the critical surface is its basin of attraction. The 
fact that the critical solution is an attractor of 
codimension 1 is visible in its linear perturbations: it 
has an infinite number of decaying perturbation 
modes tangential to (and spanning) the critical 
surface, and a single growing mode not tangential 
to the critical surface. 

Any trajectory beginning near the critical surface, 
but not necessarily near the critical point, moves 
almost parallel to the critical surface toward the 
critical point. As the phase point approaches the 
critical point, its movement parallel to the surface 
slows down, while its distance and velocity out of 
the critical surface are still small. The phase point 
spends some time moving slowly near the critical 
point. Eventually, it moves away from the critical 
point in the direction of the growing mode, and ends 
up on an attracting fixed point. 

Figure 1 The phase-space picture for the black hole threshold 
in the presence of a critical point. The arrow lines are time 
evolutions, corresponding to spacetimes. The line without an 
arrow is not a time evolution, but a one-parameter family of initial 
data that crosses the black hole threshold at $p = p_*$. (Reproduced 
with permission from Gundlach C (2003) Critical phenomena in 
gravitational collapse. Physics Reports 376: 339-405.) 

This is the origin of universality: any initial data 
set that is close to the black hole threshold (on either 
side) evolves to a spacetime that approximates the 
critical spacetime for some time. When it finally 
approaches either the dispersion fixed point or the 
black hole fixed point, it does so on a trajectory that 
appears to be coming from the critical point itself. 
All near-critical solutions are passing through one of 
these two funnels. All details of the initial data have 
been forgotten, except for the distance from the 
black hole threshold: the closer the initial phase 
point is to the critical surface, the more the solution 
curve approaches the critical point, and the longer it 
will remain close to it. 
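
The loss of memory described above can be illustrated with a deliberately crude toy model: a linear flow near a saddle point with one growing and one decaying direction. This is only a sketch under stated assumptions, not the infinite-dimensional system of the article. The rates `lam0` and `mu`, the exit amplitude `eps`, and the function name `evolve_to_exit` are all hypothetical, and a single decaying mode stands in for the infinitely many decaying modes of the critical solution.

```python
import math

def evolve_to_exit(x0, y0, lam0=2.0, mu=1.0, eps=1e-2):
    """Follow the linear flow dx/dtau = lam0*x, dy/dtau = -mu*y.

    x0 plays the role of p - p_* (distance from the critical surface),
    y0 stands for all other details of the initial data. Returns the
    time tau_exit at which |x| has grown to eps, and the state there.
    Assumes 0 < |x0| < eps.
    """
    tau_exit = math.log(eps / abs(x0)) / lam0
    x_exit = x0 * math.exp(lam0 * tau_exit)
    y_exit = y0 * math.exp(-mu * tau_exit)
    return tau_exit, x_exit, y_exit

# Two data sets equally close to the threshold but otherwise different.
tau_a, x_a, y_a = evolve_to_exit(x0=1e-10, y0=0.3)
tau_b, x_b, y_b = evolve_to_exit(x0=1e-10, y0=-0.7)
# The same "details", ten thousand times farther from the threshold.
tau_c, x_c, y_c = evolve_to_exit(x0=1e-6, y0=0.3)

print(tau_a, y_a, y_b)
print(tau_c, y_c)
```

In this toy model the time spent near the fixed point grows like the logarithm of the inverse distance from the threshold, and the decaying component left over at exit shrinks as a power of that distance, so the two finely tuned data sets leave along nearly the same trajectory.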

In all systems that have been examined, the black 
hole threshold contains at least one critical point. A 
fixed point of the dynamical system represents a 
spacetime with an additional continuous symmetry 
that generic solutions do not have. If the critical 
spacetime is time independent in the usual sense, we 
have type I critical phenomena; if the symmetry is 
scale invariance, we have type II critical phenomena. 
The attractor within the critical surface may also be 
a limit cycle, rather than a fixed point. In spacetime 
terms this corresponds to a discrete symmetry (DSS 
rather than CSS in type II, or a pulsating critical 
solution, rather than a stationary one, in type I). 


Self-Similarity and Mass Scaling 


Type II critical phenomena occur where the critical 
solution is scale invariant (self-similar, CSS or DSS). 
Using suitable spacetime coordinates, a CSS solution 
can be characterized as independent of a time 
coordinate $\tau$ which is also a logarithmic scale. 
Similarly, a DSS solution can be characterized as 
periodic in $\tau$. For example, starting from the scale 
periodicity [3] in polar-radial coordinates, we 
replace r and t by new coordinates 


$$x \equiv \frac{r}{t_* - t}, \qquad \tau \equiv -\ln\!\left(\frac{t_* - t}{L}\right)$$ [7]

where the accumulation time $t_*$ and scale $L$ must be 
matched to the one-parameter family under consideration. 
$\tau$ has been defined so that it increases as 
t increases and approaches $t_*$ from below. It is useful 
to think of r, t, and L as having dimension length in 
units c = G = 1, and of x and $\tau$ as dimensionless. 
Choptuik's observation, expressed in these coordi- 
nates, is that in any near-critical solution there is 
a spacetime region where the fields Z are well 
approximated by the critical solution, or 


$$Z(x, \tau) \simeq Z_*(x, \tau)$$ [8]


with 

$$Z_*(x, \tau + \Delta) = Z_*(x, \tau)$$ [9]
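
The link between scale-echoing and periodicity in the logarithmic time can be checked numerically. The sketch below assumes the coordinates $x = r/(t_* - t)$ and $\tau = -\ln((t_* - t)/L)$, and interprets the rescaling in [3] as acting on $r$ and on the time to accumulation $t_* - t$. The function names and the values chosen for `t_star` and `L` are hypothetical.

```python
import math

def to_self_similar(r, t, t_star, L):
    """Map (r, t) with t < t_star to the coordinates (x, tau)."""
    if t >= t_star:
        raise ValueError("coordinates are defined only for t < t_star")
    x = r / (t_star - t)
    tau = -math.log((t_star - t) / L)
    return x, tau

def rescale_about_accumulation(r, t, t_star, factor):
    """Rescale r and the time to accumulation (t_star - t) by `factor`."""
    return factor * r, t_star - factor * (t_star - t)

DELTA = 3.44            # echoing period quoted in the text for the scalar field
t_star, L = 1.0, 1.0    # hypothetical family-dependent constants
r, t = 0.2, 0.5

x1, tau1 = to_self_similar(r, t, t_star, L)
r2, t2 = rescale_about_accumulation(r, t, t_star, math.exp(-DELTA))
x2, tau2 = to_self_similar(r2, t2, t_star, L)

print(x1, x2)            # x is unchanged by the rescaling
print(tau2 - tau1)       # tau is shifted by DELTA
```

Shrinking all scales by $e^{-\Delta}$ leaves $x$ fixed and shifts $\tau$ by $\Delta$, so a solution that is invariant under the rescaling is periodic in $\tau$ at fixed $x$.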


Note that the time parameter of the dynamical 
system must be chosen as $\tau$ if a CSS solution is to be 
a fixed point, or a DSS solution a cycle. More 
generally (going beyond spherical symmetry), on any 
self-similar spacetime one can introduce coordinates 
$x^{\mu} = (\tau, x^1, x^2, x^3)$ in which the metric is of the form 


$$g_{\mu\nu} = e^{-2\tau}\, \tilde{g}_{\mu\nu}$$ [10]


and where $\tilde{g}_{\mu\nu}$ is independent of $\tau$ for a CSS 
spacetime, and periodic in $\tau$ for a DSS spacetime. 
These coordinates are not unique. 

The critical exponent $\gamma$ can be calculated from the 
linear perturbations of the critical solution. In order 
to keep the notation simple, the discussion will be 
restricted to a critical solution that is spherically 
symmetric and CSS, which is correct, for example, 
for perfect-fluid matter. 

Let us assume that we have fine-tuned initial data 
close to the black hole threshold so that in a region 
the resulting spacetime is well approximated by the 
CSS critical solution. This part of the spacetime 
corresponds to the section of the phase-space 
trajectory that lingers near the critical point. In this 
region, we can linearize around $Z_*$. As $Z_*$ does not 
depend on $\tau$, its linear perturbations can depend 
on $\tau$ only exponentially. Labeling the perturbation 
modes by i, a single mode perturbation is of 
the form 


$$\delta Z = C_i\, e^{\lambda_i \tau}\, Z_i(x)$$ [11]


In the near-critical regime, we can therefore 
approximate the solution as 


$$Z(x, \tau) \simeq Z_*(x) + \sum_{i=0}^{\infty} C_i(p)\, e^{\lambda_i \tau}\, Z_i(x)$$ [12]


The notation $C_i(p)$ is used because the perturbation 
amplitudes $C_i$ depend on the initial data, and hence 
on the parameter p that controls the initial data. 

If $Z_*$ is a critical solution, by definition there is 
exactly one $\lambda_i$ with positive real part (in fact, it is 
purely real), say $\lambda_0$. As $t \to t_*$ from below, which 
corresponds to $\tau \to \infty$, all other perturbations decay 
and can be neglected. By definition, the critical 
solution corresponds to $p = p_*$, and so we must have 
$C_0(p_*) = 0$. Linearizing around $p_*$, we obtain 

$$Z(x, \tau) \simeq Z_*(x) + \left.\frac{dC_0}{dp}\right|_{p_*} (p - p_*)\, e^{\lambda_0 \tau}\, Z_0(x)$$ [13]


in a region of the spacetime. 
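Now we extract Cauchy data at one particular 
value of $\tau$ within that region, namely at $\tau_p$ 
defined by 

$$\left.\frac{dC_0}{dp}\right|_{p_*} |p - p_*|\, e^{\lambda_0 \tau_p} \equiv \epsilon$$ [14]


where $\epsilon$ is an arbitrary small constant, so that 

$$Z(x, \tau_p) \simeq Z_*(x) \pm \epsilon\, Z_0(x)$$ [15]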


where $\pm$ is the sign of $p - p_*$, left behind because by 
definition $\epsilon$ is positive. As $\tau$ increases from $\tau_p$, the 
growing perturbation becomes nonlinear and the 
approximation [13] breaks down. Then either a 
black hole forms (say for the positive sign), or the 
solution disperses (for the negative sign). We need 
not follow this nonlinear evolution in detail to find 
the black hole mass scaling in the former case: 
dimensional analysis is sufficient. Going back to 
coordinates t and r, we have 


$$Z(r, t_p) \simeq Z_*\!\left(\frac{r}{L_p}\right) \pm \epsilon\, Z_0\!\left(\frac{r}{L_p}\right)$$ [16]


where 

$$L_p \equiv L\, e^{-\tau_p}$$ [17]

These Cauchy data at $t = t_p$ depend on the initial 
data at t = 0 only through the overall scale $L_p$, and 
through the sign in front of $\epsilon$. If the field equations 
themselves are scale invariant, or asymptotically 
scale invariant at scales L and smaller, the black 
hole mass, which has dimensions of length in 
gravitational units, must be proportional to the 
initial data scale $L_p$, the only length scale that is 
present. Therefore, 


$$M \propto L_p \propto (p - p_*)^{1/\lambda_0}$$ [18]


and we have found the critical exponent to be $\gamma = 1/\lambda_0$. 
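
The chain of reasoning from the growing mode to the exponent can be traced numerically. The sketch below assumes the exit condition $|dC_0/dp|\,|p - p_*|\,e^{\lambda_0 \tau_p} = \epsilon$ and the scale $L_p = L e^{-\tau_p}$, and takes the mass to be proportional to $L_p$ as argued above. The function names, the values of `dc0_dp`, `eps` and `L`, and the choice $\lambda_0 = 1/0.374$ are illustrative assumptions.

```python
import numpy as np

def exit_time(dp, lam0, dc0_dp=1.0, eps=1e-3):
    """Solve |dC0/dp| * |p - p_*| * exp(lam0 * tau_p) = eps for tau_p."""
    return np.log(eps / (abs(dc0_dp) * np.abs(dp))) / lam0

def exit_scale(dp, lam0, L=1.0, **kwargs):
    """Length scale L_p = L * exp(-tau_p) of the data extracted at tau_p."""
    return L * np.exp(-exit_time(dp, lam0, **kwargs))

lam0 = 1.0 / 0.374               # growth rate implied by gamma = 0.374 (assumed)
dp = np.logspace(-12, -4, 9)     # distances p - p_* from the threshold
L_p = exit_scale(dp, lam0)

# Slope of ln L_p against ln(p - p_*): this is the mass-scaling exponent.
slope = np.polyfit(np.log(dp), np.log(L_p), 1)[0]
print(slope, 1.0 / lam0)
```

The constants `dc0_dp`, `eps` and `L` change only the prefactor of the power law, which is why they are family dependent while the slope is not.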


The Analogy with Statistical Mechanics 


The existence of a threshold where a qualitative 
change takes place, universality, scale invariance, 
and critical exponents suggest that there is a 
mathematical analogy between type II critical 
phenomena and critical phase transitions in statis- 
tical mechanics. 

In equilibrium statistical mechanics, observable 
macroscopic quantities, such as the magnetization of 
a ferromagnetic material, are derived as statistical 
averages over microstates of the system. The 
expected value of an observable is 


$$\langle A \rangle = \sum_{\text{microstates}} A(\text{microstate})\, e^{-H(\text{microstate},\, \mu)}$$ [19]


The Hamiltonian H depends on the parameters $\mu$, 
which comprise the temperature, parameters characterizing 
the system such as interaction energies of 
the constituent molecules, and macroscopic forces 
such as the external magnetic field. The objective of 
statistical mechanics is to derive relations between 
the macroscopic quantities A and parameters $\mu$. 

Phase transitions in thermodynamics are thresholds 
in the space of external forces $\mu$ at which the 
macroscopic observables A, or one of their derivatives, 
change discontinuously. In a ferromagnetic material 
at high temperatures, the magnetization m of the 
material (alignment of atomic spins) is determined by 
the external magnetic field B. At low temperatures, the 
material shows a spontaneous magnetization even at 
zero external field, which breaks rotational symmetry. 
With increasing temperature, the spontaneous magne- 
tization m decreases and vanishes at the Curie 
temperature $T_*$ as 


$$|m| \sim (T_* - T)^{\gamma}$$ [20]


In the presence of a very weak external field, the 
spontaneous magnetization aligns itself with the 
external field B, while its strength is, to leading 
order, independent of B. The function m(B, T), 
therefore, changes discontinuously at B = 0. The line 
B = 0 for $T < T_*$ is, therefore, a line of first-order 
phase transitions between the possible directions of 
the spontaneous magnetization (in a one-dimensional 
system, between m up and m down). This line 
ends at the critical point ($B = 0$, $T = T_*$) where the 
order parameter |m| vanishes. The role of B = 0 as 
the critical value of B is obscured by the fact that 
B = 0 is singled out by symmetry. 

A critical phase transition involves scale-invariant 
physics. One sign of this is that fluctuations appear 
on a large range of length scales between the 
underlying atomic scale and the scale of the sample. 
In particular, the atomic scale, and any dimensionful 
parameters associated with that scale, must become 
irrelevant at the critical point. This can be taken as 
the starting point for obtaining properties of the 
system at the critical point. 

One first defines a semigroup acting on micro- 
states: the renormalization group. Its action is to 
group together a small number of particles as a 
single particle of a fictitious new system, using some 
averaging procedure. Alternatively, this can also be 
done in Fourier space. One then defines a dual 
action of the renormalization group on the space of 
Hamiltonians by demanding that the partition 
function is invariant under the renormalization 
group action: 


$$\sum_{\text{microstates}} e^{-H} = \sum_{\text{microstates}'} e^{-H'}$$ [21]


The renormalized Hamiltonian H' is in general 
more complicated than the original one, but it can 
be approximated by a fixed expression where only 
a finite number of parameters $\mu$ are adjusted. Fixed 
points of the renormalization group correspond to 
Hamiltonians with the parameters $\mu$ at their critical 
values. The critical value of any dimensional 
parameter $\mu$ must be zero (or infinity). Only 
dimensionless combinations can have nontrivial 
critical values. 
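
A standard textbook illustration of such a renormalization step, not taken from this article, is the decimation of the one-dimensional Ising chain, where summing over every second spin can be done exactly. The sketch below assumes a ring of spins with weight $e^{K \sum_i s_i s_{i+1}}$ (so that $K$ plays the role of the parameter $\mu$) and checks by brute force that the partition function is preserved. The function names are hypothetical.

```python
import itertools
import math

def ring_partition_function(n_spins, coupling):
    """Brute-force Z = sum over states of exp(K * sum_i s_i s_{i+1}) on a ring."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=n_spins):
        bonds = sum(spins[i] * spins[(i + 1) % n_spins] for i in range(n_spins))
        total += math.exp(coupling * bonds)
    return total

def decimate(coupling):
    """One renormalization step: sum out every second spin.

    Returns the renormalized coupling K' and the constant factor A per
    removed spin, so that Z_N(K) = A**(N/2) * Z_{N/2}(K').
    """
    k_new = 0.5 * math.log(math.cosh(2.0 * coupling))
    factor = 2.0 * math.sqrt(math.cosh(2.0 * coupling))
    return k_new, factor

K = 0.7
K_new, A = decimate(K)
z_fine = ring_partition_function(8, K)
z_coarse = A ** 4 * ring_partition_function(4, K_new)
print(z_fine, z_coarse, K_new)
```

The constant factor `A` corresponds to an additive constant in the renormalized Hamiltonian. In this simple example the coupling decreases at every step, so the only fixed points are $K = 0$ and $K \to \infty$, and the chain has no nontrivial critical point at finite coupling.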

The behavior of thermodynamical quantities at 
the critical point is in general not trivial to calculate. 
But the action of the renormalization group on 
length scales is given by its definition. The blowup 
of the correlation length $\xi$ at the critical point is, 
therefore, the easiest critical exponent to calculate. 
We make contact with critical phenomena in 
gravitational collapse by considering the time evolution 
in coordinates $(\tau, x)$ as a renormalization group 
action. The calculation of the critical exponent for 
the black hole mass M is the precise analog of the 
calculation of the critical exponent for the correlation 
length $\xi$, substituting $T_* - T$ for $p - p_*$, and 
taking into account that the $\tau$-evolution in critical 
collapse is toward smaller scales, while the renormalization 
group flow goes toward larger scales: 
therefore, $\xi$ diverges at the critical point, while M 
vanishes. 

We have shown above that the black hole mass is 
controlled by one global function P on phase space. 
Clearly, P is the gravity equivalent of $T - T_*$ in 
the ferromagnet. But it is tempting to speculate 
(Gundlach 2002) that there is also a gravity equivalent 
of the external magnetic field B, which gives rise 
to a second independent critical exponent. At least 
in some situations, the angular momentum of the 
initial data can play this role. Note that, like B, 
angular momentum is a vector, with a critical value 
that is zero because all other values break rotational 
symmetry. Furthermore, the final black hole can 
have nonvanishing angular momentum, which must 
depend on the angular momentum of the initial 
data. The former is analogous to the magnetization 
m, the latter to the external field B. It can be shown 
that this analogy holds perturbatively for small 
angular momentum. Future numerical simulations 
will show if it goes further. 


Universality and Cosmic Censorship 


Critical phenomena in gravitational collapse first 
generated interest because a complicated self-similar 
structure and dimensionless numbers $\gamma$ and $\Delta$ arise 
from generic initial data evolved by quite simple 
field equations. Another point of interest is the 
rather detailed analogy of phenomena in a determi- 
nistic field theory with critical phase transitions in 
statistical mechanics. But critical phenomena are 
important for general relativity mostly for a differ- 
ent reason. 

Black holes are among the most important 
solutions of general relativity because of their 
universality: the black hole uniqueness theorems 
state that stable black holes are completely deter- 
mined by their mass, angular momentum, and 
electric charge — the Kerr-Newman family of black 
holes. Perturbation theory shows that any perturba- 
tions of black holes from the Kerr-Newman solu- 
tions must be radiated away. 

Critical solutions have a similar importance 
because they are generic intermediate states of 
the evolution that are also independent of the 
initial data. An important distinction is that 
critical solutions depend on the matter model, 
and are therefore less universal than black holes. 
However, critical phenomena in gravitational 
collapse seem to arise in axisymmetric vacuum 
spacetimes, and so are apparently not linked to the 
presence of matter. Furthermore, they also arise in 
perfect-fluid matter with the equation of state 
$p = \rho/3$, which is that of an ultrarelativistic gas. 
This is a good approximation for matter at very 
high density, such as in the big bang. This is 
important because critical phenomena probe 
arbitrarily large matter densities or spacetime 
curvatures as the initial data are fine-tuned to the 
black hole threshold. At even higher densities, 
presumably on the Planck scale, scale invariance is 
again broken by quantum-gravity effects, and 
so critical phenomena will end there. 

The cosmic censorship conjecture states that 
naked singularities do not arise from suitably 
generic initial data for suitably well-behaved mat- 
ter. Critical phenomena in gravitational collapse 
have forced a tightening of this conjecture. Type II 
(self-similar) critical solutions contain a naked 
singularity, that is, a point of infinite spacetime 
curvature from which information can reach a 
distant observer. (By contrast, the singularity inside 
a black hole is hidden from distant observers.) On a 
kinematical level, this could be seen already from 
the form [10] of the metric. Because the critical 
solution is the end state for all initial data that are 
exactly on the black hole threshold, all initial data 
on the black hole threshold form a naked singular- 
ity. As type II critical phenomena appear to be 
generic at least in spherical symmetry, this means 
that in generic self-gravitating systems, the space of 
regular initial data that form naked singularities is 
larger than expected, namely of codimension 1. 
Excluding naked singularities from generic initial 
data may be the sharpest version of cosmic censor- 
ship one can now hope to prove. 

Another point of interest in critical collapse is that 
it allows one to make a small region of arbitrarily 
high curvature from finite-curvature initial data. 
This may be a route for probing quantum-gravity 
effects. Similarly, one can make black holes that are 
much smaller than any length scale present in the 
initial data or the matter equation of state. An 
application has been suggested for this in cosmol- 
ogy, where primordial black holes could have 
masses much smaller than the Hubble scale at 
which they are created, rather than of the order of 
this scale. 


Outlook 


Critical phenomena in gravitational collapse are now 
well understood in spherical symmetry, both theoreti- 
cally and in numerical simulations. In some matter 
models, the phenomenology is quite complicated, but 
it still fits into the basic picture outlined here. 

The crucial question as to what happens beyond 
spherical symmetry remains largely unanswered at 
the time of writing. Perturbation theory around 
spherical symmetry suggests that critical phenom- 
ena are not restricted to exactly spherical situa- 
tions. This is also supported by simulations in 
axisymmetric (highly nonspherical) vacuum grav- 
ity. Other simulations of nonspherical gravitational 
collapse which cover the necessary range of space- 
time scales required to see critical phenomena are 
only just becoming available, and the results are 
not yet clear-cut. For collapse with angular 
momentum, no high-resolution calculations have 
yet been carried out. As the necessary techniques 
become available, one should be prepared for 
numerical simulations to make dramatic extensions 
or corrections to the picture of critical collapse 
drawn up here. 


See also: Computational Methods in General Relativity: 
The Theory; Spacetime Topology, Causal Structure and 
Singularities; Stability of Minkowski Space; Stationary 
Black Holes. 


Introduction 


Certain commutation relations among the current 
density operators in quantum field theories define 
an infinite-dimensional Lie algebra. The original 
current algebra of Gell-Mann described weak and 
electromagnetic currents of the strongly interacting 
particles (hadrons), leading to the Adler-Weisberger 
formula and other important physical results. This 
helped inspire mathematical and quantum-theoretic 
developments such as the Sugawara model, light 
cone currents, Virasoro algebra, the mathematical 
theory of affine Kac-Moody algebras, and non- 
relativistic current algebra in quantum and statis- 
tical physics. Lie algebras of local currents may be 
the infinitesimal representations of loop groups, 
local current groups or gauge groups, diffeomorph- 
ism groups, and their semidirect products or other 
extensions. Broadly construed, current algebra thus 
leads directly into the representation theory of 
infinite-dimensional groups and algebras. Applica- 
tions have ranged across conformally invariant 
field theory, vertex operator algebras, exactly 
solvable lattice and continuum models in statistical 
physics, exotic particle statistics and q-commuta- 
tion relations, hydrodynamics and quantized vortex 
motion. This brief survey describes but a few 
highlights. 


Relativistic Local Current Algebra for Hadrons 


To model superfluidity, Landau had proposed in 
1941 a quantum hydrodynamics fundamentally 
based on local fluid densities and currents as 
(operator) dynamical variables. However, current 
algebra came into its own in theoretical physics with 
the ideas of Gell-Mann in the early 1960s. The basic 
concept, in the era just preceding quantum chromo- 
dynamics (QCD), was that even without knowing 
the Lagrangian governing hadron dynamics in 
detail, exact kinematical information — the local 
symmetry — could still be encoded in an algebra of 
currents. The local (vector and axial vector) current 
density operators, expressed where possible in terms 
of underlying quantized field operators in Hilbert 
space, were to form two octets of Lorentz 4-vectors, 
with each octet corresponding to the eight genera- 
tors of the compact Lie group SU(3). 


More specifically (Adler and Dashen 1968), let 
$F_a^{\mu}(x)$, $a = 1, 2, \dots, 8$, $\mu = 0, 1, 2, 3$, be an octet of 
hadronic vector currents, where as usual 
$x = (x^{\mu}) = (x^0, \mathbf{x})$ denotes a point in four-dimensional 
spacetime. Likewise, introduce an axial vector octet 
$F_a^{5\mu}(x)$. Unless otherwise specified, we use natural 
units, where $\hbar = 1$ and $c = 1$. Define the corresponding 
charges $F_a$ and $F_a^5$ to be the space integrals of the 
time components of these currents, that is, 


$$F_a(x^0) = \int d^3x\, F_a^0(x^0, \mathbf{x}), \qquad F_a^5(x^0) = \int d^3x\, F_a^{50}(x^0, \mathbf{x})$$ [1]


where $d^3x = dx^1\, dx^2\, dx^3$. Then $F_1, F_2, F_3$ are the 
three components $I_1, I_2, I_3$ of the isotopic spin, and 
$Y = (2\sqrt{3}/3) F_8$ is the hypercharge. The usual 
electromagnetic current $J_{em}^{\mu}(x^0, \mathbf{x})$ is given by 


$$J_{em}^{\mu} = q \left( F_3^{\mu} + \frac{\sqrt{3}}{3} F_8^{\mu} \right)$$ [2]


where q is the unit elementary charge, and the total 
charge is given by $Q = \int d^3x\, J_{em}^0(x^0, \mathbf{x}) = q(I_3 + Y/2)$. 
The hadronic part of the weak current entering an 
effective Lagrangian can be written as 


$$J_{w}^{\mu} = \left[ (F_1^{\mu} - F_1^{5\mu}) + i (F_2^{\mu} - F_2^{5\mu}) \right] \cos\theta_C + \left[ (F_4^{\mu} - F_4^{5\mu}) + i (F_5^{\mu} - F_5^{5\mu}) \right] \sin\theta_C$$ [3]


where $\theta_C$ is the Cabibbo angle (determined experimentally 
to be ~0.27 rad). The terms with $F_1^{\mu} - F_1^{5\mu}$ 
and $F_2^{\mu} - F_2^{5\mu}$ are strangeness conserving, those with 
$F_4^{\mu} - F_4^{5\mu}$ and $F_5^{\mu} - F_5^{5\mu}$ are not. 
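
The relation between the charges, the hypercharge and the electric charge can be made concrete in the simplest nontrivial representation of SU(3). The sketch below is an illustration that is not part of the article: it assumes the three-dimensional (triplet) representation, in which the charges are represented by $F_a = \lambda_a/2$ with $\lambda_a$ the Gell-Mann matrices, and evaluates $Y = (2/\sqrt{3})F_8$ and $Q/q = I_3 + Y/2$. The variable names are hypothetical.

```python
import numpy as np

# Diagonal Gell-Mann matrices; F_a = lambda_a / 2 in the triplet representation.
lambda_3 = np.diag([1.0, -1.0, 0.0])
lambda_8 = np.diag([1.0, 1.0, -2.0]) / np.sqrt(3.0)

F3 = lambda_3 / 2.0
F8 = lambda_8 / 2.0

I3 = F3                                # third component of isotopic spin
Y = (2.0 / np.sqrt(3.0)) * F8          # hypercharge
Q_over_q = I3 + Y / 2.0                # electric charge in units of q

print(np.diag(Y))         # hypercharges of the three triplet states
print(np.diag(Q_over_q))  # charges in units of q
```

In this representation the three states carry charges 2/3, -1/3 and -1/3 in units of q, and the charge operator is traceless, as it must be for a combination of SU(3) generators.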

The main current algebra hypothesis is that the 
time components $F_a^0$ and $F_a^{50}$ of these octets satisfy 
the equal-time commutation relations: 


$$[F_a^0(x^0, \mathbf{x}), F_b^0(x^0, \mathbf{y})] = i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F_d^0(x^0, \mathbf{x})$$

$$[F_a^0(x^0, \mathbf{x}), F_b^{50}(x^0, \mathbf{y})] = i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F_d^{50}(x^0, \mathbf{x})$$ [4]

$$[F_a^{50}(x^0, \mathbf{x}), F_b^{50}(x^0, \mathbf{y})] = i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F_d^0(x^0, \mathbf{x})$$

where the $c_{abd}$ are structure constants of the Lie 
algebra of SU(3), antisymmetric in the indices. Since 
current commutators relate bilinear expressions to 
linear ones, they fix the normalizations of the 
currents. The chiral currents $(1/2)(F_a^{\mu} - F_a^{5\mu})$ 
and $(1/2)(F_a^{\mu} + F_a^{5\mu})$ commute with each 
other, so that the local current algebra decomposes 
into two independent pieces. 
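
The structure constants entering these commutators can be computed explicitly. The sketch below assumes the standard Gell-Mann basis with $F_a = \lambda_a/2$, normalized so that $\mathrm{tr}(F_a F_b) = \delta_{ab}/2$, and extracts $c_{abd}$ from $[F_a, F_b] = i \sum_d c_{abd} F_d$. The function names are hypothetical, and the array indices run from 0 to 7 rather than 1 to 8.

```python
import numpy as np

def gell_mann_matrices():
    """Return the eight Gell-Mann matrices as an array of shape (8, 3, 3)."""
    lam = np.zeros((8, 3, 3), dtype=complex)
    lam[0] = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    lam[1] = [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]]
    lam[2] = [[1, 0, 0], [0, -1, 0], [0, 0, 0]]
    lam[3] = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
    lam[4] = [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]]
    lam[5] = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
    lam[6] = [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]]
    lam[7] = np.diag([1, 1, -2]) / np.sqrt(3)
    return lam

def structure_constants(generators):
    """c_abd from [F_a, F_b] = i sum_d c_abd F_d, using tr(F_a F_b) = delta_ab / 2."""
    n = len(generators)
    c = np.zeros((n, n, n))
    for a in range(n):
        for b in range(n):
            comm = generators[a] @ generators[b] - generators[b] @ generators[a]
            for d in range(n):
                # tr([F_a, F_b] F_d) = i * c_abd / 2
                c[a, b, d] = (-2j * np.trace(comm @ generators[d])).real
    return c

F = gell_mann_matrices() / 2.0
c = structure_constants(F)
print(c[0, 1, 2], c[3, 4, 7])
```

With this normalization the constants are antisymmetric under exchange of any pair of indices, which is the property used in the text.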

The Dirac $\delta$-functions in eqns [4] require that $F_a^0$ and 
$F_a^{50}$ be interpreted as (unbounded) operator-valued 
distributions; while the fixed-time condition suggests 
these should make mathematical sense as 
three-dimensional distributions, with $x^0$ held constant. 
Such distributions may be modeled on the test-function 
space $\mathcal{D}$ of real-valued, compactly supported, $C^{\infty}$ 
functions on the spacelike hyperplane $\mathbb{R}^3$. For functions 
$f_a, f_a^5 \in \mathcal{D}$, one has formally the "smeared currents" 
that are expected to be bona fide (unbounded) 
operators in Hilbert space; suppressing $x^0$, 


$$F_a(f_a) = \int_{\mathbb{R}^3} d^3x\, f_a(\mathbf{x})\, F_a^0(x^0, \mathbf{x}), \qquad F_a^5(f_a^5) = \int_{\mathbb{R}^3} d^3x\, f_a^5(\mathbf{x})\, F_a^{50}(x^0, \mathbf{x})$$ [5]

Equations [4] then become 

$$[F_a(f_a), F_b(f_b)] = [F_a^5(f_a), F_b^5(f_b)] = i \sum_d F_d(c_{abd}\, f_a f_b)$$

$$[F_a(f_a), F_b^5(f_b)] = i \sum_d F_d^5(c_{abd}\, f_a f_b)$$ [6]


Let $g(\mathbf{x})$ be a $C^{\infty}$ map from $\mathbb{R}^3$ to the Lie algebra $\mathcal{G}$ of 
chiral SU(3) × SU(3), equal to zero outside a compact 
set. The set of all such $\mathcal{G}$-valued functions forms an 
infinite-dimensional Lie algebra under the pointwise 
bracket, $[g, g'](\mathbf{x}) = [g(\mathbf{x}), g'(\mathbf{x})]$. Let us call this Lie 
algebra $\mathrm{map}_0(\mathbb{R}^3, \mathcal{G})$, where the subscript 0 indicates 
the condition of compact support when that is 
applicable (on compact manifolds, we omit the subscript). 
Expanding $g(\mathbf{x})$ with respect to a fixed basis of 
$\mathcal{G}$, we straightforwardly identify the map g with the 
two octets of test functions $f_a$ and $f_a^5$. Then, defining 
$F(g) = \sum_{a=1}^{8} F_a(f_a) + \sum_{a=1}^{8} F_a^5(f_a^5)$, eqns [6] are interpreted 
(for fixed $x^0$) as a representation F of 
$\mathrm{map}_0(\mathbb{R}^3, \mathcal{G})$. 

Integrating out the spatial variables entirely using 
eqns [1] leads to a representation at $x^0$ of $\mathcal{G}$ by the 
charges $F_a$ and $F_a^5$. The Adler-Weisberger sum rule 
was first derived (in 1965) from the commutation 
relations of these charges, together with the assumption 
of a partially conserved axial-vector current 
(PCAC). It connected nucleon $\beta$-decay coupling with 
pion-nucleon scattering cross sections, agreeing well 
with experiment. Various low-energy theorems 
followed, also in accord with experiment. Shortly 
thereafter, Adler was able to eliminate the PCAC 
assumption, and derived a further sum rule going 
beyond an experimental test of the algebra of 
charges to test the actual local current algebra. 
Here, the prediction pertained to structure functions 
in the deep inelastic scattering of neutrinos. This 
was elaborated by Bjorken to inelastic electron 
scattering. On the theoretical side, the study of the 
chiral current in perturbation theory led into the 
theory of anomalies. All these ideas were highly 
influential in subsequent theoretical work (Treiman 
et al. 1985, Mickelsson 1989). 

It is a natural idea to try to extend eqns [4] or [6], 
which elegantly express the combined ideas of 
locality and symmetry, to an equal-time commutator 
algebra that would also include the space components 
of the local currents $F_a^k$, $k = 1, 2, 3$. One may 
write without difficulty the commutators of the 
charges in eqns [1] with these space components: 


$$[F_a(x^0), F_b^k(x^0, \mathbf{x})] = [F_a^5(x^0), F_b^{5k}(x^0, \mathbf{x})] = i \sum_d c_{abd}\, F_d^k(x^0, \mathbf{x})$$

$$[F_a(x^0), F_b^{5k}(x^0, \mathbf{x})] = [F_a^5(x^0), F_b^k(x^0, \mathbf{x})] = i \sum_d c_{abd}\, F_d^{5k}(x^0, \mathbf{x})$$ [7]


But the commutator of the local time component 
with the local space component of the current 
cannot be merely the obvious extrapolation from 
eqns [4] and [7], that is, it cannot be 


$$[F_a^0(x^0, \mathbf{x}), F_b^k(x^0, \mathbf{y})] = i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F_d^k(x^0, \mathbf{x})$$


and so forth. Under very general conditions, for a 
relativistic theory based on local quantum fields or 
local observables, additional "Schwinger terms" are 
required on the right-hand sides of such commutators 
(Renner 1968). 

Well-known difficulties in specifying the Schwinger 
terms are associated with the fact that operator- 
valued distributions are singular when regarded as if 
they were functions of spacetime points. Thus, the 
product of two distributions at a point is often 
singular or undefined. When the currents forming a 
local current algebra are written as normal-ordered 
products of field operator distributions and their 
derivatives, the Schwinger terms in their commuta- 
tion relations may be calculated, for example, by 
"splitting points" in the arguments of the underlying 
fields, and subsequently letting the separation tend 
toward zero. The general form of a Schwinger term 
typically involves the derivative of a $\delta$-function times 
an operator. This may be a multiple of the identity 
(i.e., a c-number) or not, depending on the underlying 
field-theoretic model. Furthermore, when the number 
of spacetime dimensions is greater than 1 + 1, the 
c-number Schwinger terms turn out to be infinite. 
Hence, we do not obtain this way a bona fide 
infinite-dimensional, equal-time commutator algebra 
comprising all the components of the local currents. 


Sugawara, Kac-Moody, and 
Virasoro Algebras 


Since equations such as [4] and [6] are not explicitly
dependent on how the currents are constructed from
underlying canonical fields, one has the possibility
of writing a theory entirely in terms of self-adjoint
currents as the dynamical variables, bypassing the
field operators entirely, and expressing a Hamiltonian
operator directly in terms of such local
currents. This is in the spirit of approaches to
quantum field theory based on local algebras of
observables. It suggests consideration of relativistic
current algebras with finite c-number or operator
Schwinger terms in $s + 1$ dimensions, $s \geq 1$.

The Sugawara model, which is of this type, turned
out to be one of the most influential of those
proposed in the late 1960s and early 1970s.
Henceforth, let $G$ be a compact Lie group, and $\mathcal{G}$
its Lie algebra; let $F_a$, $a = 1,\dots,\dim\mathcal{G}$, be a basis for
$\mathcal{G}$, with $[F_a, F_b] = i\sum_d c_{abd}F_d$. The Sugawara current
algebra, at the fixed time $x^0 = y^0$ (which, from here
on, we suppress in the notation), is given by


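\[
\begin{aligned}
[J_a^0(\mathbf{x}), J_b^0(\mathbf{y})] &= i\,\delta^{(3)}(\mathbf{x}-\mathbf{y}) \sum_d c_{abd}\, J_d^0(\mathbf{x}) \\
[J_a^0(\mathbf{x}), J_b^k(\mathbf{y})] &= i\,\delta^{(3)}(\mathbf{x}-\mathbf{y}) \sum_d c_{abd}\, J_d^k(\mathbf{x})
  + i c\,\delta_{ab}\, I\, \frac{\partial}{\partial x^k}\,\delta^{(3)}(\mathbf{x}-\mathbf{y}) \\
[J_a^k(\mathbf{x}), J_b^{\ell}(\mathbf{y})] &= 0 \qquad (k,\ell = 1,2,3)
\end{aligned}
\qquad [8]
\]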


where $J_a^\mu = (J_a^0, J_a^k)$, $k = 1,2,3$, is again a 4-vector, $c$ is a
finite constant, and $I$ is the identity operator. The time
components in eqns [8] behave like the local currents in
eqns [4]. The Schwinger term is a c-number, while
setting the commutators of the space components to
zero is the simplest choice consistent with the Jacobi
identity. The Sugawara Hamiltonian is given in terms of
the local currents by the formal expression:


\[
H = \frac{1}{2c} \int d^3x \sum_a \Big[ J_a^0(\mathbf{x})^2 + \sum_{k=1}^{3} J_a^k(\mathbf{x})^2 \Big]
\qquad [9]
\]


where the pointwise products of the currents require
interpretation in the particular representation. This
Hamiltonian leads to current conservation equations
for the $J_a^\mu$.


Related to the Sugawara current algebra, with $s = 1$
and the spatial dimension compactified, are affine
Kac-Moody and Virasoro algebras (Goddard and
Olive 1986, Kac 1990). Consider the infinite-dimensional
Lie algebra $\mathrm{map}(S^1,\mathcal{G})$ of smooth functions
from the circle to $\mathcal{G}$ under the pointwise bracket. This
is also called a loop algebra. Referring to the basis $F_a$,
define $T_a^{(m)}$ for integer $m$ to be the Fourier function
$\theta \mapsto F_a \exp[-im\theta]$. The pointwise bracket in
$\mathrm{map}(S^1,\mathcal{G})$ gives $[T_a^{(m)}, T_b^{(n)}] = i\sum_d c_{abd}T_d^{(m+n)}$ for these
generators. The corresponding (untwisted) affine
Kac-Moody algebra is a (uniquely defined, nontrivial)
one-dimensional central extension of this loop
algebra; that is, the new generator commutes with all
elements of the Lie algebra and, in an irreducible
representation, must be a multiple of the identity.
In such a representation, the new bracket can be
written as


\[
[T_a^{(m)}, T_b^{(n)}] = i\sum_d c_{abd}\, T_d^{(m+n)} + k\,m\,\delta_{ab}\,\delta_{m,-n}
\qquad [10]
\]


where $k$ is a constant. Here, $T_a^{(m=0)}$ is again a
representation of $\mathcal{G}$. Self-adjointness of the local
currents in the representation imposes the condition
$T_a^{(m)*} = T_a^{(-m)}$.
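
The consistency of the central term in eqn [10] can be checked by brute force. The following Python sketch is an illustration only, not part of the original treatment: it assumes $\mathcal{G} = su(2)$, so that $c_{abd}$ is the Levi-Civita symbol, stores the central element under the hypothetical key `'K'`, and tests the Jacobi identity on generators with small mode numbers. The helper names (`eps`, `bracket`, `jacobi_defect`, `T`) are invented for this example.

```python
from itertools import product

def eps(a, b, c):
    """Levi-Civita symbol on indices 0, 1, 2 (assumed su(2) structure constants)."""
    return (a - b) * (b - c) * (c - a) // 2

def bracket(x, y, k):
    """Bracket of two elements stored as dicts {(a, m): coeff, 'K': coeff}.

    Implements [T_a^(m), T_b^(n)] = i sum_d eps(a,b,d) T_d^(m+n)
                                    + k m delta_ab delta_{m,-n} * (central element).
    """
    out = {}
    for (kx, cx), (ky, cy) in product(x.items(), y.items()):
        if kx == 'K' or ky == 'K':
            continue  # the central element commutes with everything
        (a, m), (b, n) = kx, ky
        for d in range(3):
            e = eps(a, b, d)
            if e:
                out[(d, m + n)] = out.get((d, m + n), 0) + 1j * e * cx * cy
        if a == b and m + n == 0:
            out['K'] = out.get('K', 0) + k * m * cx * cy
    return out

def add(*terms):
    """Sum of several elements (dicts of coefficients)."""
    out = {}
    for t in terms:
        for key, c in t.items():
            out[key] = out.get(key, 0) + c
    return out

def jacobi_defect(x, y, z, k):
    """Largest coefficient of [x,[y,z]] + [y,[z,x]] + [z,[x,y]]."""
    j = add(bracket(x, bracket(y, z, k), k),
            bracket(y, bracket(z, x, k), k),
            bracket(z, bracket(x, y, k), k))
    return max((abs(c) for c in j.values()), default=0)

def T(a, m):
    """Generator T_a^(m) as a one-term element."""
    return {(a, m): 1}

# Scan all triples of generators with modes in -2..2 at level k = 2.
worst = max(jacobi_defect(T(a, m), T(b, n), T(c, p), k=2)
            for a, b, c in product(range(3), repeat=3)
            for m, n, p in product(range(-2, 3), repeat=3))
print("largest Jacobi defect:", worst)
```

The scan is finite, so it is a sanity check rather than a proof; the general argument uses the total antisymmetry of $c_{abd}$ together with $m + n + p = 0$ on the support of the central term.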

Now the compactly supported $C^\infty$ (tangent)
vector fields on a $C^\infty$ manifold $M$ form a natural
Lie algebra under the Lie bracket, denoted by
$\mathrm{vect}_0(M)$. In local Euclidean coordinates, for $\mathbf{g}_1, \mathbf{g}_2 \in
\mathrm{vect}_0(M)$, one can write this bracket as


\[
[\mathbf{g}_1, \mathbf{g}_2] = \mathbf{g}_1\cdot\nabla\mathbf{g}_2 - \mathbf{g}_2\cdot\nabla\mathbf{g}_1
\qquad [11]
\]


As the affine Kac-Moody algebras are central
extensions of the algebra of $\mathcal{G}$-valued functions on
$S^1$, so are Virasoro algebras central extensions of the
algebra of vector fields on $S^1$. Let $L^{(m)}$ denote
the (complexified) vector field described by
$\exp[-im\theta](1/i)\,\partial/\partial\theta$, for integer $m$. These generators
then satisfy $[L^{(m)}, L^{(n)}] = (m - n)L^{(m+n)}$.
Adjoining to the Lie algebra of vector fields a
new central element (commuting with all the
$L^{(m)}$), the Virasoro bracket in an irreducible
representation is given by the formula


\[
[L^{(m)}, L^{(n)}] = (m - n)\,L^{(m+n)} + \frac{c}{12}\,(m+1)m(m-1)\,\delta_{m,-n}
\qquad [12]
\]


where the numerical coefficient $c$ is called the
Virasoro central charge; self-adjointness of the
currents imposes $L^{(m)*} = L^{(-m)}$. It is straightforward
to verify that eqn [12] satisfies the Jacobi identity. 
The special form of the central term in the Virasoro 
current algebra results from the Gelfand—Fuks 
cohomology on the algebra of vector fields. 
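
The Jacobi identity for the Virasoro bracket, $[L^{(m)}, L^{(n)}] = (m-n)L^{(m+n)} + (c/12)(m+1)m(m-1)\delta_{m,-n}$, can also be confirmed mechanically. The sketch below is an illustrative check under stated assumptions, not a derivation from the article: elements are stored as dictionaries of coefficients, the central element sits under the hypothetical key `'C'`, and exact rational arithmetic is used so that a zero defect really is zero. All function names are invented for the example.

```python
from fractions import Fraction
from itertools import product

def vir_bracket(x, y, c):
    """Bracket of elements stored as {m: coeff, 'C': coeff} with central charge c."""
    out = {}
    for (kx, cx), (ky, cy) in product(x.items(), y.items()):
        if kx == 'C' or ky == 'C':
            continue  # the central element commutes with everything
        m, n = kx, ky
        if m != n:
            out[m + n] = out.get(m + n, 0) + (m - n) * cx * cy
        if m + n == 0:
            central = Fraction(c, 12) * (m + 1) * m * (m - 1)
            out['C'] = out.get('C', 0) + central * cx * cy
    return out

def add(*terms):
    """Sum of several elements (dicts of coefficients)."""
    out = {}
    for t in terms:
        for key, coeff in t.items():
            out[key] = out.get(key, 0) + coeff
    return out

def vir_jacobi_defect(m, n, p, c):
    """Largest coefficient of the cyclic sum for L^(m), L^(n), L^(p)."""
    x, y, z = {m: 1}, {n: 1}, {p: 1}
    j = add(vir_bracket(x, vir_bracket(y, z, c), c),
            vir_bracket(y, vir_bracket(z, x, c), c),
            vir_bracket(z, vir_bracket(x, y, c), c))
    return max((abs(v) for v in j.values()), default=0)

# Scan modes -4..4 for an arbitrary rational central charge.
vir_worst = max(vir_jacobi_defect(m, n, p, Fraction(7, 10))
                for m, n, p in product(range(-4, 5), repeat=3))
print("largest Jacobi defect:", vir_worst)
```

The central part of the cyclic sum is proportional to $(n-p)(m^3-m) + (p-m)(n^3-n) + (m-n)(p^3-p)$, which vanishes identically when $m+n+p=0$; the code simply evaluates this for a finite range of modes.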




The Kac-Moody and Virasoro algebras, both
modeled on $S^1$, may be combined to form a natural
semidirect sum of Lie algebras, with the additional
bracket


\[
[T_a^{(m)}, L^{(n)}] = m\,T_a^{(m+n)}
\qquad [13]
\]


Roughly speaking, the Kac-Moody generators correspond
to Fourier transforms of charge densities on
$S^1$, whereas the Virasoro generators correspond to
Fourier transforms of infinitesimal motions in $S^1$.
The central extensions provide the finite, c-number
Schwinger terms. These structures have important
application to light cone current algebra, conformally
invariant quantum field theories in (1 + 1)-dimensional
spacetime, the quantum theory of
strings, exactly solvable models in statistical
mechanics, and many other domains.

Of greatest physical importance, both in quantum
field theory and statistical mechanics, are those
irreducible, self-adjoint representations of the Virasoro
algebra known as highest weight representations,
where the spectrum of the operator $L^{(m=0)}$ is bounded
below. In these applications, one represents a pair of
Virasoro algebras by mutually commuting sets of
operators $L^{(m)}$ and $\bar{L}^{(m)}$. In the quantum theory, for
example, one takes the total energy $H \propto L^{(0)} + \bar{L}^{(0)}$,
and the total momentum $P \propto L^{(0)} - \bar{L}^{(0)}$. In a highest
weight representation, there is a unique eigenstate of
$L^{(0)}$ having the lowest eigenvalue $h$; for this "vacuum"
$|h\rangle$, $L^{(m)}|h\rangle = 0$, $m > 0$.

Friedan, Qiu, and Shenker showed in 1984 that
highest weight representations are characterized by a
class of specific, non-negative values of the central
charge $c$ and, correspondingly, of $h$: either $c \geq 1$ (and
$h \geq 0$) or $c = 1 - 6/[(\ell+2)(\ell+3)]$, $\ell = 1,2,3,\dots$
(and $h$ assumes a corresponding, specified set of values
for each value of $\ell$). In a beautiful application to the
study of the critical behavior of well-known statistical
systems, in which the generator of dilations is
proportional to $L^{(0)} + \bar{L}^{(0)}$, they discovered a direct
correspondence with permitted values of the central
charge; thus, $c = 1/2$ for the Ising model, $c = 7/10$ for
the tricritical Ising model, $c = 4/5$ for the three-state
Potts model, and $c = 6/7$ for the tricritical three-state
Potts model.
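
The discrete series of central charges below $c = 1$, $c = 1 - 6/[(\ell+2)(\ell+3)]$ for $\ell = 1,2,3,\dots$, is easy to tabulate. The short Python sketch that follows is an illustration only; the function name `central_charge` and the dictionary `MODELS` are hypothetical, and the model labels are those quoted in the text.

```python
from fractions import Fraction

def central_charge(l):
    """Central charge c = 1 - 6 / ((l + 2)(l + 3)) of the discrete series, l >= 1."""
    if l < 1:
        raise ValueError("the discrete series is labeled by l = 1, 2, 3, ...")
    return 1 - Fraction(6, (l + 2) * (l + 3))

# Statistical models named in the text, keyed by the series label l.
MODELS = {
    1: "Ising",
    2: "tricritical Ising",
    3: "three-state Potts",
    4: "tricritical three-state Potts",
}

for l, name in MODELS.items():
    print(f"l = {l}: c = {central_charge(l)}  ({name})")
```

The values increase monotonically toward the accumulation point $c = 1$ as $\ell$ grows.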


Current Algebras and Groups 


Local current algebras may be exponentiated to
obtain corresponding infinite-dimensional topological
groups (Pressley and Segal 1986, Mickelsson
1989, Kac 1990). Let $G$ be a Lie group whose Lie
algebra is $\mathcal{G}$. The algebra $\mathrm{map}_0(M,\mathcal{G})$, consisting of
smooth, compactly supported $\mathcal{G}$-valued functions on
$M$ under the pointwise bracket, exponentiates to the
local current group $\mathrm{Map}_0(M,G)$, consisting of
smooth maps from $M$ to $G$ that are the identity
outside a compact set in $M$, under the pointwise
group operation. When $M$ is taken to be the four-dimensional
spacetime manifold (rather than a
spacelike hyperplane), the local current group
modeled on $M$ is mathematically a gauge group for
nonabelian gauge field theory.

Likewise, the algebra $\mathrm{vect}_0(M)$ exponentiates to
the group $\mathrm{Diff}_0(M)$ of compactly supported $C^\infty$
diffeomorphisms of $M$ (under composition). The
Kac-Moody and Virasoro algebras exponentiate to
central extensions of the loop group $\mathrm{Map}(S^1, G)$ and
the diffeomorphism group $\mathrm{Diff}(S^1)$, respectively. The
semidirect sums of the Lie algebras are the infinitesimal
generators of semidirect products of the
groups.

Under appropriate technical conditions, self- 
adjoint representations of current algebras generate 
(and may be obtained from) continuous unitary 
representations of the corresponding groups. The 
needed technical conditions have to do with the 
existence of a dense set of analytic vectors belonging 
to a common, dense invariant domain of essential 
self-adjointness for the currents. 


Nonrelativistic Current Algebra 


In nonrelativistic local current algebra, Schwinger
terms do not appear. In 1968, Dashen and Sharp
defined (at fixed time $t$, suppressed in the present
notation) a mass density $\rho(\mathbf{x}) = m\psi^*(\mathbf{x})\psi(\mathbf{x})$ and a
momentum density $\mathbf{J}(\mathbf{x}) = (\hbar/2i)\{\psi^*(\mathbf{x})\nabla\psi(\mathbf{x}) -
[\nabla\psi^*(\mathbf{x})]\psi(\mathbf{x})\}$, where $\psi$ is a second-quantized canonical
field; here we keep $\hbar$ in the notation. The
resulting equal-time algebra is the semidirect sum:


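\[
\begin{aligned}
[\rho(\mathbf{x}), \rho(\mathbf{y})] &= 0 \\
[\rho(\mathbf{x}), J^k(\mathbf{y})] &= -i\hbar\,\frac{\partial}{\partial x^k}
  \big[\delta^{(3)}(\mathbf{x}-\mathbf{y})\,\rho(\mathbf{x})\big] \\
[J^k(\mathbf{x}), J^\ell(\mathbf{y})] &= i\hbar\,\Big\{ \frac{\partial}{\partial y^k}
  \big[\delta^{(3)}(\mathbf{x}-\mathbf{y})\,J^\ell(\mathbf{y})\big]
  - \frac{\partial}{\partial x^\ell}
  \big[\delta^{(3)}(\mathbf{x}-\mathbf{y})\,J^k(\mathbf{x})\big] \Big\}
\end{aligned}
\qquad [14]
\]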


Since this current algebra is independent of whether
$\psi$ obeys commutation or anticommutation relations,
the information as to particle statistics (Bose or
Fermi) is not encoded in the Lie algebra itself but in
the choice of its representation (up to unitary
equivalence). Again interpreting $\rho$ and $\mathbf{J}$ as operator-valued
distributions, define $\rho(f) = \int_{\mathbb{R}^3} d\mathbf{x}\, f(\mathbf{x})\rho(\mathbf{x})$
and $J(\mathbf{g}) = \int_{\mathbb{R}^3} d\mathbf{x} \sum_{k=1}^{3} g^k(\mathbf{x})J^k(\mathbf{x})$, where $f$ and the
components $g^k$ of the vector field $\mathbf{g}$ belong to the
function-space $\mathcal{D}$. Then the Lie algebra becomes


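\[
\begin{aligned}
[\rho(f_1), \rho(f_2)] &= 0 \\
[\rho(f), J(\mathbf{g})] &= i\hbar\,\rho(\mathbf{g}\cdot\nabla f) \\
[J(\mathbf{g}_1), J(\mathbf{g}_2)] &= -i\hbar\,J([\mathbf{g}_1,\mathbf{g}_2])
\end{aligned}
\qquad [15]
\]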


Equations [15] are a representation by self-adjoint
operators of the semidirect sum of the abelian Lie
algebra $\mathcal{D}$ with $\mathrm{vect}_0(\mathbb{R}^3)$. The corresponding group
is the natural semidirect product of the space $\mathcal{D}$
(regarded as an abelian topological group under
addition) with $\mathrm{Diff}_0(\mathbb{R}^3)$.
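
The smeared relations $[\rho(f), J(g)] = i\hbar\rho(g\cdot\nabla f)$ and $[J(g_1), J(g_2)] = -i\hbar J([g_1,g_2])$ can be checked in the simplest setting. The Python sketch below is a formal illustration under explicit assumptions that are not in the article: one particle in one dimension, $\rho(f)$ acting as multiplication by $mf$, $J(g)$ acting as $(\hbar/2i)(2g\,d/dx + g')$, units with $\hbar = m = 1$, and polynomials (coefficient lists, lowest degree first) standing in for test functions and wave functions. Polynomials are not compactly supported, so only the local algebraic identity is being tested; every helper name is hypothetical.

```python
HBAR = 1.0  # assumed units
MASS = 1.0  # assumed units

def padd(p, q):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def pscale(a, p):
    """Scalar multiple of a polynomial."""
    return [a * c for c in p]

def pmul(p, q):
    """Product of two polynomials."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def pder(p):
    """Derivative of a polynomial."""
    return [i * c for i, c in enumerate(p)][1:] or [0]

def rho(f, psi):
    """One-particle action rho(f) psi = m f psi."""
    return pscale(MASS, pmul(f, psi))

def J(g, psi):
    """One-particle action J(g) psi = (hbar / 2i) (2 g psi' + g' psi)."""
    inner = padd(pscale(2, pmul(g, pder(psi))), pmul(pder(g), psi))
    return pscale(HBAR / 2j, inner)

def lie(g1, g2):
    """Vector-field bracket [g1, g2] = g1 g2' - g2 g1' in one dimension."""
    return padd(pmul(g1, pder(g2)), pscale(-1, pmul(g2, pder(g1))))

def close(p, q, tol=1e-9):
    """True if two polynomials agree coefficient by coefficient."""
    return all(abs(c) < tol for c in padd(p, pscale(-1, q)))

f, g1, g2, psi = [1, 2, 0, 1], [0, 1, 3], [2, 0, -1], [1, -1, 2]

# [rho(f), J(g1)] psi versus i hbar rho(g1 f') psi
lhs_rho_J = padd(rho(f, J(g1, psi)), pscale(-1, J(g1, rho(f, psi))))
rhs_rho_J = pscale(1j * HBAR, rho(pmul(g1, pder(f)), psi))

# [J(g1), J(g2)] psi versus -i hbar J([g1, g2]) psi
lhs_J_J = padd(J(g1, J(g2, psi)), pscale(-1, J(g2, J(g1, psi))))
rhs_J_J = pscale(-1j * HBAR, J(lie(g1, g2), psi))

print(close(lhs_rho_J, rhs_rho_J), close(lhs_J_J, rhs_J_J))
```

The same identities hold particle by particle in an $N$-particle representation, which is why the algebra itself carries no information about the statistics.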

The construction generalizes to a general manifold
$M$ or manifold with boundary (in place of $\mathbb{R}^3$), and
to a general set of charge densities that generate the
local Lie algebra $\mathrm{map}_0(M,\mathcal{G})$. When $M = S^1$, we have
the Kac-Moody and Virasoro algebras with central
charge zero. However, $L^{(0)}$ in the nonrelativistic
(1 + 1)-dimensional quantum theories is proportional
to the total momentum $P$, and thus is
unbounded above and below.

The continuous unitary representations of
$\mathrm{Diff}_0(M)$, or its semidirect product with a local
current group at fixed time, thus describe nonrelativistic
quantum systems (Albeverio et al. 1999,
Goldin 2004). The unitary representation $V(\phi)$, $\phi \in
\mathrm{Diff}_0(M)$, satisfies $V(\phi_r^{\mathbf{g}}) = \exp[i(r/\hbar)J(\mathbf{g})]$, where
$r \in \mathbb{R}$ and $\phi_r^{\mathbf{g}}$ is the one-parameter flow in $\mathrm{Diff}_0(M)$
generated by the vector field $\mathbf{g}$. Such a representation
may be described very generally by means of a
measure $\mu$ on a configuration space $\Delta$, quasi-invariant
under a group action of $\mathrm{Diff}_0(M)$ on $\Delta$, together
with a unitary 1-cocycle $\chi$ on $\mathrm{Diff}_0(M) \times \Delta$. The
Hilbert space for the representation is
$\mathcal{H} = L^2_\mu(\Delta, \mathcal{W})$, which is the space of measurable
functions $\Psi(\gamma)$, $\gamma \in \Delta$, taking values in an inner
product space $\mathcal{W}$, and square integrable with respect
to $\mu$. The unitary representation $V$ is given by


\[
[V(\phi)\Psi](\gamma) = \chi_\phi(\gamma)\,\Psi(\phi\gamma)\,\sqrt{\frac{d\mu_\phi}{d\mu}(\gamma)}
\qquad [16]
\]


where $\phi\gamma$ denotes the group action $\mathrm{Diff}_0(M) \times \Delta \to
\Delta$; $\mu_\phi$ is the measure on $\Delta$ transformed by $\phi$ (which,
by the quasi-invariance of $\mu$, is absolutely continuous
with respect to $\mu$); $d\mu_\phi/d\mu$ is the Radon-Nikodym
derivative; and $\chi_\phi(\gamma): \mathcal{W} \to \mathcal{W}$ is a system
of unitary operators in $\mathcal{W}$ obeying the cocycle
equation


\[
\chi_{\phi_1\phi_2}(\gamma) = \chi_{\phi_1}(\gamma)\,\chi_{\phi_2}(\phi_1\gamma)
\qquad [17]
\]


Equations [16] and [17] hold outside sets of
$\mu$-measure zero in $\Delta$. Given the quasi-invariant
measure $\mu$ on $\Delta$, one may always choose $\mathcal{W} = \mathbb{C}$
and $\chi_\phi(\gamma) \equiv 1$ to obtain a unitary group representation
on complex-valued wave functions; but inequivalent
cocycles describe unitarily inequivalent
representations.

The configuration space $\Delta^{(N)}$, $N = 1,2,3,\dots$,
consists of $N$-point subsets of $\mathbb{R}^s$, and $\mu$ is the
(local) Lebesgue measure on $\Delta^{(N)}$. The corresponding
diffeomorphism group and local current algebra
representations describe $N$ identical quantum particles
in $s$-dimensional space. When $\chi \equiv 1$, we have
bosonic exchange symmetry. Inequivalent cocycles
on $\Delta^{(N)}$ are obtained (for $s \geq 2$) by inducing
(generalizing Mackey's method) from inequivalent
unitary representations of the fundamental group
$\pi_1[\Delta^{(N)}]$. For $s \geq 3$, this fundamental group is the
symmetric group $S_N$ of particle permutations; the
odd representation of $S_N$, $N \geq 2$, gives fermionic
exchange symmetry, while the higher-dimensional
representations are associated with particles satisfying
the parastatistics of Greenberg and Messiah.

When $s = 2$, however, $\pi_1[\Delta^{(N)}]$ is the braid group
$B_N$. Goldin, Menikoff, and Sharp obtained induced
representations of the current algebra describing the
intermediate statistics proposed by Leinaas and
Myrheim for identical particles in 2-space. Such
excitations, subsequently termed "anyons" by Wilczek
and characterized as charge-flux tube composites,
are important constructs in the theory of
surface phenomena such as the quantum Hall effect,
and anyonic statistics has also been applied to the
study of high-$T_c$ superconductivity. Current algebra
representations induced by higher-dimensional
representations of $B_N$ describe the statistics of
"plektons." Similarly, current algebra in nonsimply
connected space describes the Aharonov-Bohm
effect.

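Let $\psi^*(h) = \int_{\mathbb{R}^s} d^s x\, h(\mathbf{x})\psi^*(\mathbf{x})$ denote the smeared
creation field. Let the indexed set of representations
$\rho_N, J_N$, $N = 0,1,2,\dots$, satisfying the current algebra
[15], act in Hilbert spaces $\mathcal{H}_N$, where $\psi^*(h): \mathcal{H}_N \to
\mathcal{H}_{N+1}$, $\psi(h): \mathcal{H}_{N+1} \to \mathcal{H}_N$, $\psi(h)\mathcal{H}_0 = 0$, so that $\psi^*$
and $\psi$ intertwine the $N$-particle diffeomorphism
group representations. Let $\rho(f)$ and $J(\mathbf{g})$ act on
$\bigoplus_{N=0}^{\infty}\mathcal{H}_N$, so that $\rho(f)\Psi_N = \rho_N(f)\Psi_N$, $J(\mathbf{g})\Psi_N =
J_N(\mathbf{g})\Psi_N$. Then conditions for a Fock space hierarchy
are specified by commutator brackets of the
fields with the currents: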


\[
\begin{aligned}
[\rho(f), \psi^*(h)] &= \psi^*(\rho_{N=1}(f)h) \\
[J(\mathbf{g}), \psi^*(h)] &= \psi^*(J_{N=1}(\mathbf{g})h)
\end{aligned}
\qquad [18]
\]


The local creation and annihilation fields for anyons
in $\mathbb{R}^2$, obeying [18], satisfy $q$-commutation relations,
where $q$ is the relative phase change associated with
a single counterclockwise exchange of two anyons,
and the $q$-commutator $[A,B]_q = AB - qBA$. These
relations generalize the canonical commutation
($q = 1$) and anticommutation ($q = -1$) relations of
quantum field theory.
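
How the $q$-commutator $[A,B]_q = AB - qBA$ interpolates between the two familiar cases can be seen in a toy example. The Python sketch below is only an illustration: it uses $2 \times 2$ matrices for a single two-level mode as a stand-in, which is an assumption made here and not a model of the anyon fields themselves; the names `matmul`, `q_commutator`, `a`, and `a_dag` are hypothetical.

```python
import cmath

def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def q_commutator(A, B, q):
    """q-commutator [A, B]_q = AB - q BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - q * BA[i][j] for j in range(len(AB[0]))]
            for i in range(len(AB))]

# Lowering and raising operators of a single two-level mode (toy stand-in).
a = [[0, 1], [0, 0]]
a_dag = [[0, 0], [1, 0]]

anticommutator = q_commutator(a, a_dag, -1)  # q = -1: Fermi-type relation
commutator = q_commutator(a, a_dag, 1)       # q = +1: ordinary commutator
intermediate = q_commutator(a, a_dag, cmath.exp(1j * cmath.pi / 3))

print(anticommutator)
print(commutator)
print(intermediate)
```

For this mode the $q = -1$ bracket reproduces the identity matrix, as the canonical anticommutation relation requires, while a phase $q = e^{i\theta}$ gives a bracket lying between the two cases.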

When $\Delta$ is the configuration space of infinite but
locally finite subsets of $\mathbb{R}^s$, nonrelativistic current
algebra describes the physics of infinite gases in
continuum classical or quantum statistical
mechanics. Here, the most important kinds of
measures $\mu$ are Poisson measures (associated with
gases of noninteracting particles at fixed average
density) or Gibbsian measures (associated with
translation-invariant two-body interactions). These
measures describe equilibrium states and correlation
functions in the classical case, and specify the
current algebra representations in the quantum
theory.

The group of volume-preserving diffeomorphisms
was taken by Arnold as the symmetry group of an
ideal, classical, incompressible fluid, and Marsden
and Weinstein described the hydrodynamics of such
a fluid using the Lie-Poisson bracket associated with
the nonrelativistic current algebra of divergenceless
vector fields. The idea of using this algebra to study
quantized fluid motion, included in the program
proposed by Rasetti and Regge, formed the basis of
the subsequent study of quantized vortex structures
in superfluids from the point of view of geometric
quantization on coadjoint orbits of the diffeomorphism
group. This leads to quantum configuration
spaces whose elements are no longer sets of points;
for example, spaces of vortex filaments in $\mathbb{R}^2$, or
ribbons and tubes in $\mathbb{R}^3$.


See also: Algebraic Approach to Quantum Field Theory; 
Electroweak Theory; Quantum Chromodynamics; 
Solitons and Kac-Moody Lie Algebras; Symmetries in 
Quantum Field Theory: Algebraic Aspects; Toda Lattices; 
Two-Dimensional Conformal Field Theory and Vertex 
Operator Algebras.

... world-famous mathematician, philosopher, and logician Frege once gave a famous equation: half a
mathematician + half a philosopher = a good philosopher + a good mathematician. He explained: "A good
mathematician is at least half a philosopher; a good philosopher is at least half a mathematician."
The aim of this book is to replace the philosopher in that equation with a physicist.
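Here are two examples I have just read, which show that physicists can also contribute to mathematics.
While studying a problem in statistical mechanics, the physicists Tsung-Dao Lee and Chen-Ning Yang
encountered a special class $\mathcal{P}$ of polynomials
\[ P(z) = \sum_j a_j z^j . \]
In the cases they were able to analyze, all the roots of a polynomial $P$ in $\mathcal{P}$ lay on the unit circle
$\{z : |z| = 1\}$ of the complex plane. They therefore conjectured that this holds for every polynomial $P$ in
$\mathcal{P}$. If they could find a unitary matrix $U$ such that $P(z)$ is the characteristic polynomial of $U$, that is,
$P(z) = \det(zI - U)$, the conjecture would be proved. This is the approach that anyone who has studied higher
mathematics would think of, but it does not work here. Yang and Lee had a strong mathematical background and
so found a proof, but the proof was not simple. An easier proof is now available, thanks in particular to
Taro Asano. To prove the Lee-Yang circle theorem (stated below), we replace the polynomial $P$ of degree $m$ in
the single variable $z$ by a polynomial $Q(z_1,\dots,z_m)$ in $m$ variables that is of degree one in each variable
$z_j$. We are interested in the class $\mathcal{Q}_m$ of such polynomials for which $Q(z_1,\dots,z_m) \neq 0$ whenever
$|z_1| < 1, \dots, |z_m| < 1$. Hence, if $P(z) = Q(z,\dots,z)$ and $Q$ is in $\mathcal{Q}_m$, the roots of $P$ satisfy $|z| \geq 1$
(in the case of interest there is a symmetry $z \to z^{-1}$, so that also $|z| \leq 1$ and therefore $|z| = 1$). If
$Q(z_1,\dots,z_m)$ is in $\mathcal{Q}_m$ and $\tilde{Q}(z_{m+1},\dots,z_{m+n})$ is in $\mathcal{Q}_n$, then the product
\[ Q(z_1,\dots,z_m)\,\tilde{Q}(z_{m+1},\dots,z_{m+n}) \]
is in $\mathcal{Q}_{m+n}$. We now describe a less obvious operation, called Asano contraction, which maps polynomials
in $\mathcal{Q}$ to polynomials in $\mathcal{Q}$. Write
\[ Q(z_1,\dots,z_m) = A z_j z_k + B z_j + C z_k + D , \]
where $A, B, C, D$ are polynomials in the remaining $m - 2$ variables, other than $z_j$ and $z_k$. Asano contraction
replaces the two variables $z_j, z_k$ by a single variable $z_{jk}$ such that
\[ A z_j z_k + B z_j + C z_k + D \;\to\; A z_{jk} + D . \]
Starting from a polynomial $Q$ in $m$ variables, one Asano contraction yields a polynomial in $m - 1$ variables;
if the original polynomial is in $\mathcal{Q}_m$, the new one is in $\mathcal{Q}_{m-1}$. (This is an easy exercise: the root of
$Az + D$ is the negative of the product of the two roots of $Az^2 + (B + C)z + D$.) One can check that if
$-1 \leq a_{jk} \leq 1$, the polynomial in two variables $z_j, z_k$ of the form
\[ z_j z_k + a_{jk}(z_j + z_k) + 1 \]
is also in $\mathcal{Q}_2$. (Setting the polynomial equal to zero gives a map $z_j \mapsto z_k$; this map is an involution and
sends the interior of the unit circle to its exterior.) Multiplying such polynomials successively, performing
an Asano contraction whenever the same variable appears twice, and finally setting all the variables equal to
$z$, we obtain the Lee-Yang circle theorem: for real numbers $a_{jk} = a_{kj}$ with $-1 \leq a_{jk} \leq 1$, all the roots
of the polynomial
\[ P(z) = \sum_{X \subset \{1,\dots,m\}} z^{|X|} \prod_{j \in X} \prod_{k \notin X} a_{jk} \]
lie on the unit circle.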

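Another example is the physicist Zhang Zongsui. Zhang entered quantum field theory mainly under the influence
of N. Bohr, and the correspondence between the two shows his preference for theoretical work. Within theory he
had a pronounced mathematical bent. His research was marked by strong mathematical technique and by skill in
applying mathematics to analyze problems of physical theory. In physics he advocated doing more work on group
theory and symmetry. The mathematical computations and exposition in his results are "clear, crisp, and
reliable," and his conclusions concise and accurate. The congratulatory letter from the journal Mathematical
Translations (Shuxue Yilin) on the hundredth birthday of Tian Fangzeng notes that functional analysis at the
Institute of Mathematics of the Chinese Academy of Sciences developed, almost from the beginning, with equal
emphasis on fundamental theory and applications. In the spirit of the national science plan, from 1958 onward
the functional analysis group at the Institute stressed links with differential equations, physics, advanced
technology, and national economic construction. To this end, Tian Fangzeng and Guan Zhaozhi often worked with
Wu Xinmou, Zhang Zongsui, and others, so that functional analysis at the Institute always kept its connection
with differential equations and modern mathematical physics, and they successively organized seminars on
mathematical problems in quantum field theory, particle transport theory, and electromagnetic wave theory. The
papers he wrote made important contributions to mathematical research in this field in China. Together with
Guan Zhaozhi, Tian Fangzeng successfully opened up in China an important area of applied functional analysis:
the mathematical foundations and problems of particle transport theory.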

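Mathematics and physics are thus highly interchangeable. Some mathematicians later became physicists (Freeman
Dyson, for example), while others went the opposite way (Harish-Chandra and Raoul Bott, for example), turning
from physicist into mathematician. The most striking case is Edward Witten (born 1951). Witten, the
theoretical physicist who won the Fields Medal in 1990, received his doctorate in physics from Princeton
University in 1976 under David Gross, Nobel laureate (2004); he never earned a doctorate in mathematics.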

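So how much mathematics should one master in order to study physics?

A student committed to theoretical physics once asked Academician Hao Bailin how to go about his studies. Hao
said: "To do theoretical physics, your mathematics must first be good. In the first two years, finish
Smirnov's five volumes together with the calculus of variations, differential geometry, methods of
mathematical physics, topology, and integration; then move on to modern mathematics and study manifolds,
groups, continuous groups, Lie groups, modern differential geometry, and so on."

Of course, this is only entry-level mathematics.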

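This series looks like physics but is in fact packed with modern mathematics, just as Wu Yueliang, research
professor at the Institute of Theoretical Physics of the Chinese Academy of Sciences, put it in his review:

In this book the physics and the mathematics are hard to separate. In fact, many physical problems in
classical mechanics, electromagnetism, statistical mechanics, quantum mechanics, fluid mechanics, integrable
systems, and dynamical systems reduce to solving the equations of mathematical physics: ordinary differential
equations, partial differential equations, integral equations, and integro-differential equations. The
solutions involve functions of a complex variable, special functions, and many other kinds of functions, and
finding them calls on variational techniques, harmonic analysis, functional analysis, and other methods of
mathematical analysis. Likewise, Einstein's special and general theories of relativity not only changed our
view of space and time but also made the geometry of Minkowski spacetime and of Riemannian spaces the
mathematical foundation of physical theory, and made vector analysis, tensor analysis, and differential
geometry indispensable analytical tools. In quantum mechanics, physical quantities become operators, physical
states are described by wave functions, and it is the spectrum of an operator that is measured. In quantum
field theory, the wave function is second-quantized and becomes an operator describing the creation and
annihilation of elementary particles in the course of their interactions. This makes operator algebras,
quantization methods, path integrals, and related mathematical theories and methods the mathematical
foundation of quantum physics. Particle physicists discovered that three of nature's fundamental forces, the
electromagnetic, weak, and strong interactions, can be described by gauge theories and are governed entirely
by gauge symmetries, which are described mathematically by Lie groups and Lie algebras; indeed, crystal
structures are likewise described by rotation groups in Euclidean space. As a result, group theory has become
more and more important in physics, especially in particle physics. In gauge theory the gauge potential is
treated as the basic quantum field, and it turned out to be precisely the connection on a fiber bundle studied
by mathematicians in modern differential geometry. This made the topological invariants of fiber bundles
important in particle physics and quantum field theory, as in the magnetic monopole and instanton solutions of
gauge fields and the chiral quantum anomaly. Research on quantum gravity and superstring theory has not only
drawn on existing mathematical theories and methods, modern mathematics in particular, but has also advanced
mathematics itself. Similarly, in condensed matter and optics, topological phases of matter, topological
defects, and topological quantum computation use many modern mathematical methods, so that algebraic
topology, algebraic methods, quantum groups, complex geometry, symplectic geometry and topology,
low-dimensional geometry, noncommutative geometry, and other mathematical theories and methods increasingly
permeate research in theoretical physics. In addition, in the study of the randomness of microscopic physical
objects, the statistical laws of various stochastic processes, disordered systems, and dynamical systems,
stochastic methods and discrete mathematics are also finding ever wider application.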

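How great is the influence of mathematics on physics?

As the foreword to this book puts it:

Mathematics did exist, of course. Indeed, there was one aspect of our physical world that was recognised to
be controlled by precise, mathematical logic: the geometric structure of space, elaborated to become a genuine
form of art by the ancient Greeks. From my perspective, the Greeks were the first practitioners of
'mathematical physics', when they discovered that all geometric features of space could be reduced to a small
number of axioms. Today, these would be called 'fundamental laws of physics'. The fact that the flow of time
could be addressed with similar exactitude, and that it could be handled geometrically together with space,
was only recognised much later. And, yes, there were a few crazy people who were interested in the magic of
numbers, but the real world around us seemed to contain so much more that was way beyond our capacities of
analysis.

Gradually, all this changed. The Moon and the planets appeared to follow geometrical laws. Galilei and
Newton managed to identify their logical rules of motion, and by noting that the concept of mass could be
applied to things in the sky just like apples and cannon balls on Earth, they made the sky a little bit more
accessible to us. Electricity, magnetism, light and sound were also found to behave in complete accordance
with mathematical equations.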

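Scientists hold that deeper study of "mathematical physics" helps to reveal the intrinsic connections between
physics and mathematics. In fact, in the development from natural philosophy to physics, mathematics played an
irreplaceable role alongside experimental methods and new ways of thinking. When, by analyzing large amounts
of experimental data and distilling the essence of various phenomenological theories, people describe in
rigorous mathematical language and concise formulas the physical laws that govern the basic structure of
matter and the evolution of the universe, the beauty of simplicity, of unity, and of symmetry and asymmetry in
physics is reflected through a profound mathematical beauty. One may say that ever since physics became an
independent natural science, its relationship with mathematics has been inseparable. Many scientists of
antiquity were both mathematicians and physicists; in modern times especially, many theoretical physicists
have actively advanced the use and development of mathematics, and collaboration between mathematicians and
theoretical physicists has become ever more frequent and ever deeper. They became practitioners of
"mathematical physics." The best known is Archimedes of ancient Greece, both a famous mathematician and a
famous physicist, who early on used mathematics as a tool to prove the law of the lever and the principle of
buoyancy, and carried out many experiments. Newton, in studying the laws of motion of bodies and of celestial
objects, developed a new mathematical method, the calculus. Einstein used mathematical methods that were
entirely new to the physicists of his day, differential geometry and Riemannian geometry, to create general
relativity. Einstein recalled: "In 1912 I suddenly realized that Gauss's theory of surfaces was the key to
unlocking this mystery; his surface coordinates were of great significance. At the time, however, I did not
yet know that Riemann had studied the foundations of geometry more deeply. I suddenly remembered that the
geometry course Geiser taught us when I was a student had included Gauss's theory ... I realized that the
foundations of geometry have physical significance. When I returned from Prague to Zurich, my dear friend the
mathematician Grossmann was there too. He told me about Gauss, and then about Riemann." Grossmann's
wholehearted help led directly to general relativity.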

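The great geometer Hermann Grassmann published his Lineale Ausdehnungslehre (Linear Theory of Extension) in
1844 ①. Like Möbius's famous book it is rich in ideas, but unlike Möbius's it is written in a very obscure
style, so that for decades it went unnoticed and unread. Only after a series of similar ideas had appeared in
other books and papers was it recognized that these ideas originated in Grassmann's book, but by then it was
too late. To get a taste of this abstract style, one need only look at the titles of some of its chapters,
such as "Derivation of the concept of pure mathematics," "Deduction of the theory of extension," "Exposition
of the theory of extension," "Form of presentation," and "Survey of the general theory of forms." Only after
laboriously working through these does one reach the purely abstract presentation of the actual content, which
is still hard to understand. Not until the later revised edition of 1862 ② did Grassmann adopt a more
accessible presentation, namely in coordinates. Moreover, Grassmann chose the word Ausdehnungslehre (theory of
extension) to indicate that his investigations apply to spaces of any dimension, geometry being for him merely
the application of this entirely abstract new discipline to ordinary three-dimensional space. But his coinage
did not take root; today one speaks simply of "n-dimensional geometry."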

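We ordinary readers may easily confuse mathematical physics with the equations of mathematical physics. In
fact these are two concepts that differ in both intension and extension: the latter can only be regarded as a
proper subset of the former, while the former far exceeds the latter in both content and scope. What they have
in common is that their problems originate in physics while the solutions come from mathematicians. Take the
resolution of Dirichlet's conjecture. From the day it was proposed, the mathematical conjecture known as
"Dirichlet's principle" went through more than thirty years of heated debate and reversals before it was
finally established. It was put forward by Dirichlet in his study of the potential theory of differential
equations. Roughly speaking, it states that the function minimizing the Dirichlet integral
\[ D[u] = \int_\Omega |\nabla u|^2 \, dV \]
satisfies the potential equation.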


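① Hermann Grassmann, Die Ausdehnungslehre (Theory of Extension), published in Leipzig, 1844. See also his
Gesammelte mathematische und physikalische Werke, vol. 1, Leipzig, 1894; the second edition was published in
Leipzig, 1898.
② Berlin, 1862. See his collected works, vol. 1, part 2, Leipzig, 1896.

Later, in studying the three-dimensional potential equation (also called Laplace's equation or the harmonic
equation)
\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0 , \]
it was further argued that the physical state described by the potential equation always has a definite
physical solution, so that a mathematical solution must also exist. Mathematically, however, this existence
could not be proved for a long time. Only in 1851 did Riemann, in his doctoral dissertation "Foundations for a
general theory of functions of one complex variable," give an existence proof for solutions of the boundary
value problem for the potential equation. Because Riemann used the above conjecture of his teacher Dirichlet
in the dissertation, he called it "Dirichlet's principle." Not long after the dissertation appeared, however,
the principle provoked heated discussion. In particular, Riemann's proof was sharply criticized by the famous
German mathematician K. W. Weierstrass (1815-1897), who pointed out that Riemann had assumed a priori, without
proof, that a function minimizing the integral must exist, which is not permissible in mathematics. Despite
the master's criticism, Riemann did not waver in his confidence in Dirichlet's principle, and he went on to
use it to make a series of important discoveries. Riemann died young in 1866, but the dispute over whether
Dirichlet's principle holds did not end. In 1870 Weierstrass gave a counterexample to the principle: for given
boundary conditions, no function attains the minimum of the Dirichlet integral, and he used this to reject the
principle. Since Dirichlet's principle had been rejected by Weierstrass, the mathematical authority of the
day, mathematicians had to find other ways to prove the existence of solutions to the boundary value problem
for the potential equation. Three proofs are best known: in 1870 Neumann gave one by the "method of arithmetic
means"; in 1890 Schwarz gave another by the "alternating method"; and in the same year Poincaré gave one by
the "method of sweeping out." Logically these proofs are undoubtedly correct, but none is as simple and direct
as one based on Dirichlet's principle. This made mathematicians nostalgic for the principle; they regretted
its rejection and conceived the idea of reviving it, and some effort was made, but without success. A
pessimistic mood spread through the mathematical community. The mathematician Neumann remarked that
Dirichlet's principle, so elegant and with such broad prospects of application, had "vanished forever" from
our sight!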

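As the saying goes, "thirty years east of the river, thirty years west." Thirty years after Dirichlet's
principle was rejected, in 1899, the leading German mathematician Hilbert launched a new "rescue campaign." He
broke decisively with the traditional view that sets rigor against simplicity, criticized Weierstrass's
wholesale rejection of Dirichlet's principle on grounds of rigor, and, starting from the principle's
simplicity, elegance, and effectiveness in applications, actively sought to establish its truth and soundness,
finally finding a way to prove it. He reported this result at a meeting of the German Mathematical Society and
stated clearly that Dirichlet's principle can be fully restored provided suitable restrictions are placed on
the domain, the boundary values, and the admissible functions. In response to the view among mathematicians
that the principle had long since sunk, he pointedly called this work "the resurrection of Dirichlet's
principle." Hilbert later gave a more general proof, further confirming the soundness of the principle.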

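In modern times still more mathematical theories originating in physics have been abstracted, and their
further study has in turn greatly advanced physics. Examples include the direct connection established between
the global properties of Yang-Mills gauge fields and the chiral quantum anomaly on the one hand, and the
topological invariants of fiber bundles, Chern-Simons characteristic classes, and the index theorem on the
other; the correspondence between the extra dimensions of superstring theory and Calabi-Yau spaces; and the
award of the Fields Medal to the theoretical physicist Witten for outstanding contributions to mathematics
made while developing superstring theory. These are all classic examples of the union of physics and
mathematics in "mathematical physics."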

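Chinese mathematicians have long been clear-headed about this. In the 1980s Li Daqian wrote that for students
of mathematics, pursuing an ever purer state is not necessarily a broad highway, even from the standpoint of
the development of pure mathematics itself. Without attention to practical needs and to developments in other
fields, and without a broad outlook, it is hard to produce first-rate talent in fundamental theory.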

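Fundamental and applied work are closely related and promote each other. When those who work on fundamental
theory value education and training in applications, research in both fundamental theory and applications
benefits greatly. Gauge fields in physics are closely connected with the mathematical concept of fiber
bundles. According to Professor Chen-Ning Yang himself, he consulted many mathematicians working on fiber
bundles in the United States, but he could not understand what they told him, and the two sides never found
common ground. Only at Fudan University, when Professor Gu Chaohao explained the relationship between the two
clearly in language a physicist could accept, was Professor Yang delighted. He then collaborated with
Professor Gu, and together they achieved a great deal in the mathematical theory of gauge fields and developed
the theory further. Why was this possible? As a university student, Professor Gu had taken the physics
department's courses in the "four mechanics"; as a mathematician he is not only highly accomplished in
mathematics but also well grounded in physics.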

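The table of contents shows that the book covers a very comprehensive range of mathematics. The sections are:
Introduction to Mathematical Physics; Classical Mechanics; Fluid Dynamics; Integrable Systems; Classical Field
Theory; Conformal and Topological Field Theory; Quantum Field Theory; General Relativity; Quantum Gravity;
String Theory and M-Theory; Condensed Matter and Optics; Quantum Information and Computation; Quantum
Mechanics; Disordered Systems; Dynamical Systems; Equilibrium Statistical Mechanics and Nonequilibrium
Statistical Mechanics; Algebraic Techniques; Lie Groups and Lie Algebras; Discrete Mathematics; Quantum
Groups; Stochastic Methods; Complex Geometry; Differential Geometry; Low-Dimensional Geometry; Noncommutative
Geometry; Algebraic Topology; Symplectic Geometry and Topology; Ordinary and Partial Differential Equations;
Functional Analysis and Operator Algebras; Quantization Methods and Path Integration; Variational Techniques.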

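The three editors-in-chief write in the preface: "Mathematical physics brings together the strengths of the
two great disciplines of mathematics and physics, which develop hand in hand. On the one hand, it uses
mathematics as a tool to organize physical concepts of ever-increasing precision and complexity; on the other,
physicists provide mathematicians with a source of inspiration." Likewise, as the Nobel laureate in physics
Professor Gerard 't Hooft of Utrecht University in the Netherlands points out in the foreword: "There is an
evident and important difference between the physical world and the mathematical world. The physical world is
about the 'truth' of the facts, whatever that 'truth' may be, whereas mathematics is a world of pure logic and
pure reasoning. In physics, whether a theory is accepted is ultimately decided by experiment. The methodology
of physics also differs from that of mathematics."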

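An example of wide public interest: has the astrophysicist Hawking fully resolved the black hole firewall
paradox? At the very least there is no verdict yet; he can only be said to have offered a third possible
explanation. Although the detailed properties of black holes are not yet fully understood, their existence as
compact objects has long been beyond dispute, and the heart of the firewall paradox remains the conflict
between quantum mechanics and general relativity. Quantum mechanics makes the event horizon of a black hole a
mysterious firewall of enormous energy, while general relativity refuses to admit that such a firewall exists
in the universe and regards the horizon as a purely mathematical construct. To truly resolve the firewall
paradox, therefore, humanity needs a deeper understanding of nature. Hawking himself acknowledged that truly
understanding how matter and information ultimately escape from a black hole requires unifying gravity with
the other forces of nature, a problem that has troubled physicists for nearly a century and remains unsolved.
Of the two cornerstones of modern civilization, general relativity describes the universe in elegant
mathematical form and is now thought to be understood deeply enough, whereas quantum mechanics describes the
microscopic world in probabilistic form, and its meaning and basic laws remain unknown; even Niels Bohr, a
founder of quantum mechanics, said that "no one understands quantum mechanics." The firewall paradox is a
clash between these two theories in the depths of the universe, and its outcome cannot yet be foreseen.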

When this book was first introduced in China there was a 12-volume hardcover edition. Dividing it by content
is an innovation, and such things are common in publishing.

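The Chronicle of Mao Zedong (1949-1976), compiled by the Party Literature Research Office of the CPC Central
Committee (Central Party Literature Press, 2014), a monumental six volumes, is a major work that readers had
long awaited. As the lesser man notes the lesser things, I excerpt here only a little about bookbinding. On
14 August 1965, Mao Zedong instructed Zhou Yang on printing a batch of large-type editions of Marxist-Leninist
classics: "I agree to the method of photographic enlargement and offset printing. But please note that the
covers should not use stiff board; large books (such as Materialism and Empirio-criticism and Anti-Dühring),
formerly issued in one or two volumes, should now be divided into four or eight volumes to reduce the weight
of each." Large type was for veteran comrades with poor eyesight; no stiff board for the covers meant no
hardback binding, which is awkward to hold in one hand or to read lying down; thicker books should be split
into more volumes (in fact both of the books Mao named are under 500 pages). On the whole, Mao's requirements
for the large-type editions put the reader first and aimed at ease of reading. Some say that in binding,
today's publishers favor large formats, thick volumes, and adhesive binding, prizing the clumsy, big, black,
and coarse; this spirit of deliberately making things hard for readers is truly baffling.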

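But I believe this book absolutely counts as a classic of mathematical physics. There are many ways to pay
tribute to a classic; the most traditional and the most effective is to preserve the original as it is. We
originally intended to copy even the cover of the original edition, and only after negotiation with the rights
agent changed it to its present form. Truly fine things are such that adding a little is too much and taking
away a little is too little; they were just right to begin with, so why should we spoil them? Are we really
confident that we could make it better? Smearing dung on the Buddha's head and tacking a dog's tail onto sable
would both draw readers' scorn.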

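Another reason we had to keep the original form is the enormous amount of work that translation would
require. Harbin Institute of Technology Press is located in the north, far from the economic and cultural
centers, and we simply lack the capacity to organize a large team of translators and spend heavily over many
years polishing this series. When we are stronger, we will purchase the Chinese-language rights and fulfil this
long-held wish. When buying the rights we also expressed interest in the digital rights, but were politely
declined, since the foreign publisher's digital edition in English is already well developed, unlike ours,
which is just starting out; moreover, once content is fragmented, copyright protection becomes a problem, a
chronic one in dictionary publishing. For example: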

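Do you recognize the English word esquivalience? No? Then look it up in the new edition of the New Oxford
American Dictionary, which will tell you that it means the wilful avoidance of one's official
responsibilities, that it first appeared in the 19th century, and that it perhaps derives from the French
esquiver, "to dodge, slink away."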

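But if you pick up any other dictionary on your desk, or type the word into various electronic dictionaries,
I guarantee that you will not find it however you search; and if you do find it, then there is trouble.

Why? Because the word was simply invented by the editors of the New Oxford American Dictionary; it does not
exist. What? A dictionary contains made-up words? How can lexicographers do such a thing?

It is not only the New Oxford American Dictionary: practically every dictionary hides words created out of
thin air in this way. They are put there not as an editor's prank or out of malice, but for a definite
purpose.

They are an important device for protecting copyright. Having toiled to compile a hefty dictionary, how do
you stop others from taking the cheap route, cutting and pasting yours and dressing it up as their own? Words
are common property, and definitions will not differ much, so how do you prove that someone else's dictionary
has copied or stolen your content?

If the word esquivalience appears in any dictionary other than the New Oxford American Dictionary, copying or
theft must be involved; the word was planted there as a trap to catch exactly that.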

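The publishing industry worldwide is currently in a slump, especially in print books. Experts of all kinds
have offered different analyses of the causes, but only one expert's answer has convinced the industry: the
lack of high-quality content. In the end, publishing is an industry in which content is king; without good
content, everything is a river without a source.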

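One writer has said: mediocrity is the danger of this age, which can no longer absorb traditional knowledge;
modern life is chaotic and leaves people lost in obscurity. Everything falls into shallow water and nothing sinks
into the deep well; everything is gossip, everything is rumour.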

We should have the courage to admit one basic fact: in this mediocre age the worst have all survived and the best
have died, and those of us who managed to escape cannot realise our true worth. What, then, can we still do in
this mediocre age?

We are all the more grateful to Elsevier for publishing, in June 2006, the Encyclopedia of Mathematical Physics
(《数学物理大百科全书》), an extraordinary encyclopedia that gives a comprehensive account of mathematical physics.

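The editors of the book (Professor Jean-Pierre Françoise, Professor Gregory L. Naber of Drexel University in
Philadelphia, USA, and Dr Tsou Sheung Tsun of the University of Oxford, UK) are all well-known scholars with long
records of research in mathematical physics. They invited 34 distinguished physicists and mathematicians,
including the Nobel laureate in physics Professor Chen Ning Yang and Professor Roger Penrose of the University of
Oxford, to serve on the editorial advisory board, and brought together 439 theoretical physicists and
mathematicians from 30 countries, all of whom have made outstanding contributions to physics, mathematics and
related fields, to write more than 400 richly illustrated review articles.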

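The Encyclopedia of Mathematical Physics took four years to complete. It is comprehensive and systematic in
content and broad in coverage, and it has a distinctive character: it reflects the foundational, independent and
self-contained nature of the discipline while also attending to its frontiers, its interdisciplinary links and
its applications. It is the most up-to-date and most complete encyclopedia in mathematical physics today.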

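The book touches on almost every important research area of physics and mathematics. Its core areas and
directions range from classical mechanics to quantum mechanics, classical field theory to quantum field theory,
conformal field theory to topological field theory, fluid dynamics to dynamical systems, integrable systems to
disordered systems, particle physics to astrophysics and cosmology, relativity to quantum gravity, gauge theory
to unified theories, equilibrium to non-equilibrium statistical mechanics, condensed matter to quantum
information, variational techniques to algebraic methods, functional analysis to operator algebras, path
integrals to stochastic methods, Lie groups to quantum groups, differential geometry to algebraic topology,
low-dimensional geometry to non-commutative geometry, and complex geometry to symplectic geometry. The book also
pays particular attention to the latest research results in mathematical physics and their newest applications in
each field, and it provides a large number of necessary and important references.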

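Compared with ordinary encyclopedias, this one has an obvious highlight: its review articles, which can tell you
everything you want to know about a given topic. Hao Bailin (郝柏林), a member of the Chinese Academy of Sciences,
studied at Kharkov University. As he recalled, the examination there was conducted by the professor of
mathematical physics A. Ya. Povzner, who set the question: “Tell me everything you have learned about Bessel
functions since the day you were born.” According to his students, he filled a thick stack of paper with dense
writing and then told Povzner: “This is an outline of what I know about Bessel functions. If required, I can
expand on the details of each item.” And so he passed the examination.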

As Gerard 't Hooft points out:

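The interdisciplinary field of mathematical physics is very hard to grasp. Some topics in the encyclopedia are
purely physical. High-temperature superconductivity, breaking water waves and magnetohydrodynamics are entirely
physical topics, in which experimental data are more decisive than any sophisticated theory. Cohomology theories,
Donaldson-Witten theory and the AdS/CFT correspondence, however, are examples of pure mathematics.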

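In the course of editing, a large number of short articles by different authors inevitably underwent appropriate
changes. In this encyclopedia, theoretical physicists and mathematicians give concise and clear expositions of
many important entries in advanced mathematical physics. Every article includes references for further reading.
We hope that these efforts will bear good fruit.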

In the words of the book's editors:

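Compared with the ancient histories of mathematics and physics in the narrow sense, mathematical physics is a
relatively new independent discipline. The International Association of Mathematical Physics was founded in 1976.
Of course, mathematics and physics have influenced each other since ancient times; but in recent decades, perhaps
because we are living through them, there has been tremendous progress, with new results and viewpoints emerging
at a dizzying pace, so much so that an encyclopedia is needed to collect and organise this knowledge.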

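Mathematical physics brings together the strengths of the two great disciplines of mathematics and physics, and
the two develop hand in hand. On the one hand, it uses mathematics as a tool to organise physical concepts of
ever-increasing precision and complexity; on the other, physicists provide mathematicians with a source of
inspiration. The classic example of this relationship is Einstein's theory of relativity, in which differential
geometry played an essential role in the formulation of the physical theory, while the questions raised in turn
by physics drove the development of differential geometry. By coincidence, we are writing the preface to the
Encyclopedia of Mathematical Physics during the centenary of Einstein's miracle year.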

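We thought long and hard before taking this on, since writing the Encyclopedia of Mathematical Physics is a
formidable project. Had we not been convinced that it was a worthwhile project of benefit to the community, and
that we would receive plenty of support, we would never have accepted the task. We did indeed receive a great
deal of support, including advice, encouragement and practical help, from members of the editorial advisory
board, from our authors, and from others who generously gave their time to help us improve this encyclopedia.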

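Mathematical physics is a relatively new discipline that has not yet been clearly delineated, and different
people understand it differently. Some of the topics we chose follow the programmes of recent International
Congresses on Mathematical Physics, but most follow the suggestions of the editorial advisory board and the
authors. Because of constraints of time and space, and the limits of our own ability, some overlong topics were
modified, but we have tried to include the subjects we consider central and to cover as many of the most active
areas as possible.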

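In recent years there has been some negative news in China about the original publisher of this book. It
originated with a website in the United States called “The Cost of Knowledge”, on which 12,196 scientists from
around the world have signed a boycott of the world's largest publisher. Some have called the movement the
“Academic Spring”.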

It began with William Timothy Gowers. This Fields Medallist from the University of Cambridge published a blog
post calling on his colleagues to take action and boycott Elsevier, the world's largest publisher.

Tyler Neylon, a mathematics PhD who now runs a company in Silicon Valley, read the post and immediately left a
comment for Professor Gowers. The next day he set up a website and named it “The Cost of Knowledge”.

Neylon later recalled that as soon as he read the post he realised something could be done. In his eyes, Gowers
is a “superstar” with the power to rally others.

To date, more than ten thousand scientists have signed on Neylon's website. They pledge not to publish papers in
Elsevier journals, not to act as referees, or not to take on editorial work.

Even so, we chose to work with Elsevier, because a good encyclopedia is so hard to come by.

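The Paris-based piano master Kun-Woo Paik is extremely exacting about pianos. In a conversation with the Taiwanese
publisher Hao Ming-yi (郝明义), he said that in a professional performing career of more than half a century he
had come across no more than five pianos that satisfied him. The answer astonished even Mr Hao, who has seen a
great deal.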

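In mathematics and physics, the good encyclopedias brought into China in recent years likewise number no more
than five. The five-volume Encyclopedia of Mathematics from the former Soviet Union is one, and the Encyclopedic
Dictionary of Mathematics from Iwanami in Japan is another; in short, they can be counted on one's fingers.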

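In fact, this project was not initiated by Elsevier. We are told that it began at Academic Press and was later
taken over by Elsevier, whose enthusiastic staff made the transition seamless. It is also touching that a
considerable number of the authors generously donated their fees to the European Mathematical Society's Committee
for Developing Countries. We should thank them for all they have done for developing countries.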

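As for the question that concerns us most, namely who will buy such a large set of books, we are full of
optimism. The world is full of wonders, and every kind of purchase is possible. Not long ago there were many
reports about Hawking losing his bet on the existence of the “God particle”.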

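Experiment proved that Hawking had lost the bet. Hawking admitted that he had been fairly beaten and wished Higgs
a Nobel Prize. Higgs revealed that after the discovery of the new particle was announced, Hawking contacted him
and said that the cheque was in the post. Higgs said: “I am not the only one he is paying. I think he will also be
sending 100 dollars to Kane at the University of Michigan.”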

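Kane, the other winner of the bet, was happy to accept Hawking's dollars: “I was convinced that the Higgs boson
would be found. Discovering the Higgs boson is truly wonderful. It confirms a long-standing conjecture and further
strengthens the factual basis of the ‘Standard Model’ of particle physics. Winning the bet is the icing on the
cake.” Kane said he would spend his winnings where they matter most: all of the money would go to research.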

Hawking may well have become accustomed to popularising science by losing bets.

In 1975 Hawking made a bet on whether Cygnus X-1 contains a black hole. He later conceded and bought the winner a
one-year subscription to Penthouse magazine.

In 1991 Hawking made another bet, this time on whether naked singularities exist. Once again he lost.

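The third bet was made in 1997, when Hawking wagered against the American physicist John Preskill, maintaining
that black holes destroy all the information they swallow. On 21 July 2004 Hawking publicly conceded the bet and
presented Preskill with a baseball encyclopedia.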

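The bet on the Higgs boson was his fourth. Over these thirty-odd years, by means of magazines, books and a few
dollars, Hawking has made many more people aware of these questions at the frontier of science. Behind the
100-dollar bet, both Higgs's foresight and Hawking's spirit of self-sacrifice deserve praise.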

We look forward to the next bet being settled with a set of this encyclopedia.

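On the 90th birthday of the eminent mechanics scholar Zhou Peiyuan, the teachers and students of Peking University
honoured their former president with the tribute: “Devoted to science, educating the gifted; rendering service to
the nation, bringing benefit to the future; long-lived as Mounts Song and Tai, with virtue that spreads like the
warmth of spring; we cheer in celebration: how splendid!” It would not be going too far to borrow these words to
celebrate the publication of this set of books in China.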
Contact address: Liu Peijie Mathematics Studio, Harbin Institute of Technology Press, 10 Fuhua 4th Street, Nangang District, Harbin
Website: http://lpj.hit.edu.cn/
Postcode: 150006
Telephone: 0451-86281378, 13904613167
E-mail: lpj1378@163.com
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